From: Bart Van Assche
Subject: Re: [PATCH RFC] spinlock: split out debugging check from spin_lock_mutex
Date: Mon, 29 Apr 2013 21:12:03 +0200
Message-ID: <517EC603.6010506@acm.org>
In-Reply-To: <20130429181322.GA27119@hmsreliant.think-freely.org>
References: <20130423132345.GA5564@hmsreliant.think-freely.org>
 <20130423134443.GB5564@hmsreliant.think-freely.org>
 <20130423.133315.1278191592594257624.davem@davemloft.net>
 <20130423175016.GC5564@hmsreliant.think-freely.org>
 <517C1EC2.5090900@bartvanassche.be>
 <20130429181322.GA27119@hmsreliant.think-freely.org>
To: Neil Horman
Cc: David Miller, netdev@vger.kernel.org, mingo@redhat.com

On 04/29/13 20:13, Neil Horman wrote:
> On Sat, Apr 27, 2013 at 08:53:54PM +0200, Bart Van Assche wrote:
>> Sorry Neil, but I can still trigger the CPU stuck messages with
>> kernel v3.9-rc8-24-gd7d7271:
>>
>> kernel: BUG: soft lockup - CPU#0 stuck for 22s! [rs:main Q:Reg:601]
>> kernel: irq event stamp: 1999192
>> kernel: hardirqs last enabled at (1999191): [] console_unlock+0x41d/0x450
>> kernel: hardirqs last disabled at (1999192): [] apic_timer_interrupt+0x6a/0x80
>> kernel: softirqs last enabled at (1999188): [] __do_softirq+0x196/0x280
>> kernel: softirqs last disabled at (1999181): [] irq_exit+0xb5/0xc0
>> kernel: CPU 0
>> kernel: Pid: 601, comm: rs:main Q:Reg Tainted: G O 3.9.0-rc8-debug+ #1 System manufacturer P5Q DELUXE/P5Q DELUXE
>> kernel: RIP: 0010:[] [] console_unlock+0x420/0x450
>> kernel: Call Trace:
>> kernel: [] do_con_write.part.19+0x887/0x2040
>> kernel: [] ? process_output+0x37/0x70
>> kernel: [] ? mutex_lock_nested+0x28c/0x350
>> kernel: [] ? process_output+0x37/0x70
>> kernel: [] con_write+0x34/0x50
>> kernel: [] do_output_char+0x179/0x210
>> kernel: [] process_output+0x4d/0x70
>> kernel: [] n_tty_write+0x210/0x480
>> kernel: [] ? try_to_wake_up+0x2e0/0x2e0
>> kernel: [] tty_write+0x159/0x300
>> kernel: [] ? lock_release_holdtime.part.22+0xf/0x180
>> kernel: [] ? n_tty_poll+0x1c0/0x1c0
>> kernel: [] vfs_write+0xab/0x170
>> kernel: [] sys_write+0x55/0xa0
>> kernel: [] system_call_fastpath+0x16/0x1b
>
> So, looking at this, I'm not 100% sure what's going on. console_unlock()
> has the ability to detect the need for rescheduling when we call
> raw_spin_unlock_irqrestore() at the bottom of the function, so there
> should be no reason that we don't give the per-CPU watchdog task a
> chance to run. About the only case in which I can see us getting into
> trouble would be if we somehow got into an infinite loop because every
> message in the log buffer had LOG_NOCONS set, in which case we would
> spin in that loop without ever scheduling or re-enabling interrupts.
> But for that to happen I think you would have to be flooding the log
> with messages bigger than the default message buffer size (about 1000
> bytes without a newline). You're not doing that, are you?

The above call trace was triggered by sending messages at a high rate
over netconsole, where each message has a length between 40 and about
160 characters. I'm not sure, though, what exactly is causing this call
trace - I haven't yet had time to analyze it in detail.

Bart.
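
For reference, a minimal sketch of the console_unlock() loop Neil is
describing, reconstructed from the v3.9-era kernel/printk.c. This is
simplified and approximate rather than the exact upstream code: the
surrounding function body, the console_seq < log_first_seq resync, the
critical-timings annotations and the console_sem release/retry logic
are all omitted. The point of interest is that the LOG_NOCONS skip path
loops back while logbuf_lock is still held and interrupts are still
disabled; interrupts are only briefly re-enabled between records, and
the final raw_spin_unlock_irqrestore() is where a pending reschedule
can take effect:

	for (;;) {
		static char text[LOG_LINE_MAX + PREFIX_MAX];
		struct log *msg;
		size_t len;
		int level;

		raw_spin_lock_irqsave(&logbuf_lock, flags);
skip:
		if (console_seq == log_next_seq)
			break;	/* drained; lock still held, IRQs off */

		msg = log_from_idx(console_idx);
		if (msg->flags & LOG_NOCONS) {
			/*
			 * Record was already printed straight to the
			 * console when it arrived; skip it. Note that
			 * this loops back with logbuf_lock held and
			 * interrupts still disabled.
			 */
			console_idx = log_next(console_idx);
			console_seq++;
			msg->flags &= ~LOG_NOCONS;
			goto skip;
		}

		level = msg->level;
		len = msg_print_text(msg, console_prev, false,
				     text, sizeof(text));
		console_prev = msg->flags;
		console_idx = log_next(console_idx);
		console_seq++;
		raw_spin_unlock(&logbuf_lock);

		call_console_drivers(level, text, len);
		local_irq_restore(flags);	/* IRQs briefly on here */
	}
	/* ... */
	raw_spin_unlock_irqrestore(&logbuf_lock, flags);	/* may reschedule */

If every record in the buffer carried LOG_NOCONS, execution would cycle
through the goto skip path without releasing the lock or re-enabling
interrupts, which is the infinite-loop scenario Neil describes above.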