From mboxrd@z Thu Jan 1 00:00:00 1970 From: helpmelearn Subject: Excessive logging causing rcu_sched_state stall (wait_for_xmitr 8250.c) Date: Tue, 15 May 2012 17:02:36 -0700 (PDT) Message-ID: <33854911.post@talk.nabble.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Return-path: Received: from sam.nabble.com ([216.139.236.26]:54729 "EHLO sam.nabble.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S966148Ab2EPACg (ORCPT ); Tue, 15 May 2012 20:02:36 -0400 Received: from telerig.nabble.com ([192.168.236.162]) by sam.nabble.com with esmtp (Exim 4.72) (envelope-from ) id 1SURhM-000792-0m for linux-serial@vger.kernel.org; Tue, 15 May 2012 17:02:36 -0700 Sender: linux-serial-owner@vger.kernel.org List-Id: linux-serial@vger.kernel.org To: linux-serial@vger.kernel.org Hi, It's been more than a week since I am stuck with this. I am relatively a newbie venturing into linux kernel, so please be gentle. I have a non-tained 2.6.35.14 kernel that needs to output a lot using the serial console. To reproduce I simply run klogd -2 -c 7 and this causes rcu_sched_state to detect stall with a trace such as pasted below. Please note that I had enabled CONFIG_PROVE_RCU, CONFIG_RCU_TRACE and related flags as well as bumped RCU_STALL_RAT_DELAY to 20 while debugging. This makes RCU wait 15020 jiffies until it complains. Otherwise, it complains in 10002 jiffies (with default config). The trace points to the wait_for_xmitr() loop in 8250.c (from another run): (gdb) list *(wait_for_xmitr+0x56) 0x5ae is in wait_for_xmitr (drivers/serial/8250.c:1863). 1858 { 1859 unsigned int status, tmout = 10000; 1860 1861 /* Wait up to 10ms for the character(s) to be sent. */ 1862 do { 1863 status = serial_in(up, UART_LSR); 1864 1865 up->lsr_saved_flags |= status & LSR_SAVE_FLAGS; 1866 1867 if (--tmout == 0) (gdb) AFAICT, this piece of code is run with interrupts disabled. So is the issue here simply that too much output is causing wait_for_xmitr() loop to run for longer than rcu_sched_state() is happy with? In rcu/stallwarn.txt, Paul notes that rcu_sched_state() will get unhappy should the CPU loop with interrupts disabled. IF this is indeed the case, what would be a potential fix for this? Any help/guidance much appreciated. Thank you! [ 346.664471] INFO: rcu_sched_state detected stalls on CPUs/tasks: { 5} (detected by 0, t=15020 jiffies) [ 346.664477] sending NMI to all CPUs: [ 346.666532] NMI backtrace for cpu 5 [ 346.666534] CPU 5 [ 346.666535] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler iscsi_scst crc32c_intel iptable_filter xt_NOTRACK iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_raw ip_tables x_tables bonding qla2x00tgt qla2xxx scsi_transport_fc scst_vdisk scst vtms softdog coretemp hwmon dm_mod video backlight output serio_raw rtc_cmos rtc_core rtc_lib button ixgbe i2c_i801 i2c_core uhci_hcd pcspkr [ 346.666559] [ 346.666562] Pid: 0, comm: swapper Not tainted 2.6.35-14EIsmp X8DTH/X8DTH-i/6/iF/6F [ 346.666565] RIP: 0010:[] [] io_serial_in+0x10/0x15 [ 346.666571] RSP: 0018:ffff880144223a70 EFLAGS: 00000002 [ 346.666574] RAX: 0000000000000900 RBX: ffffffff82a5c8c0 RCX: 0000000000000000 [ 346.666576] RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff82a5c8c0 [ 346.666579] RBP: ffff880144223a70 R08: 0000000098a41ff0 R09: ffffffff81293831 [ 346.666581] R10: ffffffff81d4db2b R11: 0000000000000000 R12: 0000000000000000 [ 346.666583] R13: 00000000000025af R14: 0000000000000020 R15: 000000000000003e [ 346.666586] FS: 0000000000000000(0000) GS:ffff880144220000(0000) knlGS:0000000000000000 [ 346.666589] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 346.666591] CR2: 00007fff5f13d464 CR3: 0000000074531000 CR4: 00000000000006e0 [ 346.666594] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 346.666596] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 346.666599] Process swapper (pid: 0, threadinfo ffff88013fb44000, task ffff88023fb1e000) [ 346.666600] Stack: [ 346.666602] ffff880144223aa0 ffffffff81291b6d ffffffff82a5c8c0 000000000000000a [ 346.666605] <0> ffffffff81291bd6 ffffffff82a5c8c0 ffff880144223ac0 ffffffff81291bf5 [ 346.666608] <0> ffffffff81d4dc59 000000000000003e ffff880144223b00 ffffffff8128e79b [ 346.666612] Call Trace: [ 346.666614] [ 346.666617] [] wait_for_xmitr+0x38/0xa1 [ 346.666621] [] ? serial8250_console_putchar+0x0/0x34 [ 346.666624] [] serial8250_console_putchar+0x1f/0x34 [ 346.666627] [] uart_console_write+0x41/0x52 [ 346.666631] [] serial8250_console_write+0xb2/0x116 [ 346.666635] [] __call_console_drivers+0x67/0x79 [ 346.666638] [] _call_console_drivers+0x5b/0x5f [ 346.666641] [] release_console_sem+0x12a/0x1c5 [ 346.666644] [] vprintk+0x373/0x3a3 [ 346.666648] [] ? printk+0x67/0x69 [ 346.666651] [] printk+0x67/0x69 [ 346.666655] [] ? __delay+0xa/0xc [ 346.666659] [] ? up+0xf/0x39 [ 346.666663] [] ? trace_hardirqs_on+0xd/0xf [ 346.666692] [] ? debug_print_prefix+0x135/0x146 [scst] [ 346.666706] [] scst_process_redirect_cmd+0x97/0x323 [scst] [ 346.666720] [] scst_cmd_done_local+0x227/0x280 [scst] [ 346.666726] [] blockio_endio+0xcf/0xee [scst_vdisk] [ 346.666733] [] ? end_io_acct+0x0/0x147 [dm_mod] [ 346.666736] [] bio_endio+0x26/0x28 [ 346.666742] [] dec_pending+0x15a/0x163 [dm_mod] [ 346.666748] [] clone_endio+0xa1/0xae [dm_mod] [ 346.666752] [] bio_endio+0x26/0x28 [ 346.666759] [] vtms_softirq_done+0xeb/0x107 [vtms] [ 346.666764] [] blk_done_softirq+0x6d/0x7e [ 346.666767] [] __do_softirq+0xe9/0x1c7 [ 346.666771] [] call_softirq+0x1c/0x34 [ 346.666774] [] do_softirq+0x38/0x83 [ 346.666777] [] irq_exit+0x45/0x47 [ 346.666780] [] smp_call_function_single_interrupt+0x2f/0x31 [ 346.666784] [] call_function_single_interrupt+0x13/0x20 [ 346.666786] -- View this message in context: http://old.nabble.com/Excessive-logging-causing-rcu_sched_state-stall-%28wait_for_xmitr-8250.c%29-tp33854911p33854911.html Sent from the linux-serial mailing list archive at Nabble.com.