From mboxrd@z Thu Jan 1 00:00:00 1970 From: Theodore Ts'o Subject: RCU related deadlock in AIO? (in 3.18-rc2) Date: Wed, 29 Oct 2014 07:06:39 -0400 Message-ID: Mime-Version: 1.0 Content-Type: text/plain Cc: "Paul E. McKenney" , Zach Brown , Benjamin LaHaise To: linux-ext4@vger.kernel.org, linux-aio@kvack.org Return-path: Received: from imap.thunk.org ([74.207.234.97]:50380 "EHLO imap.thunk.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932259AbaJ2LGq (ORCPT ); Wed, 29 Oct 2014 07:06:46 -0400 Sender: linux-ext4-owner@vger.kernel.org List-ID: I have a vague memory of seeing discussion about a stack trace that looks similar to this in the past few weeks. Does this ring a bell at all? The following hang happened while running xfstests generic/323 under ext4 in no-journal mode, running on a 3.18-rc2 kernel. This is a recently added test, to deal with a proble which I thought had already been fixed in mainline. # Run aio-last-ref-held-by-io - last put of ioctx not in process # context. We've had a couple of instances in the past where having the # last reference to an ioctx be held by the IO (instead of the # process) would cause problems (hung system, crashes). Thanks, - Ted generic/323 123s ... [06:42:18][30020.393779] ------------[ cut here ]------------ [30020.394962] WARNING: CPU: 1 PID: 0 at /usr/projects/linux/ext4/lib/percpu-refcount.c:151 percpu_ref_switch_to_atomic_rcu+0x9f/0xbc() [30020.397073] percpu ref (free_ioctx_reqs) <= 0 (-65534) after switching to atomic [30020.397073] Modules linked in: [30020.397073] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.18.0-rc2-00005-gb790865 #2277 [30020.401678] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [30020.402522] 00000000 00000000 f553ff28 c088aba8 f553ff54 f553ff44 c0170af6 00000097 [30020.406004] c046cf41 df614874 ca3cd9f8 7fff0004 f553ff5c c0170b38 00000009 f553ff54 [30020.408552] c0b9dcf8 f553ff70 f553ff88 c046cf41 c0b9dc8d 00000097 c0b9dcf8 c028233d [30020.410484] Call Trace: [30020.410828] [] dump_stack+0x48/0x60 [30020.411578] [] warn_slowpath_common+0x6c/0x83 [30020.412701] [] ? percpu_ref_switch_to_atomic_rcu+0x9f/0xbc [30020.413541] [] warn_slowpath_fmt+0x2b/0x2f [30020.413715] [] percpu_ref_switch_to_atomic_rcu+0x9f/0xbc [30020.414267] [] ? rcu_lock_acquire+0x1c/0x1c [30020.415101] [] rcu_process_callbacks+0x2b5/0x433 [30020.415763] [] ? rcu_process_callbacks+0x2b5/0x433 [30020.417024] [] ? percpu_ref_reinit+0x50/0x50 [30020.417273] [] ? __local_bh_disable_ip+0x6e/0x6e [30020.417959] [] __do_softirq+0x16d/0x376 [30020.418688] [] ? __local_bh_disable_ip+0x6e/0x6e [30020.419324] [] do_softirq_own_stack+0x26/0x2c [30020.420372] [] irq_exit+0x42/0x8f [30020.420796] [] do_IRQ+0x89/0x9d [30020.421606] [] common_interrupt+0x31/0x38 [30020.422162] [] ? add_atomic_switch_msr+0x173/0x173 [30020.422850] [] ? native_safe_halt+0x5/0x7 [30020.423696] [] default_idle+0x22/0x4b [30020.424321] [] arch_cpu_idle+0xe/0x10 [30020.424967] [] cpu_startup_entry+0x130/0x250 [30020.425604] [] start_secondary+0x15d/0x166 [30020.426333] ---[ end trace 4f7947d25059a4b4 ]--- [30240.260397] INFO: task aio-last-ref-he:9183 blocked for more than 120 seconds.[30240.261418] Tainted: G W 3.18.0-rc2-00005-gb790865 #2277 [30240.262399] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [30240.263575] d7e99e78 00000046 c0f6b280 c0f6b280 a9372026 00001b4d cdfc5800 00001b4d [30240.264979] d9758350 f6250880 d7e99e60 c0192666 000396f0 00000000 a9371bab 00001b4d [30240.266194] a9371cc6 00001b4d a9371bab 00001b4d d97588c4 c0f6b880 00000001 d7e99e8c [30240.267449] Call Trace: [30240.267795] [] ? sched_clock_local+0x10/0x10e [30240.268515] [] ? sched_clock_cpu+0xd1/0xec [30240.269207] [] ? sched_clock_local+0x10/0x10e [30240.269929] [] schedule+0x5a/0x5c [30240.270530] [] schedule_timeout+0x1a/0x9c [30240.271245] [] ? local_clock+0x18/0x22 [30240.271890] [] ? lock_release_holdtime.part.25+0x60/0x6d [30240.272736] [] ? _raw_spin_unlock_irq+0x27/0x36 [30240.273480] [] ? trace_hardirqs_on_caller+0x15f/0x17a [30240.274329] [] ? trace_hardirqs_on+0xb/0xd [30240.275054] [] ? _raw_spin_unlock_irq+0x2c/0x36 [30240.275798] [] __wait_for_common+0xcb/0xf5 [30240.276485] [] ? console_conditional_schedule+0x29/0x29 [30240.277343] [] ? wake_up_state+0x11/0x11 [30240.278016] [] wait_for_completion+0x19/0x1c [30240.278727] [] SYSC_io_destroy+0x7a/0xaf [30240.279409] [] ? __wait_for_common+0x31/0xf5 [30240.280116] [] SyS_io_destroy+0x10/0x12 [30240.280815] [] ? trace_hardirqs_on_thunk+0xc/0x10 [30240.281583] [] ? SyS_io_destroy+0x10/0x12 [30240.282271] [] syscall_call+0x7/0x7 [30240.282881] [] ? schedule_preempt_disabled+0xd/0x1d [30240.283692] no locks held by aio-last-ref-he/9183.