From mboxrd@z Thu Jan 1 00:00:00 1970 From: Hannes Reinecke Subject: Re: 3.15-rc4: circular locking dependency triggered by dm-multipath Date: Mon, 26 May 2014 14:10:19 +0200 Message-ID: <53832F2B.9060808@suse.de> References: <5368ABE8.9050909@acm.org> <536B1CC1.6080709@suse.de> <53832919.8080500@acm.org> Reply-To: device-mapper development Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1"; Format="flowed" Content-Transfer-Encoding: quoted-printable Return-path: In-Reply-To: <53832919.8080500@acm.org> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com To: Bart Van Assche Cc: device-mapper development , Mike Snitzer List-Id: dm-devel.ids On 05/26/2014 01:44 PM, Bart Van Assche wrote: > On 05/08/14 07:57, Hannes Reinecke wrote: >> On 05/06/2014 11:31 AM, Bart Van Assche wrote: >>> Hello, >>> >>> Has anyone else perhaps already run into this ? >>> >>> Thanks, >>> >>> Bart. >>> >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D >>> [ INFO: possible circular locking dependency detected ] >>> 3.15.0-rc4-debug+ #1 Not tainted >>> ------------------------------------------------------- >>> multipathd/10364 is trying to acquire lock: >>> (&(&q->__queue_lock)->rlock){-.-...}, at: [] >>> dm_table_run_md_queue_async+0x33/0x60 [dm_mod] >>> >>> but task is already holding lock: >>> (&(&m->lock)->rlock){..-...}, at: [] >>> queue_if_no_path+0x27/0xc0 [dm_multipath] >>> >>> which lock already depends on the new lock. >>> >>> >>> the existing dependency chain (in reverse order) is: >>> >>> -> #1 (&(&m->lock)->rlock){..-...}: >>> [] lock_acquire+0x93/0x1c0 >>> [] _raw_spin_lock+0x3b/0x50 >>> [] dm_blk_open+0x19/0x80 [dm_mod] >>> [] __blkdev_get+0xd1/0x4c0 >>> [] blkdev_get+0x1e5/0x380 >>> [] blkdev_open+0x5b/0x80 >>> [] do_dentry_open.isra.15+0x1de/0x2a0 >>> [] finish_open+0x30/0x40 >>> [] do_last.isra.61+0xa5d/0x1200 >>> [] path_openat+0xb7/0x620 >>> [] do_filp_open+0x3a/0x90 >>> [] do_sys_open+0x12e/0x210 >>> [] SyS_open+0x1e/0x20 >>> [] tracesys+0xd0/0xd5 >>> >>> -> #0 (&(&q->__queue_lock)->rlock){-.-...}: >>> [] __lock_acquire+0x1716/0x1a00 >>> [] lock_acquire+0x93/0x1c0 >>> [] _raw_spin_lock_irqsave+0x46/0x60 >>> [] dm_table_run_md_queue_async+0x33/0x60 >>> [dm_mod] >>> [] queue_if_no_path+0x72/0xc0 [dm_multipath] >>> [] multipath_presuspend+0x19/0x20 >>> [dm_multipath] >>> [] dm_table_presuspend_targets+0x4a/0x60 >>> [dm_mod] >>> [] dm_suspend+0x6d/0x1f0 [dm_mod] >>> [] dev_suspend+0x1c3/0x220 [dm_mod] >>> [] ctl_ioctl+0x269/0x500 [dm_mod] >>> [] dm_ctl_ioctl+0x13/0x20 [dm_mod] >>> [] do_vfs_ioctl+0x300/0x520 >>> [] SyS_ioctl+0x41/0x80 >>> [] tracesys+0xd0/0xd5 >>> >>> other info that might help us debug this: >>> >>> Possible unsafe locking scenario: >>> >>> CPU0 CPU1 >>> ---- ---- >>> lock(&(&m->lock)->rlock); >>> lock(&(&q->__queue_lock)->rlock); >>> lock(&(&m->lock)->rlock); >>> lock(&(&q->__queue_lock)->rlock); >>> >>> *** DEADLOCK *** >>> >>> 2 locks held by multipathd/10364: >>> #0: (&md->suspend_lock){+.+...}, at: [] >>> dm_suspend+0x38/0x1f0 [dm_mod] >>> #1: (&(&m->lock)->rlock){..-...}, at: [] >>> queue_if_no_path+0x27/0xc0 [dm_multipath] >>> >>> stack backtrace: >>> CPU: 10 PID: 10364 Comm: multipathd Not tainted 3.15.0-rc4-debug+ #1 >>> Hardware name: MSI MS-7737/Big Bang-XPower II (MS-7737), BIOS V1.5 >>> 10/16/2012 >>> ffffffff81fea150 ffff8807c194fa98 ffffffff814a6780 ffffffff81fea150 >>> ffff8807c194fad8 ffffffff814a36db ffff8807c194fb30 ffff8807c1954c88 >>> 0000000000000001 0000000000000002 ffff8807c1954c88 ffff8807c1954440 >>> Call Trace: >>> [] dump_stack+0x4e/0x7a >>> [] print_circular_bug+0x200/0x20f >>> [] __lock_acquire+0x1716/0x1a00 >>> [] lock_acquire+0x93/0x1c0 >>> [] ? dm_table_run_md_queue_async+0x33/0x60 [dm_mod] >>> [] _raw_spin_lock_irqsave+0x46/0x60 >>> [] ? dm_table_run_md_queue_async+0x33/0x60 [dm_mod] >>> [] dm_table_run_md_queue_async+0x33/0x60 [dm_mod] >>> [] queue_if_no_path+0x72/0xc0 [dm_multipath] >>> [] multipath_presuspend+0x19/0x20 [dm_multipath] >>> [] dm_table_presuspend_targets+0x4a/0x60 [dm_mod] >>> [] dm_suspend+0x6d/0x1f0 [dm_mod] >>> [] dev_suspend+0x1c3/0x220 [dm_mod] >>> [] ? table_load+0x350/0x350 [dm_mod] >>> [] ctl_ioctl+0x269/0x500 [dm_mod] >>> [] dm_ctl_ioctl+0x13/0x20 [dm_mod] >>> [] do_vfs_ioctl+0x300/0x520 >>> [] ? __fget+0x129/0x300 >>> [] ? __fget+0x5/0x300 >>> [] ? __fget_light+0x30/0x160 >>> [] SyS_ioctl+0x41/0x80 >>> [] tracesys+0xd0/0xd5 >>> >> Hmm. No, I haven't seen it (yet). >> Looking into it. > > Hello Hannes, > > Not sure if this information helps but apparently the above warning is > not a false positive but indicates a real deadlock. I have been able to > reproduce the above warning with 3.15-rc7. Shortly after the above > message appeared the following was reported: > > Watchdog detected hard LOCKUP on cpu 11 > Call Trace: > [] dump_stack+0x4e/0x7a > [] warn_slowpath_common+0x7d/0xa0 > [] warn_slowpath_fmt+0x4c/0x50 > [] watchdog_overflow_callback+0xd5/0x120 > [] __perf_event_overflow+0x9c/0x320 > [] ? x86_perf_event_set_period+0xe0/0x150 > [] perf_event_overflow+0x14/0x20 > [] intel_pmu_handle_irq+0x1ce/0x3c0 > [] ? _raw_spin_unlock+0x31/0x50 > [] ? ghes_copy_tofrom_phys+0x11f/0x210 > [] perf_event_nmi_handler+0x2b/0x50 > [] nmi_handle.isra.5+0xc2/0x380 > [] ? nmi_handle.isra.5+0x5/0x380 > [] do_nmi+0x185/0x3e0 > [] end_repeat_nmi+0x1e/0x2e > [] ? delay_tsc+0x2d/0xc0 > [] ? delay_tsc+0x2d/0xc0 > [] ? delay_tsc+0x2d/0xc0 > <> [] __delay+0xf/0x20 > [] do_raw_spin_lock+0xe1/0x140 > [] _raw_spin_lock_irqsave+0x4e/0x60 > [] ? multipath_busy+0x20/0xf0 [dm_multipath] > [] multipath_busy+0x20/0xf0 [dm_multipath] > [] dm_request_fn+0xcc/0x320 [dm_mod] > [] __blk_run_queue+0x33/0x40 > [] blk_delay_work+0x25/0x40 > [] process_one_work+0x1ea/0x6c0 > [] ? process_one_work+0x188/0x6c0 > [] worker_thread+0x11b/0x3a0 > [] ? process_one_work+0x6c0/0x6c0 > [] kthread+0xed/0x110 > [] ? insert_kthread_work+0x80/0x80 > [] ret_from_fork+0x7c/0xb0 > [] ? insert_kthread_work+0x80/0x80 > Mike Snitzer had a patch in his device-mapper tree: dm mpath: fix lock order inconsistency in multipath_ioctl = (2014-05-14 16:12:17 -0400) I was sort of hoping that would address this issue. Can you check? Cheers, Hannes -- = Dr. Hannes Reinecke zSeries & Storage hare@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 N=FCrnberg GF: J. Hawn, J. Guild, F. Imend=F6rffer, HRB 16746 (AG N=FCrnberg)