From mboxrd@z Thu Jan 1 00:00:00 1970 From: Zdenek Kabelac Subject: Re: [PATCH] dm-snapshot: workaround for a lockdep warning Date: Wed, 18 Sep 2013 08:57:29 +0200 Message-ID: <52394ED9.5010308@redhat.com> References: Reply-To: device-mapper development Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: dm-devel-bounces@redhat.com Errors-To: dm-devel-bounces@redhat.com To: Mikulas Patocka Cc: dm-devel@redhat.com, Stephen Boyd , "Alasdair G. Kergon" List-Id: dm-devel.ids Dne 18.9.2013 02:08, Mikulas Patocka napsal(a): > > > On Tue, 17 Sep 2013, Mikulas Patocka wrote: > >> BTW. this patch is no longer needed on 3.11 because md->io_lock was >> removed there, so the false lockdep warning doesn't happen. >> >> So we need to get this patch to stable trees 3.5 to 3.10, but not >> upstream. How to push the patch there without pushing it upstream? >> >> Mikulas > > It's more complicated than that - the original Zdenek's report shows that > the false warning happened on kernel "3.11.1-300.fc20.x86_64+debug". But I > am not able to reproduce the bug on 3.11, I only reproduced it on 3.5-3.10 > kernels. > > Zdenek, could you reproduce the bug on kernel 3.11? > > Mikulas > Since I do not have clean 3.11 at my hands - I've reproduced it with my -rc6 kernel (few patches based on 8495e9c4a9616c9d19f23182d0536485902259db) With enabled kmemleak=on lockdep.prove_locking=1 lockdep.lock_stat=1. [ 173.077483] device-mapper: snapshots: Invalidating snapshot: Unable to allocate exception. [ 173.077928] ====================================================== [ 173.077930] [ INFO: possible circular locking dependency detected ] [ 173.077934] 3.11.0-rc6-00153-gf4c4c6a #160 Not tainted [ 173.077936] ------------------------------------------------------- [ 173.077939] kworker/u4:3/50 is trying to acquire lock: [ 173.077941] ((&req.work)){+.+...}, at: [] flush_work+0x5/0x60 [ 173.077954] but task is already holding lock: [ 173.077957] (&s->lock){++++..}, at: [] snapshot_map+0x2aa/0x3a0 [dm_snapshot] [ 173.077968] which lock already depends on the new lock. [ 173.077971] the existing dependency chain (in reverse order) is: [ 173.077974] -> #1 (&s->lock){++++..}: [ 173.077980] [] lock_acquire+0x93/0x200 [ 173.077986] [] _raw_spin_lock+0x40/0x80 [ 173.077991] [] __slab_alloc.constprop.67+0x141/0x50d [ 173.077996] [] kmem_cache_alloc+0x252/0x290 [ 173.078001] [] create_object+0x39/0x310 [ 173.078006] [] kmemleak_alloc+0x4e/0xb0 [ 173.078011] [] kmem_cache_alloc_trace+0x105/0x2a0 [ 173.078015] [] sysfs_open_file+0x1e2/0x240 [ 173.078021] [] do_dentry_open+0x1fb/0x290 [ 173.078025] [] finish_open+0x40/0x50 [ 173.078029] [] do_last+0x4ca/0xe00 [ 173.078033] [] path_openat+0xbe/0x6f0 [ 173.078037] [] do_filp_open+0x3a/0x90 [ 173.078040] [] do_sys_open+0x12e/0x210 [ 173.078045] [] SyS_open+0x1e/0x20 [ 173.078049] [] system_call_fastpath+0x1a/0x1f [ 173.078053] -> #0 ((&req.work)){+.+...}: [ 173.078060] [] __lock_acquire+0x1790/0x1af0 [ 173.078063] [] lock_acquire+0x93/0x200 [ 173.078067] [] flush_work+0x36/0x60 [ 173.078071] [] chunk_io+0x118/0x150 [dm_snapshot] [ 173.078076] [] write_header+0x59/0x60 [dm_snapshot] [ 173.078081] [] persistent_drop_snapshot+0x19/0x30 [dm_snapshot] [ 173.078086] [] __invalidate_snapshot.part.13+0x2c/0x70 [dm_snapshot] [ 173.078091] [] snapshot_map+0x127/0x3a0 [dm_snapshot] [ 173.078096] [] __map_bio+0x3e/0x1f0 [dm_mod] [ 173.078104] [] __clone_and_map_data_bio+0x150/0x230 [dm_mod] [ 173.078109] [] __split_and_process_bio+0x38c/0x5a0 [dm_mod] [ 173.078115] [] dm_request+0x1b9/0x2f0 [dm_mod] [ 173.078121] [] generic_make_request+0xc2/0x110 [ 173.078127] [] submit_bio+0x62/0x130 [ 173.078131] [] __mpage_writepage+0x448/0x680 [ 173.078136] [] write_cache_pages+0x27b/0x630 [ 173.078142] [] mpage_writepages+0x5a/0xa0 [ 173.078146] [] ext2_writepages+0x15/0x20 [ 173.078151] [] do_writepages+0x21/0x50 [ 173.078155] [] __writeback_single_inode+0x40/0x600 [ 173.078160] [] writeback_sb_inodes+0x270/0x570 [ 173.078164] [] __writeback_inodes_wb+0x9f/0xd0 [ 173.078168] [] wb_writeback+0x373/0x5a0 [ 173.078172] [] bdi_writeback_workfn+0x134/0x6d0 [ 173.078176] [] process_one_work+0x1fd/0x6f0 [ 173.078180] [] worker_thread+0x123/0x3b0 [ 173.078184] [] kthread+0xde/0xf0 [ 173.078188] [] ret_from_fork+0x7c/0xb0 [ 173.078192] other info that might help us debug this: [ 173.078196] Possible unsafe locking scenario: [ 173.078198] CPU0 CPU1 [ 173.078200] ---- ---- [ 173.078202] lock(&s->lock); [ 173.078207] lock((&req.work)); [ 173.078211] lock(&s->lock); [ 173.078215] lock((&req.work)); [ 173.078219] *** DEADLOCK *** [ 173.078222] 5 locks held by kworker/u4:3/50: [ 173.078225] #0: (writeback){++++.+}, at: [] process_one_work+0x19b/0x6f0 [ 173.078233] #1: ((&(&wb->dwork)->work)){+.+.+.}, at: [] process_one_work+0x19b/0x6f0 [ 173.078242] #2: (&type->s_umount_key#30){++++..}, at: [] grab_super_passive+0x44/0x90 [ 173.078252] #3: (&md->io_barrier){.+.+..}, at: [] dm_get_live_table+0x5/0xc0 [dm_mod] [ 173.078263] #4: (&s->lock){++++..}, at: [] snapshot_map+0x2aa/0x3a0 [dm_snapshot] [ 173.078273] stack backtrace: [ 173.078277] CPU: 0 PID: 50 Comm: kworker/u4:3 Not tainted 3.11.0-rc6-00153-gf4c4c6a #160 [ 173.078280] Hardware name: LENOVO 6464CTO/6464CTO, BIOS 7LETC9WW (2.29 ) 03/18/2011 [ 173.078284] Workqueue: writeback bdi_writeback_workfn (flush-253:3) [ 173.078289] ffffffff82369540 ffff8801305732f8 ffffffff815869ad ffffffff82369540 [ 173.078295] ffff880130573338 ffffffff815827ab ffff880130573390 ffff880130ab4ef8 [ 173.078301] 0000000000000004 ffff880130ab4ec0 ffff880130ab46e0 ffff880130ab4ef8 [ 173.078307] Call Trace: [ 173.078312] [] dump_stack+0x4e/0x82 [ 173.078317] [] print_circular_bug+0x200/0x20f [ 173.078322] [] __lock_acquire+0x1790/0x1af0 [ 173.078326] [] ? trace_hardirqs_off_caller+0x1f/0xc0 [ 173.078330] [] ? put_lock_stats.isra.27+0x29/0x40 [ 173.078334] [] lock_acquire+0x93/0x200 [ 173.078338] [] ? flush_work+0x5/0x60 [ 173.078342] [] flush_work+0x36/0x60 [ 173.078346] [] ? flush_work+0x5/0x60 [ 173.078351] [] chunk_io+0x118/0x150 [dm_snapshot] [ 173.078357] [] ? persistent_prepare_exception+0xa0/0xa0 [dm_snapshot] [ 173.078363] [] write_header+0x59/0x60 [dm_snapshot] [ 173.078368] [] persistent_drop_snapshot+0x19/0x30 [dm_snapshot] [ 173.078373] [] __invalidate_snapshot.part.13+0x2c/0x70 [dm_snapshot] [ 173.078378] [] snapshot_map+0x127/0x3a0 [dm_snapshot] [ 173.078384] [] __map_bio+0x3e/0x1f0 [dm_mod] [ 173.078391] [] __clone_and_map_data_bio+0x150/0x230 [dm_mod] [ 173.078397] [] __split_and_process_bio+0x38c/0x5a0 [dm_mod] [ 173.078404] [] dm_request+0x1b9/0x2f0 [dm_mod] [ 173.078410] [] ? dm_request+0x35/0x2f0 [dm_mod] [ 173.078414] [] generic_make_request+0xc2/0x110 [ 173.078419] [] submit_bio+0x62/0x130 [ 173.078423] [] __mpage_writepage+0x448/0x680 [ 173.078429] [] ? get_parent_ip+0xd/0x50 [ 173.078433] [] ? mark_held_locks+0xbb/0x140 [ 173.078438] [] ? clear_page_dirty_for_io+0xac/0xe0 [ 173.078442] [] ? trace_hardirqs_on_caller+0x115/0x1e0 [ 173.078446] [] write_cache_pages+0x27b/0x630 [ 173.078450] [] ? check_irq_usage+0x83/0xc0 [ 173.078455] [] ? mpage_writepages+0xa0/0xa0 [ 173.078459] [] ? ext2_get_blocks+0xa90/0xa90 [ 173.078463] [] mpage_writepages+0x5a/0xa0 [ 173.078467] [] ? ext2_get_blocks+0xa90/0xa90 [ 173.078472] [] ext2_writepages+0x15/0x20 [ 173.078476] [] do_writepages+0x21/0x50 [ 173.078481] [] __writeback_single_inode+0x40/0x600 [ 173.078485] [] writeback_sb_inodes+0x270/0x570 [ 173.078489] [] __writeback_inodes_wb+0x9f/0xd0 [ 173.078494] [] wb_writeback+0x373/0x5a0 [ 173.078498] [] bdi_writeback_workfn+0x134/0x6d0 [ 173.078503] [] ? process_one_work+0x19b/0x6f0 [ 173.078507] [] process_one_work+0x1fd/0x6f0 [ 173.078511] [] ? process_one_work+0x19b/0x6f0 [ 173.078515] [] worker_thread+0x123/0x3b0 [ 173.078519] [] ? process_one_work+0x6f0/0x6f0 [ 173.078523] [] kthread+0xde/0xf0 [ 173.078529] [] ? insert_kthread_work+0x70/0x70 [ 173.078532] [] ret_from_fork+0x7c/0xb0 [ 173.078537] [] ? insert_kthread_work+0x70/0x70 [ 173.120355] quiet_error: 1206 callbacks suppressed [ 173.120362] Buffer I/O error on device dm-3, logical block 34