linux-ext4.vger.kernel.org archive mirror
* Hard lockup in ext4_finish_bio
@ 2015-10-08 11:46 Nikolay Borisov
  2015-10-08 14:15 ` Nikolay Borisov
  2015-10-08 14:34 ` John Stoffel
  0 siblings, 2 replies; 6+ messages in thread
From: Nikolay Borisov @ 2015-10-08 11:46 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org
  Cc: linux-ext4, Theodore Ts'o, Marian Marinov, Jan Kara

Hello, 

I've hit a rather strange hard lockup on one of my servers in the 
page writeback path; the actual backtrace is: 

[427149.717151] ------------[ cut here ]------------
[427149.717553] WARNING: CPU: 23 PID: 4611 at kernel/watchdog.c:245 watchdog_overflow_callback+0x98/0xc0()
[427149.718216] Watchdog detected hard LOCKUP on cpu 23
[427149.718292] Modules linked in:
[427149.718723]  tcp_diag inet_diag netconsole act_police cls_basic sch_ingress xt_pkttype xt_state veth openvswitch gre 
vxlan ip_tunnel xt_owner xt_conntrack iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_n
at xt_CT nf_conntrack iptable_raw ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ext2 
dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio dm_mirror dm_region_hash dm_log i2c_i801 lpc_ich mfd_core shpchp i
oapic ioatdma igb i2c_algo_bit ses enclosure ipmi_devintf ipmi_si ipmi_msghandler ib_qib dca ib_mad ib_core
[427149.725321] CPU: 23 PID: 4611 Comm: kworker/u98:7 Not tainted 3.12.47-clouder1 #1
[427149.725690] Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
[427149.726062] Workqueue: writeback bdi_writeback_workfn (flush-252:148)
[427149.726564]  00000000000000f5 ffff883fff366b58 ffffffff81651631 00000000000000f5
[427149.727212]  ffff883fff366ba8 ffff883fff366b98 ffffffff81089a6c 0000000000000000
[427149.727860]  ffff883fd2f08000 0000000000000000 ffff883fff366ce8 0000000000000000
[427149.728490] Call Trace:
[427149.728845]  <NMI>  [<ffffffff81651631>] dump_stack+0x58/0x7f
[427149.729350]  [<ffffffff81089a6c>] warn_slowpath_common+0x8c/0xc0
[427149.729712]  [<ffffffff81089b56>] warn_slowpath_fmt+0x46/0x50
[427149.730076]  [<ffffffff811015f8>] watchdog_overflow_callback+0x98/0xc0
[427149.730443]  [<ffffffff81132d0c>] __perf_event_overflow+0x9c/0x250
[427149.730810]  [<ffffffff81133664>] perf_event_overflow+0x14/0x20
[427149.731175]  [<ffffffff81061796>] intel_pmu_handle_irq+0x1d6/0x3e0
[427149.739656]  [<ffffffff8105b4c4>] perf_event_nmi_handler+0x34/0x60
[427149.740027]  [<ffffffff8104c152>] nmi_handle+0xa2/0x1a0
[427149.740389]  [<ffffffff8104c3b4>] do_nmi+0x164/0x430
[427149.740754]  [<ffffffff81656e2e>] end_repeat_nmi+0x1a/0x1e
[427149.741122]  [<ffffffff8113bfd7>] ? mempool_free_slab+0x17/0x20
[427149.741492]  [<ffffffff8125be15>] ? ext4_finish_bio+0x275/0x2a0
[427149.741854]  [<ffffffff8125be15>] ? ext4_finish_bio+0x275/0x2a0
[427149.742216]  [<ffffffff8125be15>] ? ext4_finish_bio+0x275/0x2a0
[427149.742579]  <<EOE>>  <IRQ>  [<ffffffff8125c2c8>] ext4_end_bio+0xc8/0x120
[427149.743150]  [<ffffffff811dbf1d>] bio_endio+0x1d/0x40
[427149.743516]  [<ffffffff81546781>] dec_pending+0x1c1/0x360
[427149.743878]  [<ffffffff81546996>] clone_endio+0x76/0xa0
[427149.744239]  [<ffffffff811dbf1d>] bio_endio+0x1d/0x40
[427149.744599]  [<ffffffff81546781>] dec_pending+0x1c1/0x360
[427149.744964]  [<ffffffff81546996>] clone_endio+0x76/0xa0
[427149.745326]  [<ffffffff811dbf1d>] bio_endio+0x1d/0x40
[427149.745686]  [<ffffffff81546781>] dec_pending+0x1c1/0x360
[427149.746048]  [<ffffffff81546996>] clone_endio+0x76/0xa0
[427149.746407]  [<ffffffff811dbf1d>] bio_endio+0x1d/0x40
[427149.746773]  [<ffffffff812fad2b>] blk_update_request+0x21b/0x450
[427149.747138]  [<ffffffff812faf87>] blk_update_bidi_request+0x27/0xb0
[427149.747513]  [<ffffffff812fcc7f>] blk_end_bidi_request+0x2f/0x80
[427149.748101]  [<ffffffff812fcd20>] blk_end_request+0x10/0x20
[427149.748705]  [<ffffffff813fdc1c>] scsi_io_completion+0xbc/0x620
[427149.749297]  [<ffffffff813f57f9>] scsi_finish_command+0xc9/0x130
[427149.749891]  [<ffffffff813fe2e7>] scsi_softirq_done+0x147/0x170
[427149.750491]  [<ffffffff813035ad>] blk_done_softirq+0x7d/0x90
[427149.751089]  [<ffffffff8108ed87>] __do_softirq+0x137/0x2e0
[427149.751694]  [<ffffffff81658a0c>] call_softirq+0x1c/0x30
[427149.752284]  [<ffffffff8104a35d>] do_softirq+0x8d/0xc0
[427149.752892]  [<ffffffff8108e925>] irq_exit+0x95/0xa0
[427149.753526]  [<ffffffff8106f755>] smp_call_function_single_interrupt+0x35/0x40
[427149.754149]  [<ffffffff8165826f>] call_function_single_interrupt+0x6f/0x80
[427149.754750]  <EOI>  [<ffffffff813276e6>] ? memcpy+0x6/0x110
[427149.755572]  [<ffffffff811dc6d6>] ? __bio_clone+0x26/0x70
[427149.756179]  [<ffffffff81546db9>] __clone_and_map_data_bio+0x139/0x160
[427149.756814]  [<ffffffff815471cd>] __split_and_process_bio+0x3ed/0x490
[427149.757444]  [<ffffffff815473a6>] dm_request+0x136/0x1e0
[427149.758041]  [<ffffffff812fbe0a>] generic_make_request+0xca/0x100
[427149.758641]  [<ffffffff812fbeb9>] submit_bio+0x79/0x160
[427149.759035]  [<ffffffff81144c3d>] ? account_page_writeback+0x2d/0x40
[427149.759406]  [<ffffffff81144dbd>] ? __test_set_page_writeback+0x16d/0x1f0
[427149.759781]  [<ffffffff8125b7a9>] ext4_io_submit+0x29/0x50
[427149.760151]  [<ffffffff8125b8fb>] ext4_bio_write_page+0x12b/0x2f0
[427149.760519]  [<ffffffff81252fe8>] mpage_submit_page+0x68/0x90
[427149.760887]  [<ffffffff81253100>] mpage_process_page_bufs+0xf0/0x110
[427149.761257]  [<ffffffff81254a80>] mpage_prepare_extent_to_map+0x210/0x310
[427149.761624]  [<ffffffff8125a911>] ? ext4_writepages+0x361/0xc60
[427149.761998]  [<ffffffff81283c09>] ? __ext4_journal_start_sb+0x79/0x110
[427149.762362]  [<ffffffff8125a948>] ext4_writepages+0x398/0xc60
[427149.762726]  [<ffffffff812fd358>] ? blk_finish_plug+0x18/0x50
[427149.763089]  [<ffffffff81146b40>] do_writepages+0x20/0x40
[427149.763454]  [<ffffffff811cec79>] __writeback_single_inode+0x49/0x2b0
[427149.763819]  [<ffffffff810aeeef>] ? wake_up_bit+0x2f/0x40
[427149.764182]  [<ffffffff811cfdee>] writeback_sb_inodes+0x2de/0x540
[427149.764544]  [<ffffffff811a6e65>] ? put_super+0x25/0x50
[427149.764903]  [<ffffffff811d00ee>] __writeback_inodes_wb+0x9e/0xd0
[427149.765265]  [<ffffffff811d035b>] wb_writeback+0x23b/0x340
[427149.765628]  [<ffffffff811d04f9>] wb_do_writeback+0x99/0x230
[427149.765996]  [<ffffffff810a40f1>] ? set_worker_desc+0x81/0x90
[427149.766364]  [<ffffffff810c7a6a>] ? dequeue_task_fair+0x36a/0x4c0
[427149.766734]  [<ffffffff811d0bf8>] bdi_writeback_workfn+0x88/0x260
[427149.767101]  [<ffffffff810bbb3e>] ? finish_task_switch+0x4e/0xe0
[427149.767473]  [<ffffffff81653dac>] ? __schedule+0x2dc/0x760
[427149.767845]  [<ffffffff810a61e5>] process_one_work+0x195/0x550
[427149.768212]  [<ffffffff810a848a>] worker_thread+0x13a/0x430
[427149.768575]  [<ffffffff810a8350>] ? manage_workers+0x2c0/0x2c0
[427149.768940]  [<ffffffff810ae48e>] kthread+0xce/0xe0
[427149.769303]  [<ffffffff810ae3c0>] ? kthread_freezable_should_stop+0x80/0x80
[427149.769668]  [<ffffffff816571c8>] ret_from_fork+0x58/0x90
[427149.770030]  [<ffffffff810ae3c0>] ? kthread_freezable_should_stop+0x80/0x80
[427149.770394] ---[ end trace 76566eb23cf1b028 ]---


What's not evident here is that, inside ext4_end_bio, the CPU has 
actually been executing ext4_finish_bio, as seen from the following 
NMI backtrace: 

[427160.405277] NMI backtrace for cpu 23
[427160.405279] CPU: 23 PID: 4611 Comm: kworker/u98:7 Tainted: G        W    3.12.47-clouder1 #1
[427160.405281] Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
[427160.405285] Workqueue: writeback bdi_writeback_workfn (flush-252:148)
[427160.405286] task: ffff8825aa819830 ti: ffff882b19180000 task.ti: ffff882b19180000
[427160.405290] RIP: 0010:[<ffffffff8125be13>]  [<ffffffff8125be13>] ext4_finish_bio+0x273/0x2a0 <-- Important bit
[427160.405291] RSP: 0000:ffff883fff3639b0  EFLAGS: 00000002
[427160.405292] RAX: ffff882b19180000 RBX: ffff883f67480a80 RCX: 0000000000000110
[427160.405292] RDX: ffff882b19180000 RSI: 0000000000000000 RDI: ffff883f67480a80
[427160.405293] RBP: ffff883fff363a70 R08: 0000000000014b80 R09: ffff881fff454f00
[427160.405294] R10: ffffea00473214c0 R11: ffffffff8113bfd7 R12: ffff880826272138
[427160.405295] R13: 0000000000000000 R14: 0000000000000000 R15: ffffea00aeaea400
[427160.405296] FS:  0000000000000000(0000) GS:ffff883fff360000(0000) knlGS:0000000000000000
[427160.405296] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[427160.405297] CR2: 0000003c5b009c24 CR3: 0000000001c0b000 CR4: 00000000001407e0
[427160.405297] Stack:
[427160.405305]  0000000000000000 ffffffff8203f230 ffff883fff363a00 ffff882b19180000
[427160.405312]  ffff882b19180000 ffff882b19180000 00000400018e0af8 ffff882b19180000
[427160.405319]  ffff883f67480a80 0000000000000000 0000000000000202 00000000d219e720
[427160.405320] Call Trace:
[427160.405324]  <IRQ> 
	//SAME AS THE FIRST STACKTRACE
[427160.405436]  <EOI> 
	//SAME AS THE FIRST STACKTRACE
[427160.405540] Code: e8 a3 38 ec ff ff 75 90 9d e9 2a ff ff ff 0f 0b 0f 1f 84 00 00 00 00 00 eb f6 48 8b 85 60 ff ff ff 48 8b 95 58 ff ff ff ff 48 1c <f3> 90 49 8b 0c 24 80 e1 10 75 f5 ff 42 1c f0 41 0f ba 2c 24 04 

After sampling several different NMI prints it turns out the CPU is 
looping on the following bit lock acquisition in ext4_finish_bio: 

local_irq_save(flags);
bit_spin_lock(BH_Uptodate_Lock, &head->b_state);
do {
	if (bh_offset(bh) < bio_start ||
	    bh_offset(bh) + bh->b_size > bio_end) {
		if (buffer_async_write(bh))
			under_io++;
		continue;
	}
	clear_buffer_async_write(bh);
	if (error)
		buffer_io_error(bh);
} while ((bh = bh->b_this_page) != head);
bit_spin_unlock(BH_Uptodate_Lock, &head->b_state);
local_irq_restore(flags);
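
For context, on SMP bit_spin_lock() boils down to roughly the following
(a paraphrase of include/linux/bit_spinlock.h from the 3.12 tree, not
the verbatim code). The cpu_relax() is the pause (f3 90) instruction the
RIP points at in the Code: dump above:

/* Rough paraphrase of bit_spin_lock(), 3.12 - not verbatim. */
static inline void bit_spin_lock(int bitnum, unsigned long *addr)
{
	preempt_disable();
	while (test_and_set_bit_lock(bitnum, addr)) {
		preempt_enable();
		do {
			cpu_relax();	/* spin until the bit looks clear */
		} while (test_bit(bitnum, addr));
		preempt_disable();
	}
}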

So far everything seems to check out - the code is spinning on 
acquiring BH_Uptodate_Lock with interrupts disabled, and this is what 
causes the hard lockup. Now, the pertinent question is why it cannot 
acquire the lock. Grepping for users of BH_Uptodate_Lock reveals that 
the only other places it is taken are end_buffer_async_read and 
end_buffer_async_write. Looking at the NMI backtraces for the rest of 
the CPUs on the system, I didn't find either function in any execution 
path. Most of the cores were in acpi_idle_enter and one was in the 
memcg reclaim path, but I think those are unrelated. 
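
For reference, the locking side of end_buffer_async_write() looks
roughly like this (an abridged paraphrase of fs/buffer.c from the 3.12
tree, with error handling and BUG_ONs trimmed) - it takes the same bit
lock with interrupts disabled and drops it on every return path:

/* Abridged paraphrase of end_buffer_async_write() (fs/buffer.c, 3.12). */
void end_buffer_async_write(struct buffer_head *bh, int uptodate)
{
	struct buffer_head *first = page_buffers(bh->b_page);
	struct buffer_head *tmp;
	unsigned long flags;

	if (uptodate)
		set_buffer_uptodate(bh);

	local_irq_save(flags);
	bit_spin_lock(BH_Uptodate_Lock, &first->b_state);

	clear_buffer_async_write(bh);
	unlock_buffer(bh);
	/* end page writeback only once no async-write buffer remains */
	for (tmp = bh->b_this_page; tmp != bh; tmp = tmp->b_this_page)
		if (buffer_async_write(tmp))
			goto still_busy;

	bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
	local_irq_restore(flags);
	end_page_writeback(bh->b_page);
	return;

still_busy:
	bit_spin_unlock(BH_Uptodate_Lock, &first->b_state);
	local_irq_restore(flags);
}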

Ignoring the NMI frames, this is by far the deepest call chain I've 
ever seen. I wonder whether some sort of memory corruption has occurred 
which confuses the code into thinking the lock is held. Here is the 
struct buffer_head (the address is R12 from the register dump above):

crash> struct buffer_head ffff880826272138 -x

struct buffer_head {
  b_state = 0x131,  
  b_this_page = 0xffff8808262728f0, 
  b_page = 0xffffea00aeaea400, 
  b_blocknr = 0x36bea69, 
  b_size = 0x400, 
  b_data = 0xffff882baba90000 struct: page excluded: kernel virtual address: ffff882baba90000  type: "gdb_readmem_callback"
struct: page excluded: kernel virtual address: ffff882baba90000  type: "gdb_readmem_callback"
<Address 0xffff882baba90000 out of bounds>, 
  b_bdev = 0xffff883fcfb6db00, 
  b_end_io = 0x0, 
  b_private = 0x0, 
  b_assoc_buffers = {
    next = 0xffff880826272180, 
    prev = 0xffff880826272180
  }, 
  b_assoc_map = 0x0, 
  b_count = {
    counter = 0x0
  }
}

b_state = 0x131 = (0001 0011 0001) = BH_Uptodate, BH_Uptodate_Lock, 
BH_Mapped and BH_Async_Write, which implies that 
end_buffer_async_write should have been executing somewhere, but 
I do not see it. In any case it hasn't completed in a reasonable time, 
since we've hit the deadlock. At this point I'm inclined to believe 
that some sort of hardware issue might have reared its head?
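
For completeness, the decoding above just walks the bit positions of
enum bh_state_bits; here is a minimal user-space sketch doing the same
thing, with the bit values assumed from include/linux/buffer_head.h in
the 3.12 tree:

#include <stdio.h>

/* Bit positions assumed from enum bh_state_bits (3.12): Uptodate=0,
 * Dirty=1, Lock=2, Req=3, Uptodate_Lock=4, Mapped=5, New=6,
 * Async_Read=7, Async_Write=8. */
static const char *bh_bits[] = {
	"BH_Uptodate", "BH_Dirty", "BH_Lock", "BH_Req", "BH_Uptodate_Lock",
	"BH_Mapped", "BH_New", "BH_Async_Read", "BH_Async_Write",
};

int main(void)
{
	unsigned long b_state = 0x131;	/* value from the crash dump */
	size_t i;

	for (i = 0; i < sizeof(bh_bits) / sizeof(bh_bits[0]); i++)
		if (b_state & (1UL << i))
			printf("%s\n", bh_bits[i]);
	/* prints BH_Uptodate, BH_Uptodate_Lock, BH_Mapped, BH_Async_Write */
	return 0;
}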

What do you guys think? 

Regards, 
Nikolay

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Hard lockup in ext4_finish_bio
  2015-10-08 11:46 Hard lockup in ext4_finish_bio Nikolay Borisov
@ 2015-10-08 14:15 ` Nikolay Borisov
  2015-10-08 14:34 ` John Stoffel
  1 sibling, 0 replies; 6+ messages in thread
From: Nikolay Borisov @ 2015-10-08 14:15 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org
  Cc: device-mapper development, linux-ext4, Theodore Ts'o,
	Marian Marinov, Jan Kara, mpatocka

[Adding the dm people since I believe this might be in their field as well.]

After further investigation I discovered that there was indeed a task 
executing block_write_full_page_endio (this function is supposed to 
eventually call end_buffer_async_write, which would then unblock the 
writeback flusher). Here is what its call stack looks like: 

PID: 34220  TASK: ffff883937660810  CPU: 44  COMMAND: "kworker/u98:39"
 #0 [ffff88209d5b10b8] __schedule at ffffffff81653d5a
 #1 [ffff88209d5b1150] schedule at ffffffff816542f9
 #2 [ffff88209d5b1160] schedule_preempt_disabled at ffffffff81654686
 #3 [ffff88209d5b1180] __mutex_lock_slowpath at ffffffff816521eb
 #4 [ffff88209d5b1200] mutex_lock at ffffffff816522d1
 #5 [ffff88209d5b1220] new_read at ffffffffa0152a7e [dm_bufio]
 #6 [ffff88209d5b1280] dm_bufio_get at ffffffffa0152ba6 [dm_bufio]
 #7 [ffff88209d5b1290] dm_bm_read_try_lock at ffffffffa015c878 [dm_persistent_data]
 #8 [ffff88209d5b12e0] dm_tm_read_lock at ffffffffa015f7ad [dm_persistent_data]
 #9 [ffff88209d5b12f0] bn_read_lock at ffffffffa016281b [dm_persistent_data]
#10 [ffff88209d5b1300] ro_step at ffffffffa0162856 [dm_persistent_data]
#11 [ffff88209d5b1320] dm_btree_lookup at ffffffffa01610e4 [dm_persistent_data]
#12 [ffff88209d5b13d0] dm_thin_find_block at ffffffffa017d20c [dm_thin_pool]
#13 [ffff88209d5b1430] thin_bio_map at ffffffffa017866b [dm_thin_pool]
#14 [ffff88209d5b1540] thin_map at ffffffffa0178825 [dm_thin_pool]
#15 [ffff88209d5b1550] __map_bio at ffffffff81546a0e
#16 [ffff88209d5b1590] __clone_and_map_data_bio at ffffffff81546d81
#17 [ffff88209d5b1600] __split_and_process_bio at ffffffff815471cd
#18 [ffff88209d5b16b0] dm_request at ffffffff815473a6
#19 [ffff88209d5b16e0] generic_make_request at ffffffff812fbe0a
#20 [ffff88209d5b1710] submit_bio at ffffffff812fbeb9
#21 [ffff88209d5b1780] _submit_bh at ffffffff811d78a8
#22 [ffff88209d5b17b0] submit_bh at ffffffff811d7990
#23 [ffff88209d5b17c0] __block_write_full_page at ffffffff811da7ef
#24 [ffff88209d5b1840] block_write_full_page_endio at ffffffff811daa0b
#25 [ffff88209d5b1890] block_write_full_page at ffffffff811daa65
#26 [ffff88209d5b18a0] blkdev_writepage at ffffffff811df658
#27 [ffff88209d5b18b0] __writepage at ffffffff81144567
#28 [ffff88209d5b18d0] write_cache_pages at ffffffff811467e3
#29 [ffff88209d5b1a20] generic_writepages at ffffffff81146af1
#30 [ffff88209d5b1a80] do_writepages at ffffffff81146b40
#31 [ffff88209d5b1a90] __writeback_single_inode at ffffffff811cec79
#32 [ffff88209d5b1ae0] writeback_sb_inodes at ffffffff811cfdee
#33 [ffff88209d5b1bc0] __writeback_inodes_wb at ffffffff811d00ee
#34 [ffff88209d5b1c10] wb_writeback at ffffffff811d035b
#35 [ffff88209d5b1cc0] wb_do_writeback at ffffffff811d062c
#36 [ffff88209d5b1d60] bdi_writeback_workfn at ffffffff811d0bf8
#37 [ffff88209d5b1df0] process_one_work at ffffffff810a61e5
#38 [ffff88209d5b1e40] worker_thread at ffffffff810a848a
#39 [ffff88209d5b1ec0] kthread at ffffffff810ae48e
#40 [ffff88209d5b1f50] ret_from_fork at ffffffff816571c8

So it, in turn, is blocked on dm_bufio_lock in new_read. The owner of 
this mutex turns out to be: 

PID: 31111  TASK: ffff881cbb2fb870  CPU: 2   COMMAND: "kworker/u96:0"
 #0 [ffff881fffa46dc0] crash_nmi_callback at ffffffff8106f24e
 #1 [ffff881fffa46de0] nmi_handle at ffffffff8104c152
 #2 [ffff881fffa46e70] do_nmi at ffffffff8104c3b4
 #3 [ffff881fffa46ef0] end_repeat_nmi at ffffffff81656e2e
    [exception RIP: smp_call_function_many+577]
    RIP: ffffffff810e7f81  RSP: ffff880d35b815c8  RFLAGS: 00000202
    RAX: 0000000000000017  RBX: ffffffff81142690  RCX: 0000000000000017
    RDX: ffff883fff375478  RSI: 0000000000000040  RDI: 0000000000000040
    RBP: ffff880d35b81628   R8: ffff881fffa51ec8   R9: 0000000000000000
    R10: 0000000000000000  R11: ffffffff812943f3  R12: 0000000000000000
    R13: ffff881fffa51ec0  R14: ffff881fffa51ec8  R15: 0000000000011f00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff880d35b815c8] smp_call_function_many at ffffffff810e7f81
 #5 [ffff880d35b81630] on_each_cpu_mask at ffffffff810e801c
 #6 [ffff880d35b81660] drain_all_pages at ffffffff81140178
 #7 [ffff880d35b81690] __alloc_pages_nodemask at ffffffff8114310b
 #8 [ffff880d35b81810] alloc_pages_current at ffffffff81181c5e
 #9 [ffff880d35b81860] new_slab at ffffffff81188305
#10 [ffff880d35b818b0] __slab_alloc at ffffffff81189a30
#11 [ffff880d35b81990] __kmalloc at ffffffff8118a70b
#12 [ffff880d35b819e0] alloc_buffer at ffffffffa0151b66 [dm_bufio]
#13 [ffff880d35b81a20] __bufio_new at ffffffffa0152831 [dm_bufio]
#14 [ffff880d35b81a90] new_read at ffffffffa0152a93 [dm_bufio]
#15 [ffff880d35b81af0] dm_bufio_read at ffffffffa0152b89 [dm_bufio]
#16 [ffff880d35b81b00] dm_bm_read_lock at ffffffffa015cb3d [dm_persistent_data]
#17 [ffff880d35b81b80] dm_tm_read_lock at ffffffffa015f79a [dm_persistent_data]
#18 [ffff880d35b81b90] bn_read_lock at ffffffffa016281b [dm_persistent_data]
#19 [ffff880d35b81ba0] ro_step at ffffffffa0162856 [dm_persistent_data]
#20 [ffff880d35b81bc0] dm_btree_lookup at ffffffffa01610e4 [dm_persistent_data]
#21 [ffff880d35b81c70] dm_thin_find_block at ffffffffa017d20c [dm_thin_pool]
#22 [ffff880d35b81cd0] process_bio at ffffffffa017902d [dm_thin_pool]
#23 [ffff880d35b81da0] do_worker at ffffffffa017a281 [dm_thin_pool]
#24 [ffff880d35b81df0] process_one_work at ffffffff810a61e5
#25 [ffff880d35b81e40] worker_thread at ffffffff810a848a
#26 [ffff880d35b81ec0] kthread at ffffffff810ae48e
#27 [ffff880d35b81f50] ret_from_fork at ffffffff816571c8

So the whole picture to me looks like the following: 

For simplicity I will use the following nomenclature: 
t1 - kworker/u96:0
t2 - kworker/u98:39
t3 - kworker/u98:7

t1 issues drain_all_pages, which sends IPIs to the other CPUs. Meanwhile 
t2 has already started an async write of pages as part of its normal 
operation, but is blocked on dm_bufio_lock, which t1 holds and will only 
release once its IPI broadcast (from drain_all_pages) has completed. At 
the same time, t3 is executing ext4_finish_bio with interrupts disabled, 
yet depends on t2 completing. Since t3 has disabled interrupts, it won't 
respond to t1's IPI, and at this point a hard lockup occurs: t1 waits on 
t3, t3 waits on t2, and t2 waits on t1. This happens because 
drain_all_pages calls on_each_cpu_mask with the last argument set to 
"true", meaning "wait until the IPI handler has finished on every CPU", 
which of course will never happen in the described situation. 
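
The "wait" part is the crux: on_each_cpu_mask(..., wait=true) ends up in
smp_call_function_many(), which spins in csd_lock_wait() until every
targeted CPU has run the callback from its IPI handler - roughly (a
paraphrase of kernel/smp.c from the 3.12 tree):

/* Rough paraphrase of csd_lock_wait() (kernel/smp.c, 3.12): the caller
 * spins until the target CPU's IPI handler runs the callback and clears
 * CSD_FLAG_LOCK.  A CPU looping with interrupts disabled (t3 above)
 * never takes the IPI, so this never terminates. */
static void csd_lock_wait(struct call_single_data *csd)
{
	while (csd->flags & CSD_FLAG_LOCK)
		cpu_relax();
}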


Regards, 
Nikolay

On 10/08/2015 02:46 PM, Nikolay Borisov wrote:
> [ original report quoted in full - snipped ]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Hard lockup in ext4_finish_bio
  2015-10-08 11:46 Hard lockup in ext4_finish_bio Nikolay Borisov
  2015-10-08 14:15 ` Nikolay Borisov
@ 2015-10-08 14:34 ` John Stoffel
  2015-10-08 14:37   ` Nikolay Borisov
  1 sibling, 1 reply; 6+ messages in thread
From: John Stoffel @ 2015-10-08 14:34 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: linux-kernel@vger.kernel.org, linux-ext4, Theodore Ts'o,
	Marian Marinov, Jan Kara


Great bug report, but you're missing the info on which kernel you're
running here...  is this a vendor kernel or self-compiled?  


Nikolay> [ original report quoted in full - snipped ]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Hard lockup in ext4_finish_bio
  2015-10-08 14:34 ` John Stoffel
@ 2015-10-08 14:37   ` Nikolay Borisov
  2015-10-08 18:56     ` John Stoffel
  0 siblings, 1 reply; 6+ messages in thread
From: Nikolay Borisov @ 2015-10-08 14:37 UTC (permalink / raw)
  To: John Stoffel
  Cc: Linux-Kernel@Vger. Kernel. Org, linux-ext4, Theodore Ts'o,
	Marian Marinov, Jan Kara



On 10/08/2015 05:34 PM, John Stoffel wrote:
> Great bug report, but you're missing the info on which kernel you're

This is on 3.12.47 (self-compiled). It was visible in my initial post
(the backtrace shows 3.12.47-clouder1), but I did forget to mention it
explicitly in the reply. Also, I suspect even current kernels are
susceptible to this, since the locking in question hasn't changed.

Regards,
Nikolay

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Hard lockup in ext4_finish_bio
  2015-10-08 14:37   ` Nikolay Borisov
@ 2015-10-08 18:56     ` John Stoffel
  2015-10-09  8:05       ` Nikolay Borisov
  0 siblings, 1 reply; 6+ messages in thread
From: John Stoffel @ 2015-10-08 18:56 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: John Stoffel, linux-kernel@vger.kernel.org, linux-ext4,
	Theodore Ts'o, Marian Marinov, Jan Kara

>>>>> "Nikolay" == Nikolay Borisov <kernel@kyup.com> writes:

Nikolay> On 10/08/2015 05:34 PM, John Stoffel wrote:
>> Great bug report, but you're missing the info on which kernel
>> you're

Nikolay> This is on 3.12.47 (self compiled). It was evident on my
Nikolay> initial post, but I did forget to mention that in the
Nikolay> reply. Also, I suspect even current kernel are susceptible to
Nikolay> this since the locking in question hasn't changed.

Hi Nikolay, I must have missed it.  I looked quickly, but didn't find
it.  Since it's such an old kernel release, it might be best if you
upgrade to the latest version and try to re-create the lockup if at all
possible.

What kind of workload are you running on there?  And if you have more
details on the hardware, that might help as well.  

John

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Hard lockup in ext4_finish_bio
  2015-10-08 18:56     ` John Stoffel
@ 2015-10-09  8:05       ` Nikolay Borisov
  0 siblings, 0 replies; 6+ messages in thread
From: Nikolay Borisov @ 2015-10-09  8:05 UTC (permalink / raw)
  To: John Stoffel
  Cc: linux-kernel@vger.kernel.org, linux-ext4, Theodore Ts'o,
	Marian Marinov, Jan Kara



On 10/08/2015 09:56 PM, John Stoffel wrote:
>>>>>> "Nikolay" == Nikolay Borisov <kernel@kyup.com> writes:
> 
> Nikolay> On 10/08/2015 05:34 PM, John Stoffel wrote:
>>> Great bug report, but you're missing the info on which kernel
>>> you're
> 
> Nikolay> This is on 3.12.47 (self compiled). It was evident on my
> Nikolay> initial post, but I did forget to mention that in the
> Nikolay> reply. Also, I suspect even current kernel are susceptible to
> Nikolay> this since the locking in question hasn't changed.
> 
> Hi Nikolay, must have missed it.  I looked quickly, but didn't find
> it.  Since it's such an older kernel release, it might be best if you
> upgrade to the latest version and try to re-create the lock if at all
> possible.  

The relevant code hasn't changed. I've already posted RFC patches. You
can find the thread here: http://thread.gmane.org/gmane.linux.kernel/2056996

> 
> What kind of workload are you running on there?  And if you have more
> details on the hardware, that might help as well.  
> 
> John
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-10-09  8:05 UTC | newest]

Thread overview: 6+ messages
2015-10-08 11:46 Hard lockup in ext4_finish_bio Nikolay Borisov
2015-10-08 14:15 ` Nikolay Borisov
2015-10-08 14:34 ` John Stoffel
2015-10-08 14:37   ` Nikolay Borisov
2015-10-08 18:56     ` John Stoffel
2015-10-09  8:05       ` Nikolay Borisov
