* Kernel crash during "btrfs device delete" on raid6 volume
@ 2014-11-04 14:36 Erik Berg
2014-11-04 14:55 ` Chris Mason
0 siblings, 1 reply; 4+ messages in thread
From: Erik Berg @ 2014-11-04 14:36 UTC (permalink / raw)
To: linux-btrfs
Pulled the latest btrfs-progs from kdave (v3.17-12-gcafacda) and using
the latest linux release candidate (3.18.0-031800rc3-generic) from
canonical/ubuntu
btrfs fi show
Label: none uuid: 5c5fea06-0319-4e03-a42e-004e64aeed92
Total devices 9 FS bytes used 10.91TiB
devid 2 size 931.48GiB used 928.02GiB path /dev/sdc1
devid 3 size 931.48GiB used 928.02GiB path /dev/sdd1
devid 4 size 1.82TiB used 1.67TiB path /dev/sde1
devid 5 size 2.73TiB used 2.28TiB path /dev/sdf1
devid 6 size 3.64TiB used 2.73TiB path /dev/sdg1
devid 7 size 3.64TiB used 2.73TiB path /dev/sdh1
devid 8 size 931.46GiB used 655.90GiB path /dev/sdb1
devid 9 size 3.64TiB used 2.73TiB path /dev/sdi1
devid 10 size 3.64TiB used 1.79TiB path /dev/sdj1
btrfs fi df
Data, RAID6: total=10.91TiB, used=10.90TiB
System, RAID6: total=96.00MiB, used=800.00KiB
Metadata, RAID6: total=13.23GiB, used=11.79GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
Trying to remove device sdb1, the kernel crashes after a minute or so.
[ 597.576827] ------------[ cut here ]------------
[ 597.617519] kernel BUG at /home/apw/COD/linux/mm/slub.c:3334!
[ 597.668145] invalid opcode: 0000 [#1] SMP
[ 597.704410] Modules linked in: arc4 md4 ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4
xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter
ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables
gpio_ich intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp
kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd
serio_raw hpilo hpwdt 8250_fintek acpi_power_meter ie31200_edac lpc_ich
edac_core ipmi_si ipmi_msghandler mac_hid lp parport nls_utf8 cifs
fscache hid_generic usbhid hid btrfs xor raid6_pq uas usb_storage tg3
ptp ahci psmouse libahci pps_core hpsa
[ 598.268179] CPU: 1 PID: 129 Comm: kworker/u128:3 Not tainted
3.18.0-031800rc3-generic #201411022335
[ 598.349925] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06
11/09/2013
[ 598.413231] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-2)
[ 598.471103] task: ffff8803f16a3c00 ti: ffff880036b70000 task.ti:
ffff880036b70000
[ 598.538393] RIP: 0010:[<ffffffff811c74fd>] [<ffffffff811c74fd>]
kfree+0x16d/0x170
[ 598.606217] RSP: 0018:ffff880036b73528 EFLAGS: 00010246
[ 598.653844] RAX: 01ffff0000000000 RBX: ffff880036b735c8 RCX:
0000000000000000
[ 598.717899] RDX: ffff8803743a6010 RSI: dead000000100100 RDI:
ffff880036b735c8
[ 598.781662] RBP: ffff880036b73558 R08: 0000000000000000 R09:
ffffea0000dadcc0
[ 598.846028] R10: 0000000000000001 R11: 0000000000000010 R12:
ffff8803f1e09800
[ 598.910713] R13: ffff8803ac757d40 R14: ffffffffc04fed0c R15:
ffff880036b735d8
[ 598.975333] FS: 0000000000000000(0000) GS:ffff88040b420000(0000)
knlGS:0000000000000000
[ 599.048512] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 599.100167] CR2: 00007fa9a3854024 CR3: 0000000001c16000 CR4:
00000000001407e0
[ 599.165150] Stack:
[ 599.183305] ffff8803f1e09800 00000dad07c20000 ffff8803f1e09800
ffff8803ac757d40
[ 599.249603] ffff8803ac757d40 ffff880036b735d8 ffff880036b73618
ffffffffc04fed0c
[ 599.316306] ffff8803f1b86b00 ffff880374338000 00000dad07dc0000
ffff880036b73638
[ 599.383404] Call Trace:
[ 599.405429] [<ffffffffc04fed0c>]
btrfs_lookup_csums_range+0x2ac/0x4a0 [btrfs]
[ 599.470388] [<ffffffffc05251a3>] ?
__btrfs_add_ordered_extent+0x43/0x3c0 [btrfs]
[ 599.537826] [<ffffffffc0560047>] btrfs_reloc_clone_csums+0x77/0xe0
[btrfs]
[ 599.600291] [<ffffffffc051435f>] run_delalloc_nocow+0x62f/0xae0 [btrfs]
[ 599.660798] [<ffffffffc051499e>] run_delalloc_range+0x18e/0x1b0 [btrfs]
[ 599.720774] [<ffffffffc052a4b4>]
writepage_delalloc.isra.32+0xf4/0x170 [btrfs]
[ 599.786169] [<ffffffffc052cb1f>] __extent_writepage+0xcf/0x280 [btrfs]
[ 599.845626] [<ffffffff811a7e20>] ? SyS_msync+0x230/0x230
[ 599.894443] [<ffffffffc052cf8a>]
extent_write_cache_pages.isra.25.constprop.38+0x2ba/0x420 [btrfs]
[ 599.975893] [<ffffffffc052d5fe>] extent_writepages+0x4e/0x70 [btrfs]
[ 600.033786] [<ffffffffc0511b20>] ? btrfs_submit_direct+0x1b0/0x1b0
[btrfs]
[ 600.096916] [<ffffffffc050df08>] btrfs_writepages+0x28/0x30 [btrfs]
[ 600.153782] [<ffffffff8117ae80>] do_writepages+0x20/0x40
[ 600.202558] [<ffffffff8120e635>] __writeback_single_inode+0x45/0x1c0
[ 600.260557] [<ffffffff8121000e>] writeback_sb_inodes+0x22e/0x340
[ 600.314444] [<ffffffff812101be>] __writeback_inodes_wb+0x9e/0xd0
[ 600.369057] [<ffffffff8121047b>] wb_writeback+0x28b/0x330
[ 600.418704] [<ffffffff81201282>] ? get_nr_dirty_inodes+0x52/0x80
[ 600.473413] [<ffffffff812105bf>] wb_check_old_data_flush+0x9f/0xb0
[ 600.530122] [<ffffffff81210704>] wb_do_writeback+0x134/0x1c0
[ 600.580781] [<ffffffff8108a8af>] ? set_worker_desc+0x6f/0x80
[ 600.632015] [<ffffffff81212698>] bdi_writeback_workfn+0x78/0x1f0
[ 600.686073] [<ffffffff810874fe>] process_one_work+0x14e/0x460
[ 600.738144] [<ffffffff81087e7b>] worker_thread+0x11b/0x3f0
[ 600.787525] [<ffffffff81087d60>] ? create_worker+0x1e0/0x1e0
[ 600.838305] [<ffffffff8108d9f9>] kthread+0xc9/0xe0
[ 600.882010] [<ffffffff8108d930>] ? flush_kthread_worker+0x90/0x90
[ 600.936705] [<ffffffff817a8f3c>] ret_from_fork+0x7c/0xb0
[ 600.984246] [<ffffffff8108d930>] ? flush_kthread_worker+0x90/0x90
[ 601.038920] Code: 31 f6 f6 c4 40 74 04 41 8b 71 68 4c 89 cf e8 2b ee
fa ff e9 6a ff ff ff 49 8b 41 30 49 8b 11 80 e6 80 4c 0f 45 c8 e9 09 ff
ff ff <0f> 0b 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 f6 41 55
[ 601.211081] RIP [<ffffffff811c74fd>] kfree+0x16d/0x170
[ 601.258198] RSP <ffff880036b73528>
[ 601.292012] ---[ end trace 4760080785caca88 ]---
[ 601.340046] BUG: unable to handle kernel paging request at
ffffffffffffffd8
[ 601.402579] IP: [<ffffffff8108dd90>] kthread_data+0x10/0x20
[ 601.452802] PGD 1c19067 PUD 1c1b067 PMD 0
[ 601.489933] Oops: 0000 [#2] SMP
[ 601.518941] Modules linked in: arc4 md4 ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4
xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter
ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables
gpio_ich intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp
kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd
serio_raw hpilo hpwdt 8250_fintek acpi_power_meter ie31200_edac lpc_ich
edac_core ipmi_si ipmi_msghandler mac_hid lp parport nls_utf8 cifs
fscache hid_generic usbhid hid btrfs xor raid6_pq uas usb_storage tg3
ptp ahci psmouse libahci pps_core hpsa
[ 602.085373] CPU: 0 PID: 129 Comm: kworker/u128:3 Tainted: G D
3.18.0-031800rc3-generic #201411022335
[ 602.176706] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06
11/09/2013
[ 602.240091] task: ffff8803f16a3c00 ti: ffff880036b70000 task.ti:
ffff880036b70000
[ 602.307799] RIP: 0010:[<ffffffff8108dd90>] [<ffffffff8108dd90>]
kthread_data+0x10/0x20
[ 602.379755] RSP: 0018:ffff880036b731c8 EFLAGS: 00010096
[ 602.428065] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
ffffffff81ec3d40
[ 602.492644] RDX: 0000000000000003 RSI: 0000000000000000 RDI:
ffff8803f16a3c00
[ 602.557055] RBP: ffff880036b731c8 R08: 0000000000000000 R09:
0000000000000000
[ 602.621551] R10: 0000000000000000 R11: 0000000000000013 R12:
0000000000000000
[ 602.685388] R13: ffff8803f16a4138 R14: 0000000000000001 R15:
0000000000000006
[ 602.749810] FS: 0000000000000000(0000) GS:ffff88040b400000(0000)
knlGS:0000000000000000
[ 602.822794] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 602.874743] CR2: 0000000000000028 CR3: 00000000365e9000 CR4:
00000000001407f0
[ 602.938478] Stack:
[ 602.956406] ffff880036b731e8 ffffffff81088dc5 ffff880036b731e8
ffff88040b414640
[ 603.023393] ffff880036b73268 ffffffff817a4b73 ffff880036b73228
ffff8803f17ce3a8
[ 603.089287] ffff880036b73fd8 0000000000014640 ffff880036b73248
0000000000014640
[ 603.156124] Call Trace:
[ 603.178182] [<ffffffff81088dc5>] wq_worker_sleeping+0x15/0xb0
[ 603.231061] [<ffffffff817a4b73>] __schedule+0x5f3/0x780
[ 603.278841] [<ffffffff817a4dd9>] schedule+0x29/0x70
[ 603.324031] [<ffffffff81071945>] do_exit+0x2a5/0x470
[ 603.369604] [<ffffffff810c4ebc>] ? kmsg_dump+0x9c/0xc0
[ 603.415525] [<ffffffff81017dc8>] oops_end+0xb8/0x160
[ 603.461093] [<ffffffff810180c8>] die+0x58/0x90
[ 603.502306] [<ffffffff8101445d>] do_trap+0xcd/0x160
[ 603.546936] [<ffffffff81014936>] do_error_trap+0xe6/0x170
[ 603.596527] [<ffffffff811c74fd>] ? kfree+0x16d/0x170
[ 603.643039] [<ffffffffc05380d0>] ? btrfs_num_copies+0xb0/0x160 [btrfs]
[ 603.702386] [<ffffffffc052dc06>] ? release_extent_buffer+0x36/0xe0
[btrfs]
[ 603.764396] [<ffffffffc052dce2>] ?
free_extent_buffer.part.37+0x32/0x90 [btrfs]
[ 603.830710] [<ffffffffc052e165>] ? free_extent_buffer+0x35/0x40 [btrfs]
[ 603.889622] [<ffffffffc04fed0c>] ?
btrfs_lookup_csums_range+0x2ac/0x4a0 [btrfs]
[ 603.956709] [<ffffffff81015210>] do_invalid_op+0x20/0x30
[ 604.005227] [<ffffffff817aaa5e>] invalid_op+0x1e/0x30
[ 604.052681] [<ffffffffc04fed0c>] ?
btrfs_lookup_csums_range+0x2ac/0x4a0 [btrfs]
[ 604.118969] [<ffffffff811c74fd>] ? kfree+0x16d/0x170
[ 604.164042] [<ffffffffc04fed0c>]
btrfs_lookup_csums_range+0x2ac/0x4a0 [btrfs]
[ 604.229643] [<ffffffffc05251a3>] ?
__btrfs_add_ordered_extent+0x43/0x3c0 [btrfs]
[ 604.297028] [<ffffffffc0560047>] btrfs_reloc_clone_csums+0x77/0xe0
[btrfs]
[ 604.360017] [<ffffffffc051435f>] run_delalloc_nocow+0x62f/0xae0 [btrfs]
[ 604.420316] [<ffffffffc051499e>] run_delalloc_range+0x18e/0x1b0 [btrfs]
[ 604.481403] [<ffffffffc052a4b4>]
writepage_delalloc.isra.32+0xf4/0x170 [btrfs]
[ 604.547266] [<ffffffffc052cb1f>] __extent_writepage+0xcf/0x280 [btrfs]
[ 604.606342] [<ffffffff811a7e20>] ? SyS_msync+0x230/0x230
[ 604.655704] [<ffffffffc052cf8a>]
extent_write_cache_pages.isra.25.constprop.38+0x2ba/0x420 [btrfs]
[ 604.736965] [<ffffffffc052d5fe>] extent_writepages+0x4e/0x70 [btrfs]
[ 604.794404] [<ffffffffc0511b20>] ? btrfs_submit_direct+0x1b0/0x1b0
[btrfs]
[ 604.856681] [<ffffffffc050df08>] btrfs_writepages+0x28/0x30 [btrfs]
[ 604.913692] [<ffffffff8117ae80>] do_writepages+0x20/0x40
[ 604.962262] [<ffffffff8120e635>] __writeback_single_inode+0x45/0x1c0
[ 605.019870] [<ffffffff8121000e>] writeback_sb_inodes+0x22e/0x340
[ 605.074173] [<ffffffff812101be>] __writeback_inodes_wb+0x9e/0xd0
[ 605.128939] [<ffffffff8121047b>] wb_writeback+0x28b/0x330
[ 605.178060] [<ffffffff81201282>] ? get_nr_dirty_inodes+0x52/0x80
[ 605.232308] [<ffffffff812105bf>] wb_check_old_data_flush+0x9f/0xb0
[ 605.288400] [<ffffffff81210704>] wb_do_writeback+0x134/0x1c0
[ 605.339998] [<ffffffff8108a8af>] ? set_worker_desc+0x6f/0x80
[ 605.391472] [<ffffffff81212698>] bdi_writeback_workfn+0x78/0x1f0
[ 605.446241] [<ffffffff810874fe>] process_one_work+0x14e/0x460
[ 605.498863] [<ffffffff81087e7b>] worker_thread+0x11b/0x3f0
[ 605.549473] [<ffffffff81087d60>] ? create_worker+0x1e0/0x1e0
[ 605.601439] [<ffffffff8108d9f9>] kthread+0xc9/0xe0
[ 605.646272] [<ffffffff8108d930>] ? flush_kthread_worker+0x90/0x90
[ 605.701779] [<ffffffff817a8f3c>] ret_from_fork+0x7c/0xb0
[ 605.750846] [<ffffffff8108d930>] ? flush_kthread_worker+0x90/0x90
[ 605.804813] Code: 00 48 89 e5 5d 48 8b 40 c8 48 c1 e8 02 83 e0 01 c3
66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 87 c8 04 00 00 55 48
89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[ 605.979337] RIP [<ffffffff8108dd90>] kthread_data+0x10/0x20
[ 606.030495] RSP <ffff880036b731c8>
[ 606.061854] CR2: ffffffffffffffd8
[ 606.091158] ---[ end trace 4760080785caca89 ]---
[ 606.138417] Fixing recursive fault but reboot is needed!
[ 628.041189] ------------[ cut here ]------------
[ 628.082909] WARNING: CPU: 0 PID: 129 at
/home/apw/COD/linux/kernel/watchdog.c:290
watchdog_overflow_callback+0x98/0xc0()
[ 628.182195] Watchdog detected hard LOCKUP on cpu 0
[ 628.224240] Modules linked in: arc4 md4 ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4
xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter
ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables
gpio_ich intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp
kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd
serio_raw hpilo hpwdt 8250_fintek acpi_power_meter ie31200_edac lpc_ich
edac_core ipmi_si ipmi_msghandler mac_hid lp parport nls_utf8 cifs
fscache hid_generic usbhid hid btrfs xor raid6_pq uas usb_storage tg3
ptp ahci psmouse libahci pps_core hpsa
[ 628.800634] CPU: 0 PID: 129 Comm: kworker/u128:3 Tainted: G D
3.18.0-031800rc3-generic #201411022335
[ 628.892387] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06
11/09/2013
[ 628.956324] 0000000000000122 ffff88040b407ba8 ffffffff8179b798
0000000000000007
[ 629.023414] ffff88040b407bf8 ffff88040b407be8 ffffffff8106eafc
0000000000000000
[ 629.090426] ffff8803f516c000 0000000000000000 ffff88040b407d18
0000000000000000
[ 629.158724] Call Trace:
[ 629.188779] <NMI> [<ffffffff8179b798>] dump_stack+0x46/0x58
[ 629.241611] [<ffffffff8106eafc>] warn_slowpath_common+0x8c/0xc0
[ 629.295718] [<ffffffff8106ebe6>] warn_slowpath_fmt+0x46/0x50
[ 629.348518] [<ffffffff81124188>] watchdog_overflow_callback+0x98/0xc0
[ 629.408366] [<ffffffff81165008>] __perf_event_overflow+0x98/0x230
[ 629.464570] [<ffffffff8102c11a>] ? x86_perf_event_set_period+0xda/0x150
[ 629.525925] [<ffffffff811658c4>] perf_event_overflow+0x14/0x20
[ 629.580555] [<ffffffff81033a99>] intel_pmu_handle_irq+0x1d9/0x2c0
[ 629.637039] [<ffffffff8102b464>] perf_event_nmi_handler+0x34/0x60
[ 629.693305] [<ffffffff8101820a>] nmi_handle+0x8a/0x140
[ 629.742009] [<ffffffff81046a20>] ?
default_send_IPI_mask_allbutself_phys+0x100/0x100
[ 629.815252] [<ffffffff8101896e>] default_do_nmi+0xfe/0x160
[ 629.866694] [<ffffffff81018a60>] do_nmi+0x90/0xd0
[ 629.909845] [<ffffffff817ab3b1>] end_repeat_nmi+0x1e/0x2e
[ 629.958832] [<ffffffff817a8ada>] ? _raw_spin_lock_irq+0x3a/0x60
[ 630.013637] [<ffffffff817a8ada>] ? _raw_spin_lock_irq+0x3a/0x60
[ 630.066976] [<ffffffff817a8ada>] ? _raw_spin_lock_irq+0x3a/0x60
[ 630.121565] <<EOE>> [<ffffffff817a4625>] __schedule+0xa5/0x780
[ 630.176775] [<ffffffff817a4dd9>] schedule+0x29/0x70
[ 630.221368] [<ffffffff81071aab>] do_exit+0x40b/0x470
[ 630.267338] [<ffffffff81017dc8>] oops_end+0xb8/0x160
[ 630.313330] [<ffffffff8178d4de>] no_context+0x1b5/0x1c4
[ 630.360564] [<ffffffff8178d6c0>] __bad_area_nosemaphore+0x1d3/0x1f2
[ 630.418565] [<ffffffff8178d6f2>] bad_area_nosemaphore+0x13/0x15
[ 630.473877] [<ffffffff8105c372>] __do_page_fault+0x3b2/0x550
[ 630.526491] [<ffffffff810d0d8d>] ? call_rcu_sched+0x1d/0x20
[ 630.578309] [<ffffffff81248b8c>] ? proc_destroy_inode+0x1c/0x20
[ 630.633517] [<ffffffff810a4e61>] ? update_curr+0x141/0x1f0
[ 630.684709] [<ffffffff8105c69e>] do_page_fault+0x3e/0x80
[ 630.734007] [<ffffffff817ab048>] page_fault+0x28/0x30
[ 630.781044] [<ffffffff8108dd90>] ? kthread_data+0x10/0x20
[ 630.831439] [<ffffffff81088dc5>] wq_worker_sleeping+0x15/0xb0
[ 630.884290] [<ffffffff817a4b73>] __schedule+0x5f3/0x780
[ 630.931040] [<ffffffff817a4dd9>] schedule+0x29/0x70
[ 630.975132] [<ffffffff81071945>] do_exit+0x2a5/0x470
[ 631.020738] [<ffffffff810c4ebc>] ? kmsg_dump+0x9c/0xc0
[ 631.069118] [<ffffffff81017dc8>] oops_end+0xb8/0x160
[ 631.117791] [<ffffffff810180c8>] die+0x58/0x90
[ 631.157776] [<ffffffff8101445d>] do_trap+0xcd/0x160
[ 631.201786] [<ffffffff81014936>] do_error_trap+0xe6/0x170
[ 631.250238] [<ffffffff811c74fd>] ? kfree+0x16d/0x170
[ 631.294923] [<ffffffffc05380d0>] ? btrfs_num_copies+0xb0/0x160 [btrfs]
[ 631.353303] [<ffffffffc052dc06>] ? release_extent_buffer+0x36/0xe0
[btrfs]
[ 631.414831] [<ffffffffc052dce2>] ?
free_extent_buffer.part.37+0x32/0x90 [btrfs]
[ 631.479502] [<ffffffffc052e165>] ? free_extent_buffer+0x35/0x40 [btrfs]
[ 631.538774] [<ffffffffc04fed0c>] ?
btrfs_lookup_csums_range+0x2ac/0x4a0 [btrfs]
[ 631.605298] [<ffffffff81015210>] do_invalid_op+0x20/0x30
[ 631.654156] [<ffffffff817aaa5e>] invalid_op+0x1e/0x30
[ 631.701640] [<ffffffffc04fed0c>] ?
btrfs_lookup_csums_range+0x2ac/0x4a0 [btrfs]
[ 631.769148] [<ffffffff811c74fd>] ? kfree+0x16d/0x170
[ 631.814158] [<ffffffffc04fed0c>]
btrfs_lookup_csums_range+0x2ac/0x4a0 [btrfs]
[ 631.878501] [<ffffffffc05251a3>] ?
__btrfs_add_ordered_extent+0x43/0x3c0 [btrfs]
[ 631.944855] [<ffffffffc0560047>] btrfs_reloc_clone_csums+0x77/0xe0
[btrfs]
[ 632.006112] [<ffffffffc051435f>] run_delalloc_nocow+0x62f/0xae0 [btrfs]
[ 632.066120] [<ffffffffc051499e>] run_delalloc_range+0x18e/0x1b0 [btrfs]
[ 632.125507] [<ffffffffc052a4b4>]
writepage_delalloc.isra.32+0xf4/0x170 [btrfs]
[ 632.191089] [<ffffffffc052cb1f>] __extent_writepage+0xcf/0x280 [btrfs]
[ 632.249324] [<ffffffff811a7e20>] ? SyS_msync+0x230/0x230
[ 632.298562] [<ffffffffc052cf8a>]
extent_write_cache_pages.isra.25.constprop.38+0x2ba/0x420 [btrfs]
[ 632.382554] [<ffffffffc052d5fe>] extent_writepages+0x4e/0x70 [btrfs]
[ 632.440722] [<ffffffffc0511b20>] ? btrfs_submit_direct+0x1b0/0x1b0
[btrfs]
[ 632.503587] [<ffffffffc050df08>] btrfs_writepages+0x28/0x30 [btrfs]
[ 632.561458] [<ffffffff8117ae80>] do_writepages+0x20/0x40
[ 632.610972] [<ffffffff8120e635>] __writeback_single_inode+0x45/0x1c0
[ 632.670403] [<ffffffff8121000e>] writeback_sb_inodes+0x22e/0x340
[ 632.727108] [<ffffffff812101be>] __writeback_inodes_wb+0x9e/0xd0
[ 632.782298] [<ffffffff8121047b>] wb_writeback+0x28b/0x330
[ 632.832142] [<ffffffff81201282>] ? get_nr_dirty_inodes+0x52/0x80
[ 632.887494] [<ffffffff812105bf>] wb_check_old_data_flush+0x9f/0xb0
[ 632.944732] [<ffffffff81210704>] wb_do_writeback+0x134/0x1c0
[ 632.997801] [<ffffffff8108a8af>] ? set_worker_desc+0x6f/0x80
[ 633.051181] [<ffffffff81212698>] bdi_writeback_workfn+0x78/0x1f0
[ 633.107282] [<ffffffff810874fe>] process_one_work+0x14e/0x460
[ 633.162037] [<ffffffff81087e7b>] worker_thread+0x11b/0x3f0
[ 633.213238] [<ffffffff81087d60>] ? create_worker+0x1e0/0x1e0
[ 633.267016] [<ffffffff8108d9f9>] kthread+0xc9/0xe0
[ 633.311850] [<ffffffff8108d930>] ? flush_kthread_worker+0x90/0x90
[ 633.368591] [<ffffffff817a8f3c>] ret_from_fork+0x7c/0xb0
[ 633.417460] [<ffffffff8108d930>] ? flush_kthread_worker+0x90/0x90
[ 633.474349] ---[ end trace 4760080785caca8a ]---
I'm aware raid5/6 isn't considered anywhere close to production ready, so
I'm not sure if this is at all interesting for anyone to look at yet, but if
it is, please let me know what other information I can provide.
--
erikberg
* Re: Kernel crash during "btrfs device delete" on raid6 volume
2014-11-04 14:36 Kernel crash during "btrfs device delete" on raid6 volume Erik Berg
@ 2014-11-04 14:55 ` Chris Mason
2014-11-04 15:58 ` Chris Mason
0 siblings, 1 reply; 4+ messages in thread
From: Chris Mason @ 2014-11-04 14:55 UTC (permalink / raw)
To: Erik Berg; +Cc: linux-btrfs, Mark Fasheh
On Tue, Nov 4, 2014 at 9:36 AM, Erik Berg <btrfs@slipsprogrammoer.no>
wrote:
> Pulled the latest btrfs-progs from kdave (v3.17-12-gcafacda) and
> using the latest linux release candidate (3.18.0-031800rc3-generic)
> from canonical/ubuntu
>
> btrfs fi show
> Label: none uuid: 5c5fea06-0319-4e03-a42e-004e64aeed92
> Total devices 9 FS bytes used 10.91TiB
> devid 2 size 931.48GiB used 928.02GiB path /dev/sdc1
> devid 3 size 931.48GiB used 928.02GiB path /dev/sdd1
> devid 4 size 1.82TiB used 1.67TiB path /dev/sde1
> devid 5 size 2.73TiB used 2.28TiB path /dev/sdf1
> devid 6 size 3.64TiB used 2.73TiB path /dev/sdg1
> devid 7 size 3.64TiB used 2.73TiB path /dev/sdh1
> devid 8 size 931.46GiB used 655.90GiB path /dev/sdb1
> devid 9 size 3.64TiB used 2.73TiB path /dev/sdi1
> devid 10 size 3.64TiB used 1.79TiB path /dev/sdj1
>
> btrfs fi df
> Data, RAID6: total=10.91TiB, used=10.90TiB
> System, RAID6: total=96.00MiB, used=800.00KiB
> Metadata, RAID6: total=13.23GiB, used=11.79GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> Trying to remove device sdb1, the kernel crashes after a minute or so.
>
> [ 597.576827] ------------[ cut here ]------------
> [ 597.617519] kernel BUG at /home/apw/COD/linux/mm/slub.c:3334!
> [ 597.668145] invalid opcode: 0000 [#1] SMP
> [ 597.704410] Modules linked in: arc4 md4 ipt_MASQUERADE
> nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT
> nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc
> ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat
> ebtables x_tables gpio_ich intel_rapl x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul
> ghash_clmulni_intel cryptd serio_raw hpilo hpwdt 8250_fintek
> acpi_power_meter ie31200_edac lpc_ich edac_core ipmi_si
> ipmi_msghandler mac_hid lp parport nls_utf8 cifs fscache hid_generic
> usbhid hid btrfs xor raid6_pq uas usb_storage tg3 ptp ahci psmouse
> libahci pps_core hpsa
> [ 598.268179] CPU: 1 PID: 129 Comm: kworker/u128:3 Not tainted
> 3.18.0-031800rc3-generic #201411022335
> [ 598.349925] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06
> 11/09/2013
> [ 598.413231] Workqueue: writeback bdi_writeback_workfn
> (flush-btrfs-2)
> [ 598.471103] task: ffff8803f16a3c00 ti: ffff880036b70000 task.ti:
> ffff880036b70000
> [ 598.538393] RIP: 0010:[<ffffffff811c74fd>] [<ffffffff811c74fd>]
> kfree+0x16d/0x170
> [ 598.606217] RSP: 0018:ffff880036b73528 EFLAGS: 00010246
> [ 598.653844] RAX: 01ffff0000000000 RBX: ffff880036b735c8 RCX:
> 0000000000000000
> [ 598.717899] RDX: ffff8803743a6010 RSI: dead000000100100 RDI:
> ffff880036b735c8
> [ 598.781662] RBP: ffff880036b73558 R08: 0000000000000000 R09:
> ffffea0000dadcc0
> [ 598.846028] R10: 0000000000000001 R11: 0000000000000010 R12:
> ffff8803f1e09800
> [ 598.910713] R13: ffff8803ac757d40 R14: ffffffffc04fed0c R15:
> ffff880036b735d8
> [ 598.975333] FS: 0000000000000000(0000) GS:ffff88040b420000(0000)
> knlGS:0000000000000000
> [ 599.048512] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 599.100167] CR2: 00007fa9a3854024 CR3: 0000000001c16000 CR4:
> 00000000001407e0
> [ 599.165150] Stack:
> [ 599.183305] ffff8803f1e09800 00000dad07c20000 ffff8803f1e09800
> ffff8803ac757d40
> [ 599.249603] ffff8803ac757d40 ffff880036b735d8 ffff880036b73618
> ffffffffc04fed0c
> [ 599.316306] ffff8803f1b86b00 ffff880374338000 00000dad07dc0000
> ffff880036b73638
> [ 599.383404] Call Trace:
> [ 599.405429] [<ffffffffc04fed0c>]
> btrfs_lookup_csums_range+0x2ac/0x4a0 [btrfs]
Not a new bug unfortunately, but since it is in the error handling
people must not be hitting it often. It's also not related to device
replace.
    while (ret < 0 && !list_empty(&tmplist)) {
        sums = list_entry(&tmplist, struct btrfs_ordered_sum, list);
        list_del(&sums->list);
        kfree(sums);
    }
We're trying to call kfree on the on-stack list head. I'm fixing it up
here, thanks for posting the oops!
-chris
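The mistake described above can be reproduced outside the kernel. The following is a minimal userspace sketch, not the btrfs code: `struct demo_sum`, `entry_addr()` and `list_entry_finds_allocation()` are hypothetical names, and the list helpers are simplified stand-ins for the kernel's `list.h`. It compares, as plain integers, the address that `list_entry()` would compute from the list head itself with the one computed from `head.next`. Only the latter is the object that was actually allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's doubly linked list head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical stand-in for struct btrfs_ordered_sum. */
struct demo_sum {
	unsigned long long bytenr;
	struct list_head list;
};

/*
 * Address that list_entry(ptr, struct demo_sum, list) would yield.
 * Done in integer arithmetic so that the "wrong" case is only computed,
 * never dereferenced.
 */
static uintptr_t entry_addr(const struct list_head *ptr)
{
	return (uintptr_t)ptr - offsetof(struct demo_sum, list);
}

/* Returns 1 if only the head.next form points at the allocation, -1 on OOM. */
static int list_entry_finds_allocation(void)
{
	struct list_head tmplist;	/* on-stack head, as in the thread */
	struct demo_sum *sum = malloc(sizeof(*sum));
	int ok;

	if (!sum)
		return -1;

	init_list_head(&tmplist);
	list_add_tail(&sum->list, &tmplist);
	assert(tmplist.next == &sum->list);

	/* Buggy form: container of the head itself, a bogus stack address. */
	uintptr_t buggy = entry_addr(&tmplist);
	/* Fixed form: container of the first element on the list. */
	uintptr_t fixed = entry_addr(tmplist.next);

	ok = (fixed == (uintptr_t)sum) && (buggy != (uintptr_t)sum);
	free(sum);
	return ok;
}
```

Calling `list_entry_finds_allocation()` from a small `main()` should return 1. Passing the buggy address to `free()` or `kfree()` hands the allocator a pointer it never returned, which is consistent with the BUG in `kfree` shown in the oops.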
* Re: Kernel crash during "btrfs device delete" on raid6 volume
2014-11-04 14:55 ` Chris Mason
@ 2014-11-04 15:58 ` Chris Mason
2014-11-04 23:42 ` Mark Fasheh
0 siblings, 1 reply; 4+ messages in thread
From: Chris Mason @ 2014-11-04 15:58 UTC (permalink / raw)
To: Erik Berg; +Cc: linux-btrfs, Mark Fasheh
On Tue, Nov 4, 2014 at 9:55 AM, Chris Mason <clm@fb.com> wrote:
> On Tue, Nov 4, 2014 at 9:36 AM, Erik Berg <btrfs@slipsprogrammoer.no>
> wrote:
>> Pulled the latest btrfs-progs from kdave (v3.17-12-gcafacda) and
>> using the latest linux release candidate (3.18.0-031800rc3-generic)
>> from canonical/ubuntu
>>
>> Trying to remove device sdb1, the kernel crashes after a minute or
>> so.
>>
>> [ 597.576827] ------------[ cut here ]------------
>> [ 597.617519] kernel BUG at /home/apw/COD/linux/mm/slub.c:3334!
>> [ 597.668145] invalid opcode: 0000 [#1] SMP
>> [ 597.704410] Modules linked in: arc4 md4 ipt_MASQUERADE
>> nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat
>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
>> ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp
>> bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables
>> ebtable_nat ebtables x_tables gpio_ich intel_rapl
>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
>> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd serio_raw
>> hpilo hpwdt 8250_fintek acpi_power_meter ie31200_edac lpc_ich
>> edac_core ipmi_si ipmi_msghandler mac_hid lp parport nls_utf8 cifs
>> fscache hid_generic usbhid hid btrfs xor raid6_pq uas usb_storage
>> tg3 ptp ahci psmouse libahci pps_core hpsa
>> [ 598.268179] CPU: 1 PID: 129 Comm: kworker/u128:3 Not tainted
>> 3.18.0-031800rc3-generic #201411022335
>> [ 598.349925] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06
>> 11/09/2013
>> [ 598.413231] Workqueue: writeback bdi_writeback_workfn
>> (flush-btrfs-2)
>> [ 598.471103] task: ffff8803f16a3c00 ti: ffff880036b70000 task.ti:
>> ffff880036b70000
>> [ 598.538393] RIP: 0010:[<ffffffff811c74fd>] [<ffffffff811c74fd>]
>> kfree+0x16d/0x170
>> [ 598.606217] RSP: 0018:ffff880036b73528 EFLAGS: 00010246
>> [ 598.653844] RAX: 01ffff0000000000 RBX: ffff880036b735c8 RCX:
>> 0000000000000000
>> [ 598.717899] RDX: ffff8803743a6010 RSI: dead000000100100 RDI:
>> ffff880036b735c8
>> [ 598.781662] RBP: ffff880036b73558 R08: 0000000000000000 R09:
>> ffffea0000dadcc0
>> [ 598.846028] R10: 0000000000000001 R11: 0000000000000010 R12:
>> ffff8803f1e09800
>> [ 598.910713] R13: ffff8803ac757d40 R14: ffffffffc04fed0c R15:
>> ffff880036b735d8
>> [ 598.975333] FS: 0000000000000000(0000) GS:ffff88040b420000(0000)
>> knlGS:0000000000000000
>> [ 599.048512] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 599.100167] CR2: 00007fa9a3854024 CR3: 0000000001c16000 CR4:
>> 00000000001407e0
>> [ 599.165150] Stack:
>> [ 599.183305] ffff8803f1e09800 00000dad07c20000 ffff8803f1e09800
>> ffff8803ac757d40
>> [ 599.249603] ffff8803ac757d40 ffff880036b735d8 ffff880036b73618
>> ffffffffc04fed0c
>> [ 599.316306] ffff8803f1b86b00 ffff880374338000 00000dad07dc0000
>> ffff880036b73638
>> [ 599.383404] Call Trace:
>> [ 599.405429] [<ffffffffc04fed0c>]
>> btrfs_lookup_csums_range+0x2ac/0x4a0 [btrfs]
>
> Not a new bug unfortunately, but since it is in the error handling
> people must not be hitting it often. It's also not related to device
> replace.
>
>
>     while (ret < 0 && !list_empty(&tmplist)) {
>         sums = list_entry(&tmplist, struct btrfs_ordered_sum, list);
>         list_del(&sums->list);
>         kfree(sums);
>     }
>
> We're trying to call kfree on the on-stack list head. I'm fixing it
> up here, thanks for posting the oops!
Fix attached, or you can wait for the next rc. Thanks.
-chris
[-- Attachment #2: btrfs.patch --]
[-- Type: text/x-patch, Size: 1253 bytes --]
From 6e5aafb27419f32575b27ef9d6a31e5d54661aca Mon Sep 17 00:00:00 2001
From: Chris Mason <clm@fb.com>
Date: Tue, 4 Nov 2014 06:59:04 -0800
Subject: [PATCH] Btrfs: fix kfree on list_head in btrfs_lookup_csums_range
error cleanup
If we hit any errors in btrfs_lookup_csums_range, we'll loop through all
the csums we allocate and free them. But the code was using list_entry
incorrectly, and ended up trying to free the on-stack list_head instead.
This bug came from commit 0678b6185
btrfs: Don't BUG_ON kzalloc error in btrfs_lookup_csums_range()
Signed-off-by: Chris Mason <clm@fb.com>
Reported-by: Erik Berg <btrfs@slipsprogrammoer.no>
cc: stable@vger.kernel.org # 3.3 or newer
---
fs/btrfs/file-item.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 783a943..84a2d18 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -413,7 +413,7 @@ int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end,
 	ret = 0;
 fail:
 	while (ret < 0 && !list_empty(&tmplist)) {
-		sums = list_entry(&tmplist, struct btrfs_ordered_sum, list);
+		sums = list_entry(tmplist.next, struct btrfs_ordered_sum, list);
 		list_del(&sums->list);
 		kfree(sums);
 	}
--
1.8.1
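For readers who want to try the corrected cleanup pattern without building a kernel, here is a rough userspace sketch of the same loop shape. It is an illustration under assumptions, not the btrfs implementation: `struct demo_sum`, `queue_sums()` and `drain_sums()` are hypothetical names, the list helpers are simplified stand-ins for the kernel's `list.h`, and `free()` takes the place of `kfree()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel list helpers. */
struct list_head { struct list_head *next, *prev; };

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }

static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = NULL;
	e->prev = NULL;
}

/* Hypothetical stand-in for struct btrfs_ordered_sum. */
struct demo_sum {
	unsigned long long bytenr;
	struct list_head list;
};

/* Queue up to n heap-allocated entries; returns how many were queued. */
static int queue_sums(struct list_head *head, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		struct demo_sum *s = malloc(sizeof(*s));

		if (!s)
			break;
		s->bytenr = (unsigned long long)i;
		list_add_tail(&s->list, head);
	}
	return i;
}

/*
 * Same shape as the patched loop: take the first element through
 * head->next (never the head itself), unlink it, then free it.
 * Returns the number of entries freed.
 */
static int drain_sums(struct list_head *head)
{
	int freed = 0;

	assert(head != NULL);
	while (!list_empty(head)) {
		struct demo_sum *s = list_entry(head->next, struct demo_sum, list);

		list_del(&s->list);
		free(s);
		freed++;
	}
	return freed;
}
```

A caller would initialize an on-stack head with `init_list_head()`, queue a few entries, and then call `drain_sums()`; the head itself is never passed to `free()`, and the list is empty afterwards.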
* Re: Kernel crash during "btrfs device delete" on raid6 volume
2014-11-04 15:58 ` Chris Mason
@ 2014-11-04 23:42 ` Mark Fasheh
0 siblings, 0 replies; 4+ messages in thread
From: Mark Fasheh @ 2014-11-04 23:42 UTC (permalink / raw)
To: Chris Mason; +Cc: Erik Berg, linux-btrfs
On Tue, Nov 04, 2014 at 10:58:48AM -0500, Chris Mason wrote:
>> Not a new bug unfortunately, but since it is in the error handling people
>> must not be hitting it often. It's also not related to device replace.
>>
>>
>>     while (ret < 0 && !list_empty(&tmplist)) {
>>         sums = list_entry(&tmplist, struct btrfs_ordered_sum, list);
>>         list_del(&sums->list);
>>         kfree(sums);
>>     }
>>
>> We're trying to call kfree on the on-stack list head. I'm fixing it up
>> here, thanks for posting the oops!
>
> Fix attached, or you can wait for the next rc. Thanks.
>
> -chris
>
>
> From 6e5aafb27419f32575b27ef9d6a31e5d54661aca Mon Sep 17 00:00:00 2001
> From: Chris Mason <clm@fb.com>
> Date: Tue, 4 Nov 2014 06:59:04 -0800
> Subject: [PATCH] Btrfs: fix kfree on list_head in btrfs_lookup_csums_range
> error cleanup
>
> If we hit any errors in btrfs_lookup_csums_range, we'll loop through all
> the csums we allocate and free them. But the code was using list_entry
> incorrectly, and ended up trying to free the on-stack list_head instead.
>
> This bug came from commit 0678b6185
Wow, that's an old commit! Thanks for the CC. The fix looks good to me, so
you can add:
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
if you like, thanks.
--Mark
--
Mark Fasheh