linux-btrfs.vger.kernel.org archive mirror
* Crash when trying to start a replace on missing device
@ 2015-09-12 22:51 Martin Bakiev
  2015-09-13  4:55 ` Omar Sandoval
From: Martin Bakiev @ 2015-09-12 22:51 UTC
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 1193 bytes --]

Hi guys,

I'm just doing testing with btrfs and I ran into a crash when
simulating a failed drive. I yanked out one (/dev/sdc) of 4 drives and
tried to replace it with another (/dev/sdf) with this command:

btrfs replace start missing /dev/sdf /mount_point -f

That seemed to cause a crash; see the attached dmesg file for the
stack trace and more info. I was told on IRC to report the crash here.
I hope this helps.

Other info:
uname -a:
Linux fedora-nas 4.1.6-201.fc22.x86_64 #1 SMP Fri Sep 4 17:49:24 UTC
2015 x86_64 x86_64 x86_64 GNU/Linux

btrfs --version
btrfs-progs v4.1

btrfs fi show
Label: 'raid5'  uuid: 8b17c1d2-4ef6-4946-b77f-eac57c4e23a6
       Total devices 5 FS bytes used 18.32GiB
       devid    0 size 4.55TiB used 7.38GiB path /dev/sdf
       devid    1 size 4.55TiB used 7.38GiB path /dev/sdb
       devid    3 size 4.55TiB used 7.38GiB path /dev/sdd
       devid    4 size 4.55TiB used 7.38GiB path /dev/sde
       *** Some devices missing

btrfs fi df
Data, RAID5: total=21.00GiB, used=18.30GiB
System, RAID5: total=96.00MiB, used=16.00KiB
Metadata, RAID5: total=1.03GiB, used=19.59MiB
GlobalReserve, single: total=16.00MiB, used=0.00B

dmesg attached.

Thanks,
Martin

[-- Attachment #2: dmesg.txt --]
[-- Type: text/plain, Size: 4179 bytes --]

[ 4896.123124] BTRFS: dev_replace from <missing disk> (devid 5) to /dev/sdf started
[ 4896.380273] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
[ 4896.380351] IP: [<ffffffff81376651>] bio_add_page+0x11/0xa0
[ 4896.380404] PGD 274506067 PUD 274507067 PMD 0 
[ 4896.380449] Oops: 0000 [#1] SMP 
[ 4896.380481] Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw coretemp kvm_intel kvm gpio_ich iTCO_wdt iTCO_vendor_support ipmi_ssif crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev lpc_ich i2c_i801 mfd_core ipmi_si tpm_tis i2c_ismt ipmi_msghandler tpm shpchp acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc btrfs xor raid6_pq ast drm_kms_helper crc32c_intel ttm serio_raw igb drm mpt2sas ptp pps_core dca i2c_algo_bit raid_class scsi_transport_sas

[ 4896.381219] CPU: 2 PID: 6337 Comm: btrfs Not tainted 4.1.6-201.fc22.x86_64 #1
[ 4896.381275] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./C2550D4I, BIOS P2.10 08/06/2014
[ 4896.381348] task: ffff8802734cb160 ti: ffff8801f6de0000 task.ti: ffff8801f6de0000
[ 4896.381406] RIP: 0010:[<ffffffff81376651>]  [<ffffffff81376651>] bio_add_page+0x11/0xa0
[ 4896.381476] RSP: 0018:ffff8801f6de3758  EFLAGS: 00010246
[ 4896.381515] RAX: 0000000000000000 RBX: ffff88017ab2c600 RCX: 0000000000000000
[ 4896.381567] RDX: 0000000000001000 RSI: ffffea00058e0dc0 RDI: ffff8802566bf6e8
[ 4896.381618] RBP: ffff8801f6de3758 R08: ffff88027fd1ac00 R09: 0000000000000800
[ 4896.381669] R10: ffffea0009cd2100 R11: 0000000000280000 R12: ffff88000f348a18
[ 4896.381720] R13: ffff88011b049780 R14: ffff88027397e900 R15: ffff88000f348800
[ 4896.381771] FS:  00007f5e074278c0(0000) GS:ffff88027fd00000(0000) knlGS:0000000000000000
[ 4896.383541] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 4896.385318] CR2: 0000000000000098 CR3: 0000000267703000 CR4: 00000000001006e0
[ 4896.387109] Stack:
[ 4896.388881]  ffff8801f6de37d8 ffffffffa0262828 ffff8801f6de3790 ffff88000f348a20
[ 4896.390687]  0000000000000000 0000000000000000 ffff8801f6de37d8 ffffffff81200411
[ 4896.392467]  ffff8802734cb160 00000000e9587795 ffff88000f348a40 ffff88027397e900
[ 4896.394274] Call Trace:
[ 4896.396118]  [<ffffffffa0262828>] scrub_add_page_to_rd_bio+0xc8/0x2b0 [btrfs]
[ 4896.397961]  [<ffffffff81200411>] ? alloc_pages_current+0x91/0x110
[ 4896.399836]  [<ffffffffa0265254>] scrub_pages+0x1f4/0x280 [btrfs]
[ 4896.401692]  [<ffffffffa0265f2b>] scrub_stripe+0x82b/0x1090 [btrfs]
[ 4896.403570]  [<ffffffffa02668ab>] scrub_chunk.isra.19+0x11b/0x140 [btrfs]
[ 4896.405470]  [<ffffffffa0266b49>] scrub_enumerate_chunks+0x279/0x4f0 [btrfs]
[ 4896.407397]  [<ffffffffa0265691>] ? scrub_setup_ctx.isra.17+0x231/0x2a0 [btrfs]
[ 4896.409317]  [<ffffffffa02684b0>] btrfs_scrub_dev+0x1c0/0x570 [btrfs]
[ 4896.411242]  [<ffffffffa027cf51>] btrfs_dev_replace_start+0x351/0x3b0 [btrfs]
[ 4896.413173]  [<ffffffffa024098c>] btrfs_ioctl+0x1c6c/0x2930 [btrfs]
[ 4896.415071]  [<ffffffff811acb71>] ? unlock_page+0x71/0x90
[ 4896.416950]  [<ffffffff811ad647>] ? filemap_map_pages+0x2c7/0x2e0
[ 4896.418853]  [<ffffffff811e04c2>] ? handle_mm_fault+0x1132/0x1820
[ 4896.420772]  [<ffffffff812304b6>] ? cp_new_stat+0x156/0x190
[ 4896.422711]  [<ffffffff8123fcf6>] do_vfs_ioctl+0x2c6/0x4d0
[ 4896.424632]  [<ffffffff8123ff81>] SyS_ioctl+0x81/0xa0
[ 4896.426547]  [<ffffffff81068eff>] ? do_page_fault+0x2f/0x80
[ 4896.428462]  [<ffffffff817a002e>] system_call_fastpath+0x12/0x71
[ 4896.430379] Code: 1f 00 55 48 89 e5 e8 5f fd ff ff 5d c3 0f 1f 44 00 00 31 c0 c3 0f 1f 44 00 00 0f 1f 44 00 00 55 48 8b 47 08 4c 8b 4f 20 48 89 e5 <48> 8b 80 98 00 00 00 4c 8b 90 78 03 00 00 41 8b 82 fc 05 00 00 
[ 4896.434490] RIP  [<ffffffff81376651>] bio_add_page+0x11/0xa0
[ 4896.436482]  RSP <ffff8801f6de3758>
[ 4896.438459] CR2: 0000000000000098
[ 4896.835369] ---[ end trace 90c698bf9010d6b7 ]---
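For readers skimming the trace, here is a minimal sketch (not part of the original report) of how an oops like the one above could be summarized. It pulls out the faulting address, the symbol on the `IP:` line, and the call-trace frames that are not marked `?` (the kernel prints `?` for frames it considers unreliable). The function name `summarize_oops` and the regular expressions are illustrative assumptions tuned to the 4.1-era log format shown here; other kernel versions format oopses differently.

```python
import re

# Hypothetical patterns, written against the log format in this report only.
FAULT_RE = re.compile(r"NULL pointer dereference at ([0-9a-f]+)")
IP_RE = re.compile(r"IP: \[<[0-9a-f]+>\]\s+([\w.]+)\+0x")
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_oops(log: str) -> dict:
    """Return the fault offset, faulting symbol, and reliable call-trace frames."""
    fault_offset = None
    faulting_symbol = None
    frames = []
    in_trace = False

    for line in log.splitlines():
        fault = FAULT_RE.search(line)
        if fault and fault_offset is None:
            fault_offset = int(fault.group(1), 16)

        ip = IP_RE.search(line)
        if ip and faulting_symbol is None:
            faulting_symbol = ip.group(1)

        if "Call Trace:" in line:
            in_trace = True
            continue

        if in_trace:
            frame = FRAME_RE.search(line)
            if frame is None:
                # The first non-frame line (e.g. "Code: ...") ends the trace.
                in_trace = False
                continue
            # Skip frames the kernel flagged with "?" as unreliable.
            if not frame.group("unreliable"):
                frames.append(frame.group("symbol"))

    return {
        "fault_offset": fault_offset,
        "faulting_symbol": faulting_symbol,
        "frames": frames,
    }
```

Fed the dmesg text above, this should report `bio_add_page` as the faulting symbol with a fault offset of 0x98, and a reliable call chain running from `scrub_add_page_to_rd_bio` up through `btrfs_dev_replace_start` to the ioctl entry point. A small offset like 0x98 with RAX equal to 0 is consistent with a structure member being read through a NULL pointer.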


* Re: Crash when trying to start a replace on missing device
  2015-09-12 22:51 Crash when trying to start a replace on missing device Martin Bakiev
@ 2015-09-13  4:55 ` Omar Sandoval
From: Omar Sandoval @ 2015-09-13  4:55 UTC
  To: Martin Bakiev; +Cc: linux-btrfs

On Sat, Sep 12, 2015 at 04:51:18PM -0600, Martin Bakiev wrote:
> Hi guys,
> 
> I'm just doing testing with btrfs and I ran into a crash when
> simulating a failed drive. I yanked out one (/dev/sdc) of 4 drives and
> tried to replace it with another (/dev/sdf) with this command:
> 
> btrfs replace start missing /dev/sdf /mount_point -f
> 
> That seemed to cause a crash; see the attached dmesg file for the
> stack trace and more info. I was told on IRC to report the crash here.
> I hope this helps.
> 
> Other info:
> uname -a:
> Linux fedora-nas 4.1.6-201.fc22.x86_64 #1 SMP Fri Sep 4 17:49:24 UTC
> 2015 x86_64 x86_64 x86_64 GNU/Linux
> 
> btrfs --version
> btrfs-progs v4.1
> 
> btrfs fi show
> Label: 'raid5'  uuid: 8b17c1d2-4ef6-4946-b77f-eac57c4e23a6
>        Total devices 5 FS bytes used 18.32GiB
>        devid    0 size 4.55TiB used 7.38GiB path /dev/sdf
>        devid    1 size 4.55TiB used 7.38GiB path /dev/sdb
>        devid    3 size 4.55TiB used 7.38GiB path /dev/sdd
>        devid    4 size 4.55TiB used 7.38GiB path /dev/sde
>        *** Some devices missing
> 
> btrfs fi df
> Data, RAID5: total=21.00GiB, used=18.30GiB
> System, RAID5: total=96.00MiB, used=16.00KiB
> Metadata, RAID5: total=1.03GiB, used=19.59MiB
> GlobalReserve, single: total=16.00MiB, used=0.00B
> 
> dmesg attached.
> 
> Thanks,
> Martin

Thanks for the report, Martin. This should be fixed in v4.3-rc1 if you
want to give that a spin. Specifically, you'll want these commits:

4a770891d9dd Btrfs: fix parity scrub of RAID 5/6 with missing device
73ff61dbe5ed Btrfs: fix device replace of a missing RAID 5/6 device
b4ee1782686d Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation
7cb2c4202ed5 Btrfs: count devices correctly in readahead during RAID 5/6 replace
03679ade86b2 Btrfs: remove misleading handling of missing device scrub

-- 
Omar
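
The reply above names v4.3-rc1 as the first kernel carrying the fixes, while the report was filed against 4.1.6. As a rough sketch (an editorial addition, not from either message), a `uname -r` style release string can be compared against that version. The names `FIXED_IN`, `kernel_version`, and `likely_has_fix` are hypothetical, and the check looks only at the upstream major.minor number: it cannot tell whether a distribution backported the listed commits to an older kernel.

```python
import re

# Assumption taken from the reply: the fixes first appear in v4.3-rc1.
FIXED_IN = (4, 3)


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '4.1.6-201.fc22.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2)))


def likely_has_fix(release: str) -> bool:
    """True if the upstream version is at least 4.3; ignores distro backports."""
    return kernel_version(release) >= FIXED_IN
```

For example, `likely_has_fix("4.1.6-201.fc22.x86_64")` is false for the kernel in the report, so trying the replace again there would be expected to hit the same crash unless the commits were backported.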
