Kernel crash if both devices in raid1 are failing

linux-btrfs.vger.kernel.org archive mirror

* Kernel crash if both devices in raid1 are failing
@ 2016-04-14 20:30 Dmitry Katsubo
  2016-04-17 23:18 ` Dmitry Katsubo
  2016-04-18  0:19 ` Chris Murphy
  0 siblings, 2 replies; 9+ messages in thread
From: Dmitry Katsubo @ 2016-04-14 20:30 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3572 bytes --]

Dear btrfs community,

I have the following setup:

# btrfs fi show /home
Label: none  uuid: 865f8cf9-27be-41a0-85a4-6cb4d1658ce3
	Total devices 3 FS bytes used 55.68GiB
	devid    1 size 52.91GiB used 0.00B path /dev/sdd2
	devid    2 size 232.89GiB used 59.03GiB path /dev/sda
	devid    3 size 111.79GiB used 59.03GiB path /dev/sdc1

The btrfs volume was created in raid1 mode for both data and metadata and is
mounted with the compress=lzo option.

Unfortunately, two drives (sda and sdc1) started to fail at the same time. This
leads to a system crash if I boot into runlevel 3 (see crash1.log).

If I boot into single-user mode, the volume can be mounted in rw mode and I can
write some data to it. Unfortunately, when I tried to read a certain file, the
system crashed (see crash2.log).

I ran a scrub on the volume; here is the report:

# btrfs scrub status /home
scrub status for 865f8cf9-27be-41a0-85a4-6cb4d1658ce3
	scrub started at Tue Apr 12 20:39:20 2016 and finished after 02:40:09
	total bytes scrubbed: 55.68GiB with 1767 errors
	error details: verify=175 csum=1592
	corrected errors: 1110, uncorrectable errors: 657, unverified errors: 0
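
As a quick consistency check on the report above, both breakdowns add up to the reported total (175 + 1592 = 1767 and 1110 + 657 + 0 = 1767). A minimal sketch of that check, assuming the `btrfs scrub status` text layout shown here; the function name and regular expressions are illustrative only, not part of btrfs-progs:

```python
import re

# Scrub summary lines copied from the report above.
SCRUB_STATUS = """\
total bytes scrubbed: 55.68GiB with 1767 errors
error details: verify=175 csum=1592
corrected errors: 1110, uncorrectable errors: 657, unverified errors: 0
"""

def parse_scrub_status(text):
    """Return (total, by_kind, by_outcome) parsed from scrub status text."""
    total = int(re.search(r"with (\d+) errors", text).group(1))
    # "verify=175 csum=1592" -> {"verify": 175, "csum": 1592}
    by_kind = {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", text)}
    # "corrected errors: 1110, ..." -> {"corrected": 1110, ...}
    by_outcome = {k: int(v) for k, v in re.findall(r"(\w+) errors: (\d+)", text)}
    return total, by_kind, by_outcome

total, by_kind, by_outcome = parse_scrub_status(SCRUB_STATUS)
print(total, sum(by_kind.values()), sum(by_outcome.values()))
```

The 657 uncorrectable errors are the ones for which neither raid1 copy was usable.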

Obviously, some data is lost. However, because of the crash above, I cannot simply
copy the data off the volume. I would expect to still be able to access the data,
with reads of the files whose data is lost returning an I/O error (I would then
recover those files from my backup).

I decided to attach another drive and remove the failing devices one by one.
However, that does not work:

# btrfs dev delete /dev/sda /home
[  168.680057] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[  168.684236] ata3.00: BMDMA stat 0x25
[  168.688464] ata3.00: failed command: READ DMA
[  168.692681] ata3.00: cmd c8/00:08:68:4b:84/00:00:00:00:00/e7 tag 0 dma 4096 in
[  168.692681]          res 51/40:08:68:4b:84/40:08:07:00:00/e7 Emask 0x9 (media error)
[  168.701281] ata3.00: status: { DRDY ERR }
[  168.705600] ata3.00: error: { UNC }
[  168.724446] blk_update_request: I/O error, dev sda, sector 126110568
[  168.728860] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 43, flush 0, corrupt 0, gen 0
[  172.824043] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[  172.828651] ata3.00: BMDMA stat 0x25
[  172.833281] ata3.00: failed command: READ DMA
[  172.837876] ata3.00: cmd c8/00:08:50:4b:84/00:00:00:00:00/e7 tag 0 dma 4096 in
[  172.837876]          res 51/40:08:50:4b:84/40:08:07:00:00/e7 Emask 0x9 (media error)
[  172.847296] ata3.00: status: { DRDY ERR }
[  172.852054] ata3.00: error: { UNC }
[  172.872404] blk_update_request: I/O error, dev sda, sector 126110544
[  172.877241] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 44, flush 0, corrupt 0, gen 0
ERROR: error removing device '/dev/sda': Input/output error

The same happens when I try to delete /dev/sdc1 from the volume. Is there a
btrfs "force" option so that btrfs balances only the chunks that are accessible?
I could physically disconnect /dev/sda, but I believe the loss would be greater.
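
The failing sectors in the `btrfs dev delete` output above can be cross-checked against the ATA taskfile in the "cmd" lines. A minimal sketch, assuming the libata field order command/feature:nsect:lbal:lbam:lbah/(five HOB bytes)/device and a 28-bit command such as READ DMA (c8); the function name is hypothetical:

```python
def decode_lba28(cmd_field):
    """Decode (lba, sector_count) from a libata 'cmd' taskfile string.

    Assumes a 28-bit LBA command: the LBA is built from the low nibble of
    the device register plus the lbah, lbam and lbal bytes.
    """
    _command, taskfile, _hob, device = cmd_field.split("/")
    _feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in taskfile.split(":"))
    lba = ((int(device, 16) & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    # In 28-bit commands a sector count of 0 means 256 sectors.
    return lba, nsect or 256

for cmd in ("c8/00:08:68:4b:84/00:00:00:00:00/e7",
            "c8/00:08:50:4b:84/00:00:00:00:00/e7"):
    lba, count = decode_lba28(cmd)
    print(f"{cmd}: LBA {lba}, {count} sectors")
```

Under these assumptions the two commands decode to sectors 126110568 and 126110544, the same numbers that blk_update_request reports for sda, and 8 sectors of 512 bytes agree with the "dma 4096" transfer size.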

How can I proceed, other than with btrfs restore?

During the scrub, the following was recorded in the logs:

[Tue Apr 12 23:10:20 2016] BTRFS warning (device sdc1): checksum error at logical 126952947712 on dev /dev/sdc1, sector 126150176, root 258, inode 879324, offset 308256768, length 4096, links 1 (path: lib/mysql/ibdata1)

If I collect all the messages like this one, will they give a full picture of the damaged files?
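
Collecting those messages could look like the sketch below. It assumes every relevant warning has exactly the format of the line quoted above (the regular expression is modeled on that single sample and the function name is hypothetical), and it only sees errors that the kernel resolved to a file path, so it does not by itself settle whether the picture is complete:

```python
import re
from collections import Counter

# Pattern modeled on the one warning quoted above; other kernel versions
# may word the message differently (assumption).
CSUM_WARNING = re.compile(
    r"BTRFS warning \(device [^)]+\): checksum error at logical \d+ "
    r"on dev (?P<dev>[^,]+), sector \d+, root (?P<root>\d+), "
    r"inode (?P<inode>\d+), offset \d+, length \d+, "
    r"links \d+ \(path: (?P<path>.+)\)\s*$"
)

def damaged_files(log_lines):
    """Count checksum-error warnings per (root, inode, path)."""
    hits = Counter()
    for line in log_lines:
        match = CSUM_WARNING.search(line)
        if match:
            key = (int(match["root"]), int(match["inode"]), match["path"])
            hits[key] += 1
    return hits

sample = [
    "[Tue Apr 12 23:10:20 2016] BTRFS warning (device sdc1): checksum error "
    "at logical 126952947712 on dev /dev/sdc1, sector 126150176, root 258, "
    "inode 879324, offset 308256768, length 4096, links 1 "
    "(path: lib/mysql/ibdata1)",
    "[  168.724446] blk_update_request: I/O error, dev sda, sector 126110568",
]
for (root, inode, path), count in sorted(damaged_files(sample).items()):
    print(f"root {root} inode {inode}: {path} ({count} bad block(s))")
```

In practice the lines would come from the saved kernel log, for example by passing an open file object to `damaged_files`.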

Many thanks in advance.

P.S. Linux kernel v4.4.2, btrfs-progs v4.4.

-- 
With best regards,
Dmitry

[-- Attachment #2: crash1.log --]
[-- Type: text/plain, Size: 13683 bytes --]

[  231.228068] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[  231.231255] ata3.00: BMDMA stat 0x25
[  231.234443] ata3.00: failed command: READ DMA
[  231.237661] ata3.00: cmd c8/00:08:60:f9:99/00:00:00:00:00/e2 tag 0 dma 4096 in
[  231.237661]          res 51/40:08:60:f9:99/00:00:00:00:00/e2 Emask 0x9 (media error)
[  231.244022] ata3.00: status: { DRDY ERR }
[  231.247119] ata3.00: error: { UNC }
[  231.264447] blk_update_request: I/O error, dev sda, sector 43645280
[  231.267817] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 39, flush 0, corrupt 0, gen 0
[  232.127298] BTRFS error (device sdc1): parent transid verify failed on 65675001856 wanted 480578 found 480435
[  232.185418] BTRFS error (device sdc1): parent transid verify failed on 65679622144 wanted 480579 found 480435
[  232.359943] BTRFS error (device sdc1): parent transid verify failed on 65674952704 wanted 480578 found 480435
[  232.656145] BTRFS error (device sdc1): parent transid verify failed on 65674379264 wanted 480578 found 480435
[  232.851908] BTRFS error (device sdc1): parent transid verify failed on 65669464064 wanted 480579 found 480577
[  233.142476] BTRFS error (device sdc1): parent transid verify failed on 65674313728 wanted 480578 found 480435
[  233.497501] BTRFS error (device sdc1): parent transid verify failed on 65669513216 wanted 480579 found 480577
[  233.524154] BTRFS error (device sdc1): parent transid verify failed on 65683652608 wanted 480581 found 480435
[  237.764056] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[  237.768802] ata3.00: BMDMA stat 0x25
[  237.773576] ata3.00: failed command: READ DMA
[  237.778314] ata3.00: cmd c8/00:20:20:e4:e5/00:00:00:00:00/e3 tag 0 dma 16384 in
[  237.778314]          res 51/40:20:20:e4:e5/00:00:00:00:00/e3 Emask 0x9 (media error)
[  237.787563] ata3.00: status: { DRDY ERR }
[  237.792179] ata3.00: error: { UNC }
[  237.864419] blk_update_request: I/O error, dev sda, sector 65397792
[  237.868875] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 40, flush 0, corrupt 0, gen 0
[  238.543890] BTRFS error (device sdc1): parent transid verify failed on 65673199616 wanted 480580 found 480435
[  238.801479] BTRFS error (device sdc1): parent transid verify failed on 65683308544 wanted 480581 found 480435
[  239.347274] BTRFS error (device sdc1): parent transid verify failed on 65674412032 wanted 480578 found 480435
[  239.372299] BTRFS error (device sdc1): parent transid verify failed on 65672527872 wanted 480580 found 480435
[  239.491870] BTRFS error (device sdc1): parent transid verify failed on 65668481024 wanted 480579 found 480577
[  239.582422] BTRFS error (device sdc1): parent transid verify failed on 65686175744 wanted 480581 found 480435
[  239.691309] BTRFS error (device sdc1): parent transid verify failed on 65681702912 wanted 480579 found 480435
[  240.017385] BTRFS error (device sdc1): parent transid verify failed on 65686634496 wanted 480581 found 480435
[  244.040086] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[  244.044440] ata3.00: BMDMA stat 0x25
[  244.048966] ata3.00: failed command: READ DMA
[  244.053358] ata3.00: cmd c8/00:28:e0:77:2a/00:00:00:00:00/e3 tag 0 dma 20480 in
[  244.053358]          res 51/40:28:e0:77:2a/00:00:00:00:00/e3 Emask 0x9 (media error)
[  244.062025] ata3.00: status: { DRDY ERR }
[  244.066476] ata3.00: error: { UNC }
[  244.084473] blk_update_request: I/O error, dev sda, sector 53114848
[  244.089016] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 41, flush 0, corrupt 0, gen 0
[  244.093894] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 42, flush 0, corrupt 0, gen 0
[  248.240054] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[  248.244598] ata3.00: BMDMA stat 0x25
[  248.249081] ata3.00: failed command: READ DMA
[  248.253556] ata3.00: cmd c8/00:28:e0:77:2a/00:00:00:00:00/e3 tag 0 dma 20480 in
[  248.253556]          res 51/40:28:e0:77:2a/00:00:00:00:00/e3 Emask 0x9 (media error)
[  248.262530] ata3.00: status: { DRDY ERR }
[  248.267010] ata3.00: error: { UNC }
[  248.284476] blk_update_request: I/O error, dev sda, sector 53114848
[  248.288964] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 43, flush 0, corrupt 0, gen 0
[  248.293527] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 44, flush 0, corrupt 0, gen 0
[  252.408055] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[  252.412491] ata3.00: BMDMA stat 0x25
[  252.416923] ata3.00: failed command: READ DMA
[  252.421349] ata3.00: cmd c8/00:28:e0:77:2a/00:00:00:00:00/e3 tag 0 dma 20480 in
[  252.421349]          res 51/40:28:e0:77:2a/00:00:00:00:00/e3 Emask 0x9 (media error)
[  252.430251] ata3.00: status: { DRDY ERR }
[  252.434649] ata3.00: error: { UNC }
[  252.452508] blk_update_request: I/O error, dev sda, sector 53114848
[  252.456940] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 45, flush 0, corrupt 0, gen 0
[  252.461408] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 46, flush 0, corrupt 0, gen 0
[  256.552055] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[  256.556598] ata3.00: BMDMA stat 0x25
[  256.561200] ata3.00: failed command: READ DMA
[  256.565713] ata3.00: cmd c8/00:28:e0:77:2a/00:00:00:00:00/e3 tag 0 dma 20480 in
[  256.565713]          res 51/40:28:e0:77:2a/00:00:00:00:00/e3 Emask 0x9 (media error)
[  256.574826] ata3.00: status: { DRDY ERR }
[  256.578304] ata3.00: error: { UNC }
[  256.604447] blk_update_request: I/O error, dev sda, sector 53114848
[  256.608960] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 47, flush 0, corrupt 0, gen 0
[  256.613615] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 48, flush 0, corrupt 0, gen 0
[  260.832091] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[  260.836663] ata3.00: BMDMA stat 0x25
[  260.841190] ata3.00: failed command: READ DMA
[  260.845770] ata3.00: cmd c8/00:28:e0:77:2a/00:00:00:00:00/e3 tag 0 dma 20480 in
[  260.845770]          res 51/40:28:e0:77:2a/00:00:00:00:00/e3 Emask 0x9 (media error)
[  260.855051] ata3.00: status: { DRDY ERR }
[  260.859741] ata3.00: error: { UNC }
[  260.880437] blk_update_request: I/O error, dev sda, sector 53114848
[  260.885169] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 49, flush 0, corrupt 0, gen 0
[  260.889647] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 50, flush 0, corrupt 0, gen 0
[  264.988058] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[  264.992843] ata3.00: BMDMA stat 0x25
[  264.997661] ata3.00: failed command: READ DMA
[  265.002463] ata3.00: cmd c8/00:28:e0:77:2a/00:00:00:00:00/e3 tag 0 dma 20480 in
[  265.002463]          res 51/40:28:e0:77:2a/00:00:00:00:00/e3 Emask 0x9 (media error)
[  265.012041] ata3.00: status: { DRDY ERR }
[  265.016832] ata3.00: error: { UNC }
[  265.044408] blk_update_request: I/O error, dev sda, sector 53114848
[  265.049112] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 51, flush 0, corrupt 0, gen 0
[  265.053514] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 52, flush 0, corrupt 0, gen 0
[  269.168059] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[  269.172887] ata3.00: BMDMA stat 0x25
[  269.177253] ata3.00: failed command: READ DMA
[  269.180332] ata3.00: cmd c8/00:28:e0:77:2a/00:00:00:00:00/e3 tag 0 dma 20480 in
[  269.180332]          res 51/40:28:e0:77:2a/00:00:00:00:00/e3 Emask 0x9 (media error)
[  269.186544] ata3.00: status: { DRDY ERR }
[  269.189696] ata3.00: error: { UNC }
[  269.208445] blk_update_request: I/O error, dev sda, sector 53114848
[  269.213356] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 53, flush 0, corrupt 0, gen 0
[  269.217903] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 54, flush 0, corrupt 0, gen 0
[  269.221779] ------------[ cut here ]------------
[  269.225528] kernel BUG at /build/linux-z6Er2E/linux-4.4.2/fs/btrfs/extent_io.c:2401!
[  269.225647] invalid opcode: 0000 [#1] SMP 
[  269.225647] Modules linked in: netconsole configfs option usb_wwan usbserial bridge stp llc arc4 iTCO_wdt iTCO_vendor_support ppdev coretemp snd_hda_codec_realtek ath5k pcspkr snd_hda_codec_generic ath serio_raw mac80211 snd_hda_intel i2c_i801 snd_hda_codec snd_hda_core snd_hwdep snd_pcm_oss sr9700 cfg80211 snd_mixer_oss dm9601 snd_pcm evdev sg usbnet rfkill snd_timer lpc_ich mii snd mfd_core soundcore i915 rng_core acpi_cpufreq video drm_kms_helper drm 8250_fintek shpchp i2c_algo_bit parport_pc tpm_tis parport tpm button processor binfmt_misc w83627hf hwmon_vid autofs4 xfs libcrc32c ses enclosure crc32c_generic btrfs xor raid6_pq hid_generic uas usbhid hid usb_storage sr_mod cdrom sd_mod ata_generic ata_piix firewire_ohci e1000e libata ptp pps_core scsi_mod firewire_core crc_itu_t ehci_pci uhci_hcd ehci_hcd usbcore usb_common
[  269.225647] CPU: 0 PID: 1732 Comm: kworker/u4:8 Not tainted 4.4.0-1-686-pae #1 Debian 4.4.2-3
[  269.225647]  [<f869758b>] ? end_workqueue_fn+0x2b/0x40 [btrfs]
[  269.225647]  [<c1287080>] ? bio_endio+0x40/0x70
[  269.225647]  [<f86d9c7e>] ? btrfs_scrubparity_helper+0xce/0x270 [btrfs]
[  269.225647]  [<c107f9dd>] ? process_one_work+0x14d/0x360
[  269.225647]  [<c107fc29>] ? worker_thread+0x39/0x440
[  269.225647]  [<c107fbf0>] ? process_one_work+0x360/0x360
[  269.225647]  [<c1085026>] ? kthread+0xa6/0xc0
[  269.225647]  [<c153d949>] ? ret_from_kernel_thread+0x21/0x38
[  269.225647]  [<c1084f80>] ? kthread_create_on_node+0x130/0x130
[  269.225647] Code: 02 eb cf 8d 76 00 89 74 24 10 89 4c 24 0c 89 54 24 08 c7 44 24 04 e8 05 73 f8 c7 04 24 c0 05 74 f8 e8 88 b6 c1 c8 e9 46 ff ff ff <0f> 0b 90 55 89 e5 57 56 53 83 ec 08 3e 8d 74 26 00 89 ce 8b 0d
[  269.225647] EIP: [<f86c680d>] btrfs_check_repairable+0x12d/0x130 [btrfs] SS:ESP 0068:eab21e24
[  269.352509] ---[ end trace 4e1aa06ab00da653 ]---
[  269.357780] BUG: unable to handle kernel paging request at ffffffec
[  269.361219] IP: [<c10855bf>] kthread_data+0xf/0x20
[  269.361219] *pdpt = 0000000001817001 *pde = 00000000018a0067 *pte = 0000000000000000 
[  269.361219] Oops: 0000 [#2] SMP 
[  269.361219] Modules linked in: netconsole configfs option usb_wwan usbserial bridge stp llc arc4 iTCO_wdt iTCO_vendor_support ppdev coretemp snd_hda_codec_realtek ath5k pcspkr snd_hda_codec_generic ath serio_raw mac80211 snd_hda_intel i2c_i801 snd_hda_codec snd_hda_core snd_hwdep snd_pcm_oss sr9700 cfg80211 snd_mixer_oss dm9601 snd_pcm evdev sg usbnet rfkill snd_timer lpc_ich mii snd mfd_core soundcore i915 rng_core acpi_cpufreq video drm_kms_helper drm 8250_fintek shpchp i2c_algo_bit parport_pc tpm_tis parport tpm button processor binfmt_misc w83627hf hwmon_vid autofs4 xfs libcrc32c ses enclosure crc32c_generic btrfs xor raid6_pq hid_generic uas usbhid hid usb_storage sr_mod cdrom sd_mod ata_generic ata_piix firewire_ohci e1000e libata ptp pps_core scsi_mod firewire_core crc_itu_t ehci_pci uhci_hcd ehci_hcd usbcore usb_common
[  269.361219] CPU: 0 PID: 1732 Comm: kworker/u4:8 Tainted: G      D         4.4.0-1-686-pae #1 Debian 4.4.2-3
[  269.361219] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS i945GMx-IF R1.01 Mar.02.2007 AOpen Inc. 03/02/2007
[  269.361219] task: f255b800 ti: eab20000 task.ti: eab20000
[  269.361219] EIP: 0060:[<c10855bf>] EFLAGS: 00010002 CPU: 0
[  269.361219] EIP is at kthread_data+0xf/0x20
[  269.361219] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
[  269.361219] ESI: 00000000 EDI: f255bacc EBP: eab21c80 ESP: eab21c78
[  269.361219]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  269.361219] CR0: 8005003b CR2: 00000014 CR3: 326fa540 CR4: 000006f0
[  269.361219] Stack:
[  269.361219]  c1080b40 c1810c40 f255b800 c1539cb9 f79d1740 f79d6d00 00000082 f79d1c40
[  269.361219]  f74e0d00 00000000 00000082 edb78580 edb7858c eab22000 eab21a9c f255b800
[  269.361219]  eab21cc4 c153a0ed eab21cfc eab21d14 c106c876 eab21de8 eab21cf0 c10ba6a7
[  269.361219] Call Trace:
[  269.361219]  [<c1080b40>] ? wq_worker_sleeping+0x10/0x90
[  269.361219]  [<c1539cb9>] ? __schedule+0x519/0x920
[  269.361219]  [<c153a0ed>] ? schedule+0x2d/0x80
[  269.361219]  [<c106c876>] ? do_exit+0x696/0x9e0
[  269.361219]  [<c10ba6a7>] ? vprintk_default+0x37/0x40
[  269.361219]  [<c1152fa1>] ? printk+0x17/0x19
[  269.361219]  [<c1015872>] ? oops_end+0x92/0xd0
[  269.361219]  [<c10132fa>] ? do_error_trap+0x8a/0x120
[  269.361219]  [<f86c680d>] ? btrfs_check_repairable+0x12d/0x130 [btrfs]
[  269.361219]  [<c1094f01>] ? sched_clock_cpu+0x81/0x130
[  269.361219]  [<c153b3d0>] ? mutex_lock+0x10/0x30
[  269.361219]  [<c10138c0>] ? do_overflow+0x30/0x30
[  269.361219]  [<c10138e4>] ? do_invalid_op+0x24/0x30
[  269.361219]  [<c153eac7>] ? error_code+0x67/0x6c
[  269.361219]  [<f86c680d>] ? btrfs_check_repairable+0x12d/0x130 [btrfs]
[  269.361219]  [<f86c6e1f>] ? end_bio_extent_readpage+0x53f/0x700 [btrfs]
[  269.361219]  [<f869758b>] ? end_workqueue_fn+0x2b/0x40 [btrfs]
[  269.361219]  [<c1287080>] ? bio_endio+0x40/0x70
[  269.361219]  [<f86d9c7e>] ? btrfs_scrubparity_helper+0xce/0x270 [btrfs]
[  269.361219]  [<c107f9dd>] ? process_one_work+0x14d/0x360
[  269.361219]  [<c107fc29>] ? worker_thread+0x39/0x440
[  269.361219]  [<c107fbf0>] ? process_one_work+0x360/0x360
[  269.361219]  [<c1085026>] ? kthread+0xa6/0xc0
[  269.361219]  [<c153d949>] ? ret_from_kernel_thread+0x21/0x38
[  269.361219]  [<c1084f80>] ? kthread_create_on_node+0x130/0x130
[  269.361219] Code: 00 b8 68 41 62 c1 e8 d1 45 fe ff e9 e7 fe ff ff 8d b6 00 00 00 00 8d bf 00 00 00 00 55 89 e5 3e 8d 74 26 00 8b 80 9c 02 00 00 5d <8b> 40 ec c3 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 83
[  269.361219] EIP: [<c10855bf>] kthread_data+0xf/0x20 SS:ESP 0068:eab21c78

[-- Attachment #3: crash2.log --]
[-- Type: text/plain, Size: 12854 bytes --]

[  581.072053] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[  581.076097] ata3.00: BMDMA stat 0x25
[  581.080138] ata3.00: failed command: READ DMA
[  581.084150] ata3.00: cmd c8/00:30:40:78:2a/00:00:00:00:00/e3 tag 0 dma 24576 in
[  581.084150]          res 51/40:30:40:78:2a/00:00:00:00:00/e3 Emask 0x9 (media error)
[  581.090802] ata3.00: status: { DRDY ERR }
[  581.093301] ata3.00: error: { UNC }
[  581.108551] blk_update_request: I/O error, dev sda, sector 53114944
[  581.112522] BTRFS error (device sde2): bdev /dev/sda errs: wr 0, rd 33, flush 0, corrupt 0, gen 0
[  581.116753] ------------[ cut here ]------------
[  581.120548] kernel BUG at /build/linux-z6Er2E/linux-4.4.2/fs/btrfs/volumes.c:5508!
[  581.120548] invalid opcode: 0000 [#1] SMP 
[  581.120548] Modules linked in: netconsole configfs bridge stp llc option usb_wwan usbserial arc4 iTCO_wdt iTCO_vendor_support ath5k ppdev ath mac80211 coretemp pcspkr cfg80211 serio_raw snd_hda_codec_realtek snd_hda_codec_generic rfkill sr9700 snd_hda_intel snd_hda_codec dm9601 usbnet snd_hda_core mii snd_hwdep snd_pcm_oss snd_mixer_oss sg snd_pcm evdev lpc_ich i915 snd_timer mfd_core snd soundcore i2c_i801 video rng_core acpi_cpufreq drm_kms_helper drm 8250_fintek shpchp parport_pc
[  581.134587] ------------[ cut here ]------------
[  581.134596] WARNING: CPU: 0 PID: 6 at /build/linux-z6Er2E/linux-4.4.2/net/core/netpoll.c:367 netpoll_send_skb_on_dev+0x1b1/0x230()
[  581.134608] netpoll_send_skb_on_dev(): eth0 enabled interrupts in poll (e1000_xmit_frame+0x0/0xd90 [e1000e])
[  581.134608] Modules linked in: netconsole configfs bridge stp llc option usb_wwan usbserial arc4 iTCO_wdt iTCO_vendor_support ath5k ppdev ath mac80211 coretemp pcspkr cfg80211 serio_raw snd_hda_codec_realtek snd_hda_codec_generic rfkill sr9700 snd_hda_intel snd_hda_codec dm9601 usbnet snd_hda_core mii snd_hwdep snd_pcm_oss snd_mixer_oss sg snd_pcm evdev lpc_ich i915 snd_timer mfd_core snd soundcore i2c_i801 video rng_core acpi_cpufreq drm_kms_helper drm 8250_fintek shpchp parport_pc i2c_algo_bit parport tpm_tis tpm button processor binfmt_misc w83627hf hwmon_vid autofs4 xfs libcrc32c ses enclosure crc32c_generic btrfs xor raid6_pq hid_generic uas usbhid usb_storage hid sd_mod sr_mod cdrom ata_generic ata_piix libata scsi_mod firewire_ohci firewire_core crc_itu_t e1000e ptp pps_core ehci_pci uhci_hcd ehci_hcd usbcore usb_common
[  581.134711] CPU: 0 PID: 6 Comm: kworker/u4:0 Not tainted 4.4.0-1-686-pae #1 Debian 4.4.2-3
[  581.134713] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS i945GMx-IF R1.01 Mar.02.2007 AOpen Inc. 03/02/2007
[  581.134744] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
[  581.134747]  00000000 83955142 f750d8c0 c12b8fd5 f750d8d0 c1069a8d c1688318 f750d8f0
[  581.134755]  00000006 c16882e4 0000016f c14696e1 00000009 c14696e1 f387c000 0000004f
[  581.134765]  f38ecc00 f750d8dc c1069afe 00000009 f750d8d0 c1688318 f750d8f0 83955142
[  581.134772] Call Trace:
[  581.134780]  [<c12b8fd5>] ? dump_stack+0x3e/0x59
[  581.134785]  [<c1069a8d>] ? warn_slowpath_common+0x8d/0xc0
[  581.134788]  [<c14696e1>] ? netpoll_send_skb_on_dev+0x1b1/0x230
[  581.134791]  [<c14696e1>] ? netpoll_send_skb_on_dev+0x1b1/0x230
[  581.134794]  [<c1069afe>] ? warn_slowpath_fmt+0x3e/0x60
[  581.134797]  [<c14696e1>] ? netpoll_send_skb_on_dev+0x1b1/0x230
[  581.134805]  [<f83df5c0>] ? e1000e_update_tdt_wa.isra.53+0xb0/0xb0 [e1000e]
[  581.134814]  [<f8c7b1ce>] ? __br_deliver+0xfe/0x130 [bridge]
[  581.134822]  [<f8c7845c>] ? br_dev_xmit+0x1dc/0x230 [bridge]
[  581.134825]  [<c146916c>] ? netpoll_start_xmit+0x11c/0x1a0
[  581.134830]  [<c143b4d9>] ? __alloc_skb+0x69/0x1c0
[  581.134833]  [<c1469622>] ? netpoll_send_skb_on_dev+0xf2/0x230
[  581.134836]  [<c1469a11>] ? netpoll_send_udp+0x2b1/0x450
[  581.134840]  [<f8aa0b14>] ? write_msg+0xa4/0xe0 [netconsole]
[  581.134844]  [<f8aa0a70>] ? enabled_store+0x160/0x160 [netconsole]
[  581.134849]  [<c10b930b>] ? call_console_drivers.constprop.22+0xdb/0xe0
[  581.134853]  [<c10b9e9f>] ? console_unlock+0x52f/0x630
[  581.134856]  [<c10ba246>] ? vprintk_emit+0x2a6/0x580
[  581.134860]  [<c10ba6a7>] ? vprintk_default+0x37/0x40
[  581.134865]  [<c1152fa1>] ? printk+0x17/0x19
[  581.134872]  [<c10ea782>] ? print_modules+0x92/0xb0
[  581.134876]  [<c1015996>] ? __die+0x96/0x100
[  581.134878]  [<c1015d13>] ? die+0x33/0x60
[  581.134883]  [<c10132fa>] ? do_error_trap+0x8a/0x120
[  581.134906]  [<f86de706>] ? __btrfs_map_block+0x11a6/0x1580 [btrfs]
[  581.134930]  [<f86d57a4>] ? map_private_extent_buffer+0x54/0xd0 [btrfs]
[  581.134946]  [<f867fcde>] ? comp_keys+0x3e/0x60 [btrfs]
[  581.134963]  [<f867fd64>] ? generic_bin_search.constprop.36+0x64/0x170 [btrfs]
[  581.134967]  [<c10138c0>] ? do_overflow+0x30/0x30
[  581.134970]  [<c10138e4>] ? do_invalid_op+0x24/0x30
[  581.134974]  [<c153eac7>] ? error_code+0x67/0x6c
[  581.134979]  [<c153007b>] ? packet_sendmsg+0x53b/0x1150
[  581.135002]  [<f86de706>] ? __btrfs_map_block+0x11a6/0x1580 [btrfs]
[  581.135025]  [<f86c7e4b>] ? btrfs_get_token_32+0x10b/0x130 [btrfs]
[  581.135051]  [<f86df26e>] ? btrfs_map_bio+0x8e/0x370 [btrfs]
[  581.135075]  [<f86fe98a>] ? btrfs_submit_compressed_read+0x4ba/0x510 [btrfs]
[  581.135099]  [<f86ad323>] ? btrfs_submit_bio_hook+0x203/0x210 [btrfs]
[  581.135123]  [<f86d0edb>] ? end_bio_extent_readpage+0x5fb/0x700 [btrfs]
[  581.135144]  [<f86a158b>] ? end_workqueue_fn+0x2b/0x40 [btrfs]
[  581.135148]  [<c1287080>] ? bio_endio+0x40/0x70
[  581.135173]  [<f86e3c7e>] ? btrfs_scrubparity_helper+0xce/0x270 [btrfs]
[  581.135177]  [<c107f9dd>] ? process_one_work+0x14d/0x360
[  581.135180]  [<c107fc29>] ? worker_thread+0x39/0x440
[  581.135183]  [<c107fbf0>] ? process_one_work+0x360/0x360
[  581.135187]  [<c1085026>] ? kthread+0xa6/0xc0
[  581.135190]  [<c153d949>] ? ret_from_kernel_thread+0x21/0x38
[  581.135194]  [<c1084f80>] ? kthread_create_on_node+0x130/0x130
[  581.135196] ---[ end trace 3e2f94d2175a6a3e ]---

[  581.138407]  i2c_algo_bit parport tpm_tis tpm button processor binfmt_misc w83627hf hwmon_vid autofs4 xfs libcrc32c ses enclosure crc32c_generic btrfs xor raid6_pq hid_generic uas usbhid usb_storage hid sd_mod sr_mod cdrom ata_generic ata_piix libata scsi_mod firewire_ohci firewire_core crc_itu_t e1000e ptp pps_core ehci_pci uhci_hcd ehci_hcd usbcore usb_common
[  581.138407] CPU: 0 PID: 6 Comm: kworker/u4:0 Tainted: G        W       4.4.0-1-686-pae #1 Debian 4.4.2-3
[  581.138407] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS i945GMx-IF R1.01 Mar.02.2007 AOpen Inc. 03/02/2007
[  581.138407] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
[  581.138407] task: f74bf040 ti: f750c000 task.ti: f750c000
[  581.138407] EIP: 0060:[<f86de706>] EFLAGS: 00010217 CPU: 0
[  581.138407] EIP is at __btrfs_map_block+0x11a6/0x1580 [btrfs]
[  581.138407] EAX: 00001460 EBX: 00000002 ECX: 00000011 EDX: 00000000
[  581.138407] ESI: ffff0000 EDI: f3a3ffff EBP: f750dd34 ESP: f750dc60
[  581.138407]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  581.138407] CR0: 8005003b CR2: b6d54030 CR3: 37190a80 CR4: 000006f0
[  581.138407] Stack:
[  581.138407]  00010000 00000000 f86c7e4b f750dc84 f750dc88 00000000 00000000 000006ba
[  581.138407]  00008000 00000000 00000000 00000000 00000000 00000000 f6926920 00000018
[  581.138407]  00000000 f750dcf8 00004000 00010000 00000011 00000000 00000000 f750dd88
[  581.138407] Call Trace:
[  581.138407]  [<f86c7e4b>] ? btrfs_get_token_32+0x10b/0x130 [btrfs]
[  581.138407]  [<f86df26e>] ? btrfs_map_bio+0x8e/0x370 [btrfs]
[  581.138407]  [<f86fe98a>] ? btrfs_submit_compressed_read+0x4ba/0x510 [btrfs]
[  581.138407]  [<f86ad323>] ? btrfs_submit_bio_hook+0x203/0x210 [btrfs]
[  581.138407]  [<f86d0edb>] ? end_bio_extent_readpage+0x5fb/0x700 [btrfs]
[  581.138407]  [<f86a158b>] ? end_workqueue_fn+0x2b/0x40 [btrfs]
[  581.138407]  [<c1287080>] ? bio_endio+0x40/0x70
[  581.138407]  [<f86e3c7e>] ? btrfs_scrubparity_helper+0xce/0x270 [btrfs]
[  581.138407]  [<c107f9dd>] ? process_one_work+0x14d/0x360
[  581.522462]  [<c107fc29>] ? worker_thread+0x39/0x440
[  581.525520]  [<c107fbf0>] ? process_one_work+0x360/0x360
[  581.526271]  [<c1085026>] ? kthread+0xa6/0xc0
[  581.526271]  [<c153d949>] ? ret_from_kernel_thread+0x21/0x38
[  581.526271]  [<c1084f80>] ? kthread_create_on_node+0x130/0x130
[  581.526271] Code: 58 10 8b 5d 8c 89 48 0c 8b 93 7c 09 00 00 89 10 8b 55 c4 8b 46 38 89 3c 90 89 f8 bf 01 00 00 00 83 c0 01 89 45 a0 e9 4c f8 ff ff <0f> 0b c7 45 d4 01 00 00 00 e9 0c fa ff ff ba 69 16 00 00 b8 fc
[  581.526271] EIP: [<f86de706>] __btrfs_map_block+0x11a6/0x1580 [btrfs] SS:ESP 0068:f750dc60
[  581.545030] ---[ end trace 3e2f94d2175a6a3f ]---
[  581.548486] BUG: unable to handle kernel paging request at ffffffec
[  581.551854] IP: [<c10855bf>] kthread_data+0xf/0x20
[  581.552370] *pdpt = 0000000001817001 *pde = 00000000018a0067 *pte = 0000000000000000 
[  581.552370] Oops: 0000 [#2] SMP 
[  581.552370] Modules linked in: netconsole configfs bridge stp llc option usb_wwan usbserial arc4 iTCO_wdt iTCO_vendor_support ath5k ppdev ath mac80211 coretemp pcspkr cfg80211 serio_raw snd_hda_codec_realtek snd_hda_codec_generic rfkill sr9700 snd_hda_intel snd_hda_codec dm9601 usbnet snd_hda_core mii snd_hwdep snd_pcm_oss snd_mixer_oss sg snd_pcm evdev lpc_ich i915 snd_timer mfd_core snd soundcore i2c_i801 video rng_core acpi_cpufreq drm_kms_helper drm 8250_fintek shpchp parport_pc i2c_algo_bit parport tpm_tis tpm button processor binfmt_misc w83627hf hwmon_vid autofs4 xfs libcrc32c ses enclosure crc32c_generic btrfs xor raid6_pq hid_generic uas usbhid usb_storage hid sd_mod sr_mod cdrom ata_generic ata_piix libata scsi_mod firewire_ohci firewire_core crc_itu_t e1000e ptp pps_core ehci_pci uhci_hcd ehci_hcd usbcore usb_common
[  581.552370] CPU: 0 PID: 6 Comm: kworker/u4:0 Tainted: G      D W       4.4.0-1-686-pae #1 Debian 4.4.2-3
[  581.552370] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS i945GMx-IF R1.01 Mar.02.2007 AOpen Inc. 03/02/2007
[  581.552370] task: f74bf040 ti: f750c000 task.ti: f750c000
[  581.552370] EIP: 0060:[<c10855bf>] EFLAGS: 00010002 CPU: 0
[  581.552370] EIP is at kthread_data+0xf/0x20
[  581.552370] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
[  581.552370] ESI: 00000000 EDI: f74bf30c EBP: f750dabc ESP: f750dab4
[  581.552370]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  581.552370] CR0: 8005003b CR2: 00000014 CR3: 37190a80 CR4: 000006f0
[  581.552370] Stack:
[  581.552370]  c1080b40 c1810c40 f74bf040 c1539cb9 f79d1740 f79d6d00 00000082 f79d1c40
[  581.552370]  f74e0d00 00000000 00000082 f3a8b180 f3a8b18c f750e000 f750d8d8 f74bf040
[  581.552370]  f750db00 c153a0ed f750db38 f750db50 c106c876 f750dc24 f750db2c c10ba6a7
[  581.552370] Call Trace:
[  581.552370]  [<c1080b40>] ? wq_worker_sleeping+0x10/0x90
[  581.552370]  [<c1539cb9>] ? __schedule+0x519/0x920
[  581.552370]  [<c153a0ed>] ? schedule+0x2d/0x80
[  581.552370]  [<c106c876>] ? do_exit+0x696/0x9e0
[  581.552370]  [<c10ba6a7>] ? vprintk_default+0x37/0x40
[  581.552370]  [<c1152fa1>] ? printk+0x17/0x19
[  581.552370]  [<c1015872>] ? oops_end+0x92/0xd0
[  581.552370]  [<c10132fa>] ? do_error_trap+0x8a/0x120
[  581.552370]  [<f86de706>] ? __btrfs_map_block+0x11a6/0x1580 [btrfs]
[  581.552370]  [<f86d57a4>] ? map_private_extent_buffer+0x54/0xd0 [btrfs]
[  581.552370]  [<f867fcde>] ? comp_keys+0x3e/0x60 [btrfs]
[  581.552370]  [<f867fd64>] ? generic_bin_search.constprop.36+0x64/0x170 [btrfs]
[  581.552370]  [<c10138c0>] ? do_overflow+0x30/0x30
[  581.552370]  [<c10138e4>] ? do_invalid_op+0x24/0x30
[  581.552370]  [<c153eac7>] ? error_code+0x67/0x6c
[  581.552370]  [<c153007b>] ? packet_sendmsg+0x53b/0x1150
[  581.552370]  [<f86de706>] ? __btrfs_map_block+0x11a6/0x1580 [btrfs]
[  581.552370]  [<f86c7e4b>] ? btrfs_get_token_32+0x10b/0x130 [btrfs]
[  581.552370]  [<f86df26e>] ? btrfs_map_bio+0x8e/0x370 [btrfs]
[  581.552370]  [<f86fe98a>] ? btrfs_submit_compressed_read+0x4ba/0x510 [btrfs]
[  581.552370]  [<f86ad323>] ? btrfs_submit_bio_hook+0x203/0x210 [btrfs]
[  581.552370]  [<f86d0edb>] ? end_bio_extent_readpage+0x5fb/0x700 [btrfs]
[  581.552370]  [<f86a158b>] ? end_workqueue_fn+0x2b/0x40 [btrfs]
[  581.552370]  [<c1287080>] ? bio_endio+0x40/0x70
[  581.552370]  [<f86e3c7e>] ? btrfs_scrubparity_helper+0xce/0x270 [btrfs]
[  581.552370]  [<c107f9dd>] ? process_one_work+0x14d/0x360
[  581.552370]  [<c107fc29>] ? worker_thread+0x39/0x440
[  581.552370]  [<c107fbf0>] ? process_one_work+0x360/0x360
[  581.552370]  [<c1085026>] ? kthread+0xa6/0xc0
[  581.552370]  [<c153d949>] ? ret_from_kernel_thread+0x21/0x38
[  581.552370]  [<c1084f80>] ? kthread_create_on_node+0x130/0x130
[  581.552370] Code: 00 b8 68 41 62 c1 e8 d1 45 fe ff e9 e7 fe ff ff 8d b6 00 00 00 00 8d bf 00 00 00 00 55 89 e5 3e 8d 74 26 00 8b 80 9c 02 00 00 5d <8b> 40 ec c3 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89

* Re: Kernel crash if both devices in raid1 are failing
  2016-04-14 20:30 Kernel crash if both devices in raid1 are failing Dmitry Katsubo
@ 2016-04-17 23:18 ` Dmitry Katsubo
  2016-04-21  3:45   ` Liu Bo
  2016-04-18  0:19 ` Chris Murphy
  1 sibling, 1 reply; 9+ messages in thread
From: Dmitry Katsubo @ 2016-04-17 23:18 UTC (permalink / raw)
  To: linux-btrfs

On 2016-04-14 22:30, Dmitry Katsubo wrote:
> Dear btrfs community,
> 
> I have the following setup:
> 
> # btrfs fi show /home
> Label: none  uuid: 865f8cf9-27be-41a0-85a4-6cb4d1658ce3
> 	Total devices 3 FS bytes used 55.68GiB
> 	devid    1 size 52.91GiB used 0.00B path /dev/sdd2
> 	devid    2 size 232.89GiB used 59.03GiB path /dev/sda
> 	devid    3 size 111.79GiB used 59.03GiB path /dev/sdc1
> 
> The btrfs volume was created in raid1 mode for both data and metadata and is
> mounted with the compress=lzo option.
> 
> Unfortunately, two drives (sda and sdc1) started to fail at the same time. This
> leads to a system crash if I boot into runlevel 3 (see crash1.log).
> 
> If I boot into single-user mode, the volume can be mounted in rw mode and I can
> write some data to it. Unfortunately, when I tried to read a certain file, the
> system crashed (see crash2.log).
> 
> I ran a scrub on the volume; here is the report:
> 
> # btrfs scrub status /home
> scrub status for 865f8cf9-27be-41a0-85a4-6cb4d1658ce3
> 	scrub started at Tue Apr 12 20:39:20 2016 and finished after 02:40:09
> 	total bytes scrubbed: 55.68GiB with 1767 errors
> 	error details: verify=175 csum=1592
> 	corrected errors: 1110, uncorrectable errors: 657, unverified errors: 0
> 
> Obviously, some data is lost. However due to above crash, I cannot just copy
> the data from the volume. I would assume that I still can access the data, but
> the files for which data is lost, should result I/O error (I would then recover
> them from my backup).
> 
> I have decided to attach another drive and remove failing devices one-by-one.
> However that does not work:
> 
> # btrfs dev delete /dev/sda /home
> [  168.680057] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [  168.684236] ata3.00: BMDMA stat 0x25
> [  168.688464] ata3.00: failed command: READ DMA
> [  168.692681] ata3.00: cmd c8/00:08:68:4b:84/00:00:00:00:00/e7 tag 0 dma 4096 in
> [  168.692681]          res 51/40:08:68:4b:84/40:08:07:00:00/e7 Emask 0x9 (media error)
> [  168.701281] ata3.00: status: { DRDY ERR }
> [  168.705600] ata3.00: error: { UNC }
> [  168.724446] blk_update_request: I/O error, dev sda, sector 126110568
> [  168.728860] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 43, flush 0, corrupt 0, gen 0
> [  172.824043] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [  172.828651] ata3.00: BMDMA stat 0x25
> [  172.833281] ata3.00: failed command: READ DMA
> [  172.837876] ata3.00: cmd c8/00:08:50:4b:84/00:00:00:00:00/e7 tag 0 dma 4096 in
> [  172.837876]          res 51/40:08:50:4b:84/40:08:07:00:00/e7 Emask 0x9 (media error)
> [  172.847296] ata3.00: status: { DRDY ERR }
> [  172.852054] ata3.00: error: { UNC }
> [  172.872404] blk_update_request: I/O error, dev sda, sector 126110544
> [  172.877241] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 44, flush 0, corrupt 0, gen 0
> ERROR: error removing device '/dev/sda': Input/output error
> 
> The same happens when I try to delete /dev/sdc1 from the volume. Is there any
> btrfs "force" option so that btrfs balances only chunks that are accessible? I
> can potentially physically disconnect /dev/sda, but the loss will be greater
> I believe.
> 
> How can I proceed except btrfs restore?
> 
> During scrub operation the following was recorded in the logs:
> 
> [Tue Apr 12 23:10:20 2016] BTRFS warning (device sdc1): checksum error at logical 126952947712 on dev /dev/sdc1, sector 126150176, root 258, inode 879324, offset 308256768, length 4096, links 1 (path: lib/mysql/ibdata1)
> 
> If I collect all the messages like this, will it give a full picture of damaged files?
> 
> Many thanks in advance.
> 
> P.S. Linux kernel v4.4.2, btrfs-progs v4.4.

I have decided to try "btrfs restore". Actually I have discovered two usability
points about it:

1. I cannot run this utility as follows:

btrfs restore -i /dev/sda /mnt/usb &> log

because this command is interactive and may read something from the terminal.
It would be nice if there were a flag -y (answer "yes" to all questions) so that
no input is required from the user. An example of such a question is:

We seem to be looping a lot on ..., do you want to keep going on? [y/N/a]

In general this question puzzles me. What does it mean? As far as I understand,
it prevents btrfs restore from looping forever. Should I consider those files
as lost? I have also hit the same problem as discussed in [1]: answering
"a" (always) still causes the question to be asked again.

2. btrfs restore does not print final statistics: how many files were
successfully restored and how many failed.

[1] https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36458.html

-- 
With best regards,
Dmitry


* Re: Kernel crash if both devices in raid1 are failing
  2016-04-14 20:30 Kernel crash if both devices in raid1 are failing Dmitry Katsubo
  2016-04-17 23:18 ` Dmitry Katsubo
@ 2016-04-18  0:19 ` Chris Murphy
  2016-04-19  5:45   ` Dmitry Katsubo
  1 sibling, 1 reply; 9+ messages in thread
From: Chris Murphy @ 2016-04-18  0:19 UTC (permalink / raw)
  To: Dmitry Katsubo; +Cc: linux-btrfs

On Thu, Apr 14, 2016 at 2:30 PM, Dmitry Katsubo
<dmitry.katsubo@gmail.com> wrote:
> Dear btrfs community,
>
> I have the following setup:
>
> # btrfs fi show /home
> Label: none  uuid: 865f8cf9-27be-41a0-85a4-6cb4d1658ce3
>         Total devices 3 FS bytes used 55.68GiB
>         devid    1 size 52.91GiB used 0.00B path /dev/sdd2
>         devid    2 size 232.89GiB used 59.03GiB path /dev/sda
>         devid    3 size 111.79GiB used 59.03GiB path /dev/sdc1
>
> btrfs volume was created in raid1 mode both for data and metadata and mounted
> with compress=lzo option.
>
> Unfortunately, two drives (sda and sdc1) started to fail at the same time. This
> leads to system crash if I start the system in runlevel 3 (see crash1.log).
>
> After I have started the system in single mode, volume can be mounted in rw
> mode and I can write some data into it. Unfortunately when I tried to read
> a certain file, the system crashed (see crash2.log).
>
> I have started scrub on the volume and here is the report:
>
> # btrfs scrub status /home
> scrub status for 865f8cf9-27be-41a0-85a4-6cb4d1658ce3
>         scrub started at Tue Apr 12 20:39:20 2016 and finished after 02:40:09
>         total bytes scrubbed: 55.68GiB with 1767 errors
>         error details: verify=175 csum=1592
>         corrected errors: 1110, uncorrectable errors: 657, unverified errors: 0
>
> Obviously, some data is lost. However due to above crash, I cannot just copy
> the data from the volume. I would assume that I still can access the data, but
> the files for which data is lost, should result I/O error (I would then recover
> them from my backup).

With a two-device failure on a raid1 volume, the file system is actually
broken. There's a big hole in the metadata, not just missing data,
because there are only two copies of metadata, distributed across
three drives.

btrfs restore might be able to scrape off some files, but I don't
expect it'll get very far. If there were n-way raid1, where every
drive has a complete copy of 100% of the filesystem metadata, what you
suggest would be possible.
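
The point about two copies can be illustrated with a toy model. This is not btrfs code: it assumes every block is mirrored on a fixed set of devices and is unrecoverable only when it is unreadable on all of them. The device names come from this thread; the block numbers and bad-block sets are invented for illustration.

```python
def unrecoverable_blocks(mirrors, bad):
    """Return the blocks whose every copy is unreadable.

    mirrors: dict mapping block number -> tuple of devices holding a copy
    bad:     dict mapping device -> set of block numbers unreadable there
    """
    lost = []
    for block, devices in sorted(mirrors.items()):
        if all(block in bad.get(dev, set()) for dev in devices):
            lost.append(block)
    return lost

# Hypothetical layout: two copies per block, one on sda and one on sdc1.
two_way = {n: ("sda", "sdc1") for n in range(8)}
# Hypothetical bad spots on the two failing drives.
bad = {"sda": {1, 2, 5}, "sdc1": {2, 5, 7}}

lost = unrecoverable_blocks(two_way, bad)
print(lost)  # [2, 5]

# With a third, healthy copy on sdd2, every block stays readable somewhere.
three_way = {n: ("sda", "sdc1", "sdd2") for n in range(8)}
print(unrecoverable_blocks(three_way, bad))  # []
```

In this model a block that is bad on only one drive is still recoverable from the other, which matches the mix of corrected and uncorrectable errors in the scrub report; when a lost block holds metadata rather than file data, everything reachable only through it goes with it.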



>
> I have decided to attach another drive and remove failing devices one-by-one.
> However that does not work:
>
> # btrfs dev delete /dev/sda /home
> [  168.680057] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [  168.684236] ata3.00: BMDMA stat 0x25
> [  168.688464] ata3.00: failed command: READ DMA
> [  168.692681] ata3.00: cmd c8/00:08:68:4b:84/00:00:00:00:00/e7 tag 0 dma 4096 in
> [  168.692681]          res 51/40:08:68:4b:84/40:08:07:00:00/e7 Emask 0x9 (media error)
> [  168.701281] ata3.00: status: { DRDY ERR }
> [  168.705600] ata3.00: error: { UNC }
> [  168.724446] blk_update_request: I/O error, dev sda, sector 126110568
> [  168.728860] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 43, flush 0, corrupt 0, gen 0
> [  172.824043] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [  172.828651] ata3.00: BMDMA stat 0x25
> [  172.833281] ata3.00: failed command: READ DMA
> [  172.837876] ata3.00: cmd c8/00:08:50:4b:84/00:00:00:00:00/e7 tag 0 dma 4096 in
> [  172.837876]          res 51/40:08:50:4b:84/40:08:07:00:00/e7 Emask 0x9 (media error)
> [  172.847296] ata3.00: status: { DRDY ERR }
> [  172.852054] ata3.00: error: { UNC }
> [  172.872404] blk_update_request: I/O error, dev sda, sector 126110544
> [  172.877241] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 44, flush 0, corrupt 0, gen 0
> ERROR: error removing device '/dev/sda': Input/output error



>
> The same happens when I try to delete /dev/sdc1 from the volume. Is there any
> btrfs "force" option so that btrfs balances only chunks that are accessible? I
> can potentially physically disconnect /dev/sda, but the loss will be greater
> I believe.

OK, probably the worst thing you can do if you're trying to recover
data from a degraded volume where a 2nd device is also having
problems is to mount it rw, let alone write anything to it. *shrug*
That's just going to make things much worse and more difficult to
recover, assuming anything can be recovered at all. The fewer
changes you make to such a volume, the better.


-- 
Chris Murphy


* Re: Kernel crash if both devices in raid1 are failing
  2016-04-18  0:19 ` Chris Murphy
@ 2016-04-19  5:45   ` Dmitry Katsubo
  2016-04-19  7:58     ` Duncan
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Katsubo @ 2016-04-19  5:45 UTC (permalink / raw)
  To: linux-btrfs

On 2016-04-18 02:19, Chris Murphy wrote:
> With two device failure on raid1 volume, the file system is actually
> broken. There's a big hole in the metadata, not just missing data,
> because there are only two copies of metadata, distributed across
> three drives.

Thanks, I understand that. Well, the drive has not completely failed;
it has occasional read/write errors. I still wonder what went wrong
and why the kernel crashed. I think this should not happen, as it
prevents me from working with the data that can still be read.
I am happy to contribute more information if it would help.

> btrfs restore might be able to scrape off some files, but I don't
> expect it'll get very far. If there were n-way raid1, where every
> drive has a complete copy of 100% of the filesystem metadata, what you
> suggest would be possible.

Actually, btrfs restore has recovered many files; however, I was not
able to run it in fully unattended mode, as it complains about "looping
a lot". Does that mean the files are corrupted or not correctly restored?

> OK probably the worst thing you can do if you're trying to recover
> data from a degraded volume where a 2nd device is also having
> problems, is to mount it rw let alone write anything to it. *shrug*
> That's just going to make things much worse and more difficult to
> recover, assuming anything can be recovered at all. The least number
> of changes you make to such a volume, the better.

Another option I have thought about is to shrink the failing device
down to some small size. This will cause chunks to be moved to another
location. How will btrfs behave if both copies cannot be read?
It would be nice to have a strategy for recovering without "btrfs restore"
in such a case. I ask because "btrfs restore" means pausing normal
system operation to copy data back and forth.

-- 
With best regards,
Dmitry


* Re: Kernel crash if both devices in raid1 are failing
  2016-04-19  5:45   ` Dmitry Katsubo
@ 2016-04-19  7:58     ` Duncan
  2016-04-20 22:02       ` Dmitry Katsubo
  0 siblings, 1 reply; 9+ messages in thread
From: Duncan @ 2016-04-19  7:58 UTC (permalink / raw)
  To: linux-btrfs

Dmitry Katsubo posted on Tue, 19 Apr 2016 07:45:40 +0200 as excerpted:

> Actually btrfs restore has recovered many files, however I was not able
> to run in fully unattended mode as it complains about "looping a lot".
> Does it mean that files are corrupted / not correctly restored?

As long as you tell it to keep going each time, the loop complaints 
shouldn't be an issue.  The problem is that the loop counter is measuring 
loops on a particular directory, because that's what it has available to 
measure.  But if you had a whole bunch of files in that dir, it's /going/ 
to loop a lot, to restore all of them.

I have one cache directory with over 200K files in it.  They're all text 
messages from various technical lists and newsgroups (like this list, 
which I view as a newsgroup using gmane.org's list2news service) so 
they're quite small, about 5 KiB on average by my quick calculation, but 
that's still a LOT of files for a single dir, even if they're only using 
just over a GiB of space.

I ended up doing a btrfs restore on that filesystem (/home), because 
while I had a backup, restore was getting more recent copies of stuff 
back, and that dir looped a *LOT* the first time it happened, now several 
years ago, before they actually added the always option.

The second time it happened, about a year ago, restore worked much 
better, and I was able to use the always option.  But AFAIK, always only 
applies to that dir.  If you have multiple dirs with the problem, you'll 
still get asked for the next one.  But it did vastly improve the 
situation for me, giving me only a handful of prompts instead of the very 
many I had before the option was there.
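
The difference between the two answers can be sketched with a toy model. It is only an illustration of the behaviour described above: the threshold value and the function are hypothetical and are not taken from the btrfs-progs source, and it assumes that each directory has its own loop counter, that "y" lets the same directory prompt again later, and that "a" silences the prompt for the current directory only.

```python
# Toy model only: THRESHOLD and count_prompts are hypothetical and are not
# taken from the btrfs-progs source.
THRESHOLD = 1000

def count_prompts(entries_per_dir, answer):
    """Count prompts when every directory has its own loop counter.

    answer "y": keep going once, so the same directory can prompt again.
    answer "a": never ask again, but only for the current directory.
    """
    prompts = 0
    for entries in entries_per_dir:
        if answer == "a":
            # At most one prompt per directory that reaches the threshold.
            prompts += 1 if entries >= THRESHOLD else 0
        else:
            # One prompt each time the counter reaches the threshold again.
            prompts += entries // THRESHOLD
    return prompts

# Hypothetical directory sizes, the first one similar to the 200K-file cache.
dirs = [200_000, 50, 3_000]
print(count_prompts(dirs, "y"))  # 203
print(count_prompts(dirs, "a"))  # 2
```

Under these assumptions the large directory dominates the prompt count when answering "y", while "a" leaves one prompt per affected directory.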

(The main problem triggering the need to run restore for me, turned out 
to be hardware.  I've had no issues since I replaced that failing ssd, 
and with a bit of luck, won't be running restore again for a few years, 
now.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: Kernel crash if both devices in raid1 are failing
  2016-04-19  7:58     ` Duncan
@ 2016-04-20 22:02       ` Dmitry Katsubo
  0 siblings, 0 replies; 9+ messages in thread
From: Dmitry Katsubo @ 2016-04-20 22:02 UTC (permalink / raw)
  To: linux-btrfs

On 2016-04-19 09:58, Duncan wrote:
> Dmitry Katsubo posted on Tue, 19 Apr 2016 07:45:40 +0200 as excerpted:
> 
>> Actually btrfs restore has recovered many files, however I was not able
>> to run in fully unattended mode as it complains about "looping a lot".
>> Does it mean that files are corrupted / not correctly restored?
> 
> As long as you tell it to keep going each time, the loop complaints 
> shouldn't be an issue.  The problem is that the loop counter is measuring 
> loops on a particular directory, because that's what it has available to 
> measure.  But if you had a whole bunch of files in that dir, it's /going/ 
> to loop a lot, to restore all of them.
> 
> I have one cache directory with over 200K files in it.  They're all text 
> messages from various technical lists and newsgroups (like this list, 
> which I view as a newsgroup using gmane.org's list2news service) so 
> they're quite small, about 5 KiB on average by my quick calculation, but 
> that's still a LOT of files for a single dir, even if they're only using 
> just over a GiB of space.
> 
> I ended up doing a btrfs restore on that filesystem (/home), because 
> while I had a backup, restore was getting more recent copies of stuff 
> back, and that dir looped a *LOT* the first time it happened, now several 
> years ago, before they actually added the always option.

I have the same situation here: there is a backup, but the most recent
modifications in files are preferable.

> The second time it happened, about a year ago, restore worked much 
> better, and I was able to use the always option.  But AFAIK, always only 
> applies to that dir.  If you have multiple dirs with the problem, you'll 
> still get asked for the next one.  But it did vastly improve the 
> situation for me, giving me only a handful of prompts instead of the very 
> many I had before the option was there.

Yes, this is exactly the problem discussed a while ago. It would be nice if
"btrfs restore -i" applied the "(a)lways" answer to all questions, or if
there were a separate option for that ("-y").

For me personally, "looping" is too low-level a concept. System administrators
(who are the ones going to use this utility) should be able to work in more
meaningful terms. If "looping" is a proxy for "time consumption", then I would
say that time does not matter so much during a restore: I am ready to wait
1 minute for a specific file to be restored. So I think it is not the number
of loops but the amount of time spent that should be measured.

I also have difficulty finding out which files have not been restored
due to uncorrectable errors. As I cannot redirect the output of
"btrfs restore" and it does not print final stats, I cannot tell which
files have to be restored from backup.
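
One way to approximate such a list is to collect the scrub warnings from the kernel log. The sketch below is a minimal example whose pattern is modelled on the single checksum-error warning quoted earlier in this thread; other kernel versions may word the message differently, and errors that scrub could not attribute to a path (for example in metadata) would not show up, so the result should be treated as a lower bound rather than a full picture.

```python
import re
from collections import Counter

# Pattern modelled on the one scrub warning quoted in this thread.
WARNING_RE = re.compile(
    r"BTRFS warning \(device (?P<fsdev>[^)]+)\): checksum error at logical \d+ "
    r"on dev (?P<dev>\S+), sector \d+, root (?P<root>\d+), inode (?P<inode>\d+), "
    r"offset \d+, length \d+, links \d+ \(path: (?P<path>.*)\)\s*$"
)

def damaged_paths(lines):
    """Count checksum-error warnings per (root id, path)."""
    hits = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            hits[(int(match.group("root")), match.group("path"))] += 1
    return hits

# Two lines copied from this thread: one scrub warning, one unrelated I/O error.
sample = [
    "[Tue Apr 12 23:10:20 2016] BTRFS warning (device sdc1): checksum error "
    "at logical 126952947712 on dev /dev/sdc1, sector 126150176, root 258, "
    "inode 879324, offset 308256768, length 4096, links 1 "
    "(path: lib/mysql/ibdata1)",
    "[  168.724446] blk_update_request: I/O error, dev sda, sector 126110568",
]

for (root, path), count in sorted(damaged_paths(sample).items()):
    print(root, path, count)
```

The paths in these warnings are relative to the subvolume identified by the root id, so the root number is kept next to each path.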

> (The main problem triggering the need to run restore for me, turned out 
> to be hardware.  I've had no issues since I replaced that failing ssd, 
> and with a bit of luck, won't be running restore again for a few years, 
> now.)

I would be happy if I were able to replace the failing drive on the fly, without
stopping the system. Unfortunately I cannot do that due to kernel crashes :(
btrfs is still not robust against these corner cases.

-- 
With best regards,
Dmitry


* Re: Kernel crash if both devices in raid1 are failing
  2016-04-17 23:18 ` Dmitry Katsubo
@ 2016-04-21  3:45   ` Liu Bo
       [not found]     ` <571DC34A.50509@gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Liu Bo @ 2016-04-21  3:45 UTC (permalink / raw)
  To: Dmitry Katsubo; +Cc: linux-btrfs

On Mon, Apr 18, 2016 at 01:18:31AM +0200, Dmitry Katsubo wrote:
> On 2016-04-14 22:30, Dmitry Katsubo wrote:
> > Dear btrfs community,
> > 
> > I have the following setup:
> > 
> > # btrfs fi show /home
> > Label: none  uuid: 865f8cf9-27be-41a0-85a4-6cb4d1658ce3
> > 	Total devices 3 FS bytes used 55.68GiB
> > 	devid    1 size 52.91GiB used 0.00B path /dev/sdd2
> > 	devid    2 size 232.89GiB used 59.03GiB path /dev/sda
> > 	devid    3 size 111.79GiB used 59.03GiB path /dev/sdc1
> > 
> > btrfs volume was created in raid1 mode both for data and metadata and mounted
> > with compress=lzo option.
> > 
> > Unfortunately, two drives (sda and sdc1) started to fail at the same time. This
> > leads to system crash if I start the system in runlevel 3 (see crash1.log).
> > 
> > After I have started the system in single mode, volume can be mounted in rw
> > mode and I can write some data into it. Unfortunately when I tried to read
> > a certain file, the system crashed (see crash2.log).
> > 
> > I have started scrub on the volume and here is the report:
> > 
> > # btrfs scrub status /home
> > scrub status for 865f8cf9-27be-41a0-85a4-6cb4d1658ce3
> > 	scrub started at Tue Apr 12 20:39:20 2016 and finished after 02:40:09
> > 	total bytes scrubbed: 55.68GiB with 1767 errors
> > 	error details: verify=175 csum=1592
> > 	corrected errors: 1110, uncorrectable errors: 657, unverified errors: 0
> > 
> > Obviously, some data is lost. However due to above crash, I cannot just copy
> > the data from the volume. I would assume that I still can access the data, but
> > the files for which data is lost, should result I/O error (I would then recover
> > them from my backup).
> > 
> > I have decided to attach another drive and remove failing devices one-by-one.
> > However that does not work:
> > 
> > # btrfs dev delete /dev/sda /home
> > [  168.680057] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> > [  168.684236] ata3.00: BMDMA stat 0x25
> > [  168.688464] ata3.00: failed command: READ DMA
> > [  168.692681] ata3.00: cmd c8/00:08:68:4b:84/00:00:00:00:00/e7 tag 0 dma 4096 in
> > [  168.692681]          res 51/40:08:68:4b:84/40:08:07:00:00/e7 Emask 0x9 (media error)
> > [  168.701281] ata3.00: status: { DRDY ERR }
> > [  168.705600] ata3.00: error: { UNC }
> > [  168.724446] blk_update_request: I/O error, dev sda, sector 126110568
> > [  168.728860] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 43, flush 0, corrupt 0, gen 0
> > [  172.824043] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> > [  172.828651] ata3.00: BMDMA stat 0x25
> > [  172.833281] ata3.00: failed command: READ DMA
> > [  172.837876] ata3.00: cmd c8/00:08:50:4b:84/00:00:00:00:00/e7 tag 0 dma 4096 in
> > [  172.837876]          res 51/40:08:50:4b:84/40:08:07:00:00/e7 Emask 0x9 (media error)
> > [  172.847296] ata3.00: status: { DRDY ERR }
> > [  172.852054] ata3.00: error: { UNC }
> > [  172.872404] blk_update_request: I/O error, dev sda, sector 126110544
> > [  172.877241] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 44, flush 0, corrupt 0, gen 0
> > ERROR: error removing device '/dev/sda': Input/output error
> > 
> > The same happens when I try to delete /dev/sdc1 from the volume. Is there any
> > btrfs "force" option so that btrfs balances only chunks that are accessible? I
> > can potentially physically disconnect /dev/sda, but the loss will be greater
> > I believe.
> > 
> > How can I proceed except btrfs restore?
> > 
> > During scrub operation the following was recorded in the logs:
> > 
> > [Tue Apr 12 23:10:20 2016] BTRFS warning (device sdc1): checksum error at logical 126952947712 on dev /dev/sdc1, sector 126150176, root 258, inode 879324, offset 308256768, length 4096, links 1 (path: lib/mysql/ibdata1)
> > 
> > If I collect all the messages like this, will it give a full picture of damaged files?
> > 
> > Many thanks in advance.
> > 
> > P.S. Linux kernel v4.4.2, btrfs-progs v4.4.
> 
> I have decided to try "btrfs restore". Actually I have discovered two usability
> points about it:
> 
> 1. I cannot run this utility as following:
> 
> btrfs -i restore /dev/sda /mnt/usb &> log
> 
> because this command is interactive and may read something from the terminal.
> It would be nice if there is a flag -y (answer "yes" to all questions) so that
> no input is required from user. The example of the question is:
> 
> We seem to be looping a lot on ..., do you want to keep going on? [y/N/a]
> 
> In general this question puzzles me. What does it mean? As far as I understood
> it prevents btrfs restore from looping forever. Should I consider those files
> as lost? I have also hit the same problem as discussed in [1]: answer
> "a" (always) still causes the questions to be asked.
> 
> 2. btrfs restore does not print a final statistics: how many files are
> successfully restored, and how many have failed.

Thanks for trying 'restore', but I was wondering, does btrfsck work for you?

Thanks,

-liubo

> 
> [1] https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36458.html
> 
> -- 
> With best regards,
> Dmitry


* Re: Kernel crash if both devices in raid1 are failing
       [not found]     ` <571DC34A.50509@gmail.com>
@ 2016-04-27  2:44       ` Dmitry Katsubo
  2016-05-02 20:51         ` Dmitry Katsubo
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Katsubo @ 2016-04-27  2:44 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 5331 bytes --]

On 2016-04-25 09:12, Dmitry Katsubo wrote:
> I have run "btrfs check /dev/sda" two times. One time it has completed
> OK, actually showing only one error. The 2nd time it has shown many messages
> 
> "parent transid verify failed on NNN wanted AAA found BBB"
> 
> and then asserted :) But I think the 2nd run is not representative as I have
> gracefully removed one drive from btrfs array to build a new array. The
> "btrfs device remove" completed successfully, but it might have written some
> metadata to the remaining drives, which perhaps was not synchronized
> correctly.
> 
> What I am going to do next is to recompile btrfs-tools so that "-i" CLI option
> applies "(y)" to all questions and run "btrfs restore" again. Hopefully it can
> handle transid mismatch correctly...

OK, I have recompiled btrfs-progs with the necessary fix (attached). It allowed
me to capture the "btrfs restore" output, which was previously not possible
because of the reads from the console, even with attempts like this:

while true; do echo y; done | btrfs restore -voxmSi /dev/sda /mnt/backup 2>&1 | tee btrfs_restore

As an experiment I have upgraded the kernel to 4.4.6, and it still crashes
on the problematic file:

# cat /mnt/tmp/file > /dev/null
[   11.432059] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[   11.436665] ata3.00: BMDMA stat 0x25
[   11.441301] ata3.00: failed command: READ DMA
[   11.479570] ata3.00: cmd c8/00:20:40:ec:f3/00:00:00:00:00/e3 tag 0 dma 16384 in
[   11.479664]          res 51/40:1e:42:ec:f3/00:00:00:00:00/e3 Emask 0x9 (media error)
[   11.619086] ata3.00: status: { DRDY ERR }
[   11.619126] ata3.00: error: { UNC }
[   11.625750] blk_update_request: I/O error, dev sda, sector 66317378
[   11.625779] NOHZ: local_softirq_pending 40
[   70.969876] ------------[ cut here ]------------
[   70.969879] kernel BUG at /build/linux-SBJFwR/linux-4.4.6/debian/build/source_rt/fs/btrfs/volumes.c:5509!
[   70.969885] invalid opcode: 0000 [#1] PREEMPT SMP 
[   70.969954] Modules linked in: netconsole configfs bridge stp llc arc4 iTCO_wdt iTCO_vendor_support ppdev coretemp pcspkr serio_raw i2c_i801 ath5k ath mac80211 cfg80211 sr9700 evdev rfkill dm9601 usbnet lpc_ich mfd_core mii option usb_wwan usbserial rng_core sg snd_hda_codec_realtek snd_hda_codec_generic i915 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm acpi_cpufreq snd_timer video 8250_fintek snd drm_kms_helper soundcore tpm_tis drm tpm parport_pc i2c_algo_bit parport shpchp button processor binfmt_misc w83627hf hwmon_vid autofs4 xfs libcrc32c hid_generic usbhid hid crc32c_generic btrfs xor raid6_pq uas usb_storage sd_mod sr_mod cdrom ata_generic firewire_ohci ata_piix libata scsi_mod firewire_core crc_itu_t ehci_pci uhci_hcd ehci_hcd usbcore usb_common e1000e ptp pps_core
[   70.969965] CPU: 0 PID: 114 Comm: kworker/u4:3 Tainted: G        W       4.4.0-1-rt-686-pae #1 Debian 4.4.6-1
[   70.969968] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS i945GMx-IF R1.01 Mar.02.2007 AOpen Inc. 03/02/2007
[   70.970029] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
[   70.970032] task: f3eec0c0 ti: f6a76000 task.ti: f6a76000
[   70.970036] EIP: 0060:[<f87506be>] EFLAGS: 00010217 CPU: 0
[   70.970076] EIP is at __btrfs_map_block+0x11be/0x15a0 [btrfs]

Unfortunately I was not able to capture the whole trace, as there seems to be
a concurrent problem with netconsole: the whole system hangs at the point above.
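
Even the partial trace identifies where the BUG fired. A minimal sketch for pulling the source location and the faulting function out of such lines follows; the two patterns are modelled only on the lines captured above (32-bit "EIP is at" wording), so other architectures or kernel versions would need different patterns.

```python
import re

# Patterns modelled on the two relevant lines of the trace above.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
EIP_RE = re.compile(
    r"EIP is at (?P<func>\w+)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)

def crash_site(lines):
    """Collect source file, line, function, offset and module from an oops."""
    site = {}
    for line in lines:
        match = BUG_RE.search(line)
        if match:
            # Keep only the file name; the build directory prefix is noise.
            site["file"] = match.group("file").rsplit("/", 1)[-1]
            site["line"] = int(match.group("line"))
        match = EIP_RE.search(line)
        if match:
            site["function"] = match.group("func")
            site["offset"] = int(match.group("off"), 16)
            site["module"] = match.group("module")
    return site

# Lines copied from the trace above.
trace = [
    "[   70.969879] kernel BUG at /build/linux-SBJFwR/linux-4.4.6/debian/build/source_rt/fs/btrfs/volumes.c:5509!",
    "[   70.970036] EIP: 0060:[<f87506be>] EFLAGS: 00010217 CPU: 0",
    "[   70.970076] EIP is at __btrfs_map_block+0x11be/0x15a0 [btrfs]",
]

site = crash_site(trace)
print(site["file"], site["line"], site["function"], site["module"])
```

For the lines above this reports volumes.c, line 5509, in __btrfs_map_block of the btrfs module, which is the location to look up in the 4.4.6 source.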

P.S. If the Debian maintainer of btrfs-progs is on the list: packaging the
project fails for me (it happens at the very end, during installation of the
binaries):

# debuild
...
dpkg-source: info: building btrfs-progs in btrfs-progs_4.4.1-1.1.debian.tar.xz
dpkg-source: info: building btrfs-progs in btrfs-progs_4.4.1-1.1.dsc
 debian/rules build
dh build --parallel
   dh_testdir -O--parallel
   debian/rules override_dh_auto_configure
make[1]: Entering directory '/home/btrfs-progs-4.4.1'
dh_auto_configure -- --bindir=/bin
make[1]: Leaving directory '/home/btrfs-progs-4.4.1'
   dh_auto_build -O--parallel
 fakeroot debian/rules binary
dh binary --parallel
   dh_testroot -O--parallel
   dh_prep -O--parallel
   debian/rules override_dh_auto_install
make[1]: Entering directory '/home/btrfs-progs-4.4.1'
dh_auto_install --destdir=debian/btrfs-progs
# Adding initramfs-tools integration
install -D -m 0755 debian/local/btrfs.hook debian/btrfs-progs/usr/share/initramfs-tools/hooks/btrfs
install -D -m 0755 debian/local/btrfs.local-premount debian/btrfs-progs/usr/share/initramfs-tools/scripts/local-premount/btrfs
make[1]: Leaving directory '/home/btrfs-progs-4.4.1'
   dh_install -O--parallel
/home/btrfs-progs-4.4.1/debian/btrfs-progs.install: 1: /home/btrfs-progs-4.4.1/debian/btrfs-progs.install: btrfs-calc-size: not found
/home/btrfs-progs-4.4.1/debian/btrfs-progs.install: 2: /home/btrfs-progs-4.4.1/debian/btrfs-progs.install: btrfs-select-super: not found
/home/btrfs-progs-4.4.1/debian/btrfs-progs.install: 3: /home/btrfs-progs-4.4.1/debian/btrfs-progs.install: ioctl.h: not found
dh_install: problem reading debian/btrfs-progs.install: 
debian/rules:16: recipe for target 'binary' failed
make: *** [binary] Error 127
dpkg-buildpackage: error: fakeroot debian/rules binary gave error exit status 2
debuild: fatal error at line 1376:
dpkg-buildpackage -rfakeroot -D -us -uc failed

-- 
With best regards,
Dmitry

[-- Attachment #2: 00_infinite_loops_on_ignore_errors.patch --]
[-- Type: text/plain, Size: 441 bytes --]

Index: btrfs-progs-4.4.1/cmds-restore.c
===================================================================
--- btrfs-progs-4.4.1.orig/cmds-restore.c
+++ btrfs-progs-4.4.1/cmds-restore.c
@@ -438,6 +438,9 @@ static enum loop_response ask_to_continu
 	char buf[2];
 	char *ret;

+	if (ignore_errors)
+		return LOOP_CONTINUE;
+
 	printf("We seem to be looping a lot on %s, do you want to keep going "
 	       "on ? (y/N/a): ", file);
 again:


* Re: Kernel crash if both devices in raid1 are failing
  2016-04-27  2:44       ` Dmitry Katsubo
@ 2016-05-02 20:51         ` Dmitry Katsubo
  0 siblings, 0 replies; 9+ messages in thread
From: Dmitry Katsubo @ 2016-05-02 20:51 UTC (permalink / raw)
  To: linux-btrfs

Hello,

If somebody is interested in digging into the problem, I would be happy to provide
more information and/or do the testing.

On 2016-04-27 04:44, Dmitry Katsubo wrote:
> # cat /mnt/tmp/file > /dev/null
> [   11.432059] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [   11.436665] ata3.00: BMDMA stat 0x25
> [   11.441301] ata3.00: failed command: READ DMA
> [   11.479570] ata3.00: cmd c8/00:20:40:ec:f3/00:00:00:00:00/e3 tag 0 dma 16384 in
> [   11.479664]          res 51/40:1e:42:ec:f3/00:00:00:00:00/e3 Emask 0x9 (media error)
> [   11.619086] ata3.00: status: { DRDY ERR }
> [   11.619126] ata3.00: error: { UNC }
> [   11.625750] blk_update_request: I/O error, dev sda, sector 66317378
> [   11.625779] NOHZ: local_softirq_pending 40
> [   70.969876] ------------[ cut here ]------------
> [   70.969879] kernel BUG at /build/linux-SBJFwR/linux-4.4.6/debian/build/source_rt/fs/btrfs/volumes.c:5509!
> [   70.969885] invalid opcode: 0000 [#1] PREEMPT SMP 
> [   70.969954] Modules linked in: netconsole configfs bridge stp llc arc4 iTCO_wdt iTCO_vendor_support ppdev coretemp pcspkr serio_raw i2c_i801 ath5k ath mac80211 cfg80211 sr9700 evdev rfkill dm9601 usbnet lpc_ich mfd_core mii option usb_wwan usbserial rng_core sg snd_hda_codec_realtek snd_hda_codec_generic i915 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm acpi_cpufreq snd_timer video 8250_fintek snd drm_kms_helper soundcore tpm_tis drm tpm parport_pc i2c_algo_bit parport shpchp button processor binfmt_misc w83627hf hwmon_vid autofs4 xfs libcrc32c hid_generic usbhid hid crc32c_generic btrfs xor raid6_pq uas usb_storage sd_mod sr_mod cdrom ata_generic firewire_ohci ata_piix libata scsi_mod firewire_core crc_itu_t ehci_pci uhci_hcd ehci_hcd usbcore usb_common e1000e ptp pps_core
> [   70.969965] CPU: 0 PID: 114 Comm: kworker/u4:3 Tainted: G        W       4.4.0-1-rt-686-pae #1 Debian 4.4.6-1
> [   70.969968] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS i945GMx-IF R1.01 Mar.02.2007 AOpen Inc. 03/02/2007
> [   70.970029] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
> [   70.970032] task: f3eec0c0 ti: f6a76000 task.ti: f6a76000
> [   70.970036] EIP: 0060:[<f87506be>] EFLAGS: 00010217 CPU: 0
> [   70.970076] EIP is at __btrfs_map_block+0x11be/0x15a0 [btrfs]


-- 
With best regards,
Dmitry
