From: Dmitry Katsubo <dmitry.katsubo@gmail.com>
To: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Kernel crash if both devices in raid1 are failing
Date: Wed, 27 Apr 2016 04:44:07 +0200
Message-ID: <57202777.3010402@gmail.com>
In-Reply-To: <571DC34A.50509@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5331 bytes --]

On 2016-04-25 09:12, Dmitry Katsubo wrote:
> I have run "btrfs check /dev/sda" two times. One time it has completed
> OK, actually showing only one error. The 2nd time it has shown many messages
> 
> "parent transid verify failed on NNN wanted AAA found BBB"
> 
> and then asserted :) But I think the 2nd run is not representative as I have
> gracefully removed one drive from btrfs array to build a new array. The
> "btrfs device remove" completed successfully, but it might have written some
> metadata to the remaining drives, which perhaps was not synchronized
> correctly.
> 
> What I am going to do next is to recompile btrfs-tools so that "-i" CLI option
> applies "(y)" to all questions and run "btrfs restore" again. Hopefully it can
> handle transid mismatch correctly...

OK, I have recompiled btrfs-progs with the necessary fix (attached). It allowed
me to capture the "btrfs restore" output. Before the fix this was not possible
because the tool reads answers from the console, even with attempts like this:

while true; do echo y; done | btrfs restore -voxmSi /dev/sda /mnt/backup 2>&1 | tee btrfs_restore

As an experiment, I have upgraded the kernel to 4.4.6, and it still crashes
on the problematic file:

# cat /mnt/tmp/file > /dev/null
[   11.432059] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[   11.436665] ata3.00: BMDMA stat 0x25
[   11.441301] ata3.00: failed command: READ DMA
[   11.479570] ata3.00: cmd c8/00:20:40:ec:f3/00:00:00:00:00/e3 tag 0 dma 16384 in
[   11.479664]          res 51/40:1e:42:ec:f3/00:00:00:00:00/e3 Emask 0x9 (media error)
[   11.619086] ata3.00: status: { DRDY ERR }
[   11.619126] ata3.00: error: { UNC }
[   11.625750] blk_update_request: I/O error, dev sda, sector 66317378
[   11.625779] NOHZ: local_softirq_pending 40
[   70.969876] ------------[ cut here ]------------
[   70.969879] kernel BUG at /build/linux-SBJFwR/linux-4.4.6/debian/build/source_rt/fs/btrfs/volumes.c:5509!
[   70.969885] invalid opcode: 0000 [#1] PREEMPT SMP 
[   70.969954] Modules linked in: netconsole configfs bridge stp llc arc4 iTCO_wdt iTCO_vendor_support ppdev coretemp pcspkr serio_raw i2c_i801 ath5k ath mac80211 cfg80211 sr9700 evdev rfkill dm9601 usbnet lpc_ich mfd_core mii option usb_wwan usbserial rng_core sg snd_hda_codec_realtek snd_hda_codec_generic i915 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm acpi_cpufreq snd_timer video 8250_fintek snd drm_kms_helper soundcore tpm_tis drm tpm parport_pc i2c_algo_bit parport shpchp button processor binfmt_misc w83627hf hwmon_vid autofs4 xfs libcrc32c hid_generic usbhid hid crc32c_generic btrfs xor raid6_pq uas usb_storage sd_mod sr_mod cdrom ata_generic firewire_ohci ata_piix libata scsi_mod firewire_core crc_itu_t ehci_pci uhci_hcd ehci_hcd usbcore usb_common e1000e ptp pps_core
[   70.969965] CPU: 0 PID: 114 Comm: kworker/u4:3 Tainted: G        W       4.4.0-1-rt-686-pae #1 Debian 4.4.6-1
[   70.969968] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS i945GMx-IF R1.01 Mar.02.2007 AOpen Inc. 03/02/2007
[   70.970029] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
[   70.970032] task: f3eec0c0 ti: f6a76000 task.ti: f6a76000
[   70.970036] EIP: 0060:[<f87506be>] EFLAGS: 00010217 CPU: 0
[   70.970076] EIP is at __btrfs_map_block+0x11be/0x15a0 [btrfs]

Unfortunately I was not able to capture the whole trace, as there seems to be
a concurrent problem with netconsole: the whole system hangs at the point above.

P.S. If the Debian maintainer of btrfs-progs is on the list: packaging the
project fails for me (it happens at the very end, during installation of the
binaries):

# debuild
...
dpkg-source: info: building btrfs-progs in btrfs-progs_4.4.1-1.1.debian.tar.xz
dpkg-source: info: building btrfs-progs in btrfs-progs_4.4.1-1.1.dsc
 debian/rules build
dh build --parallel
   dh_testdir -O--parallel
   debian/rules override_dh_auto_configure
make[1]: Entering directory '/home/btrfs-progs-4.4.1'
dh_auto_configure -- --bindir=/bin
make[1]: Leaving directory '/home/btrfs-progs-4.4.1'
   dh_auto_build -O--parallel
 fakeroot debian/rules binary
dh binary --parallel
   dh_testroot -O--parallel
   dh_prep -O--parallel
   debian/rules override_dh_auto_install
make[1]: Entering directory '/home/btrfs-progs-4.4.1'
dh_auto_install --destdir=debian/btrfs-progs
# Adding initramfs-tools integration
install -D -m 0755 debian/local/btrfs.hook debian/btrfs-progs/usr/share/initramfs-tools/hooks/btrfs
install -D -m 0755 debian/local/btrfs.local-premount debian/btrfs-progs/usr/share/initramfs-tools/scripts/local-premount/btrfs
make[1]: Leaving directory '/home/btrfs-progs-4.4.1'
   dh_install -O--parallel
/home/btrfs-progs-4.4.1/debian/btrfs-progs.install: 1: /home/btrfs-progs-4.4.1/debian/btrfs-progs.install: btrfs-calc-size: not found
/home/btrfs-progs-4.4.1/debian/btrfs-progs.install: 2: /home/btrfs-progs-4.4.1/debian/btrfs-progs.install: btrfs-select-super: not found
/home/btrfs-progs-4.4.1/debian/btrfs-progs.install: 3: /home/btrfs-progs-4.4.1/debian/btrfs-progs.install: ioctl.h: not found
dh_install: problem reading debian/btrfs-progs.install: 
debian/rules:16: recipe for target 'binary' failed
make: *** [binary] Error 127
dpkg-buildpackage: error: fakeroot debian/rules binary gave error exit status 2
debuild: fatal error at line 1376:
dpkg-buildpackage -rfakeroot -D -us -uc failed


-- 
With best regards,
Dmitry

[-- Attachment #2: 00_infinite_loops_on_ignore_errors.patch --]
[-- Type: text/plain, Size: 441 bytes --]

Index: btrfs-progs-4.4.1/cmds-restore.c
===================================================================
--- btrfs-progs-4.4.1.orig/cmds-restore.c
+++ btrfs-progs-4.4.1/cmds-restore.c
@@ -438,6 +438,9 @@ static enum loop_response ask_to_continu
 	char buf[2];
 	char *ret;
 
+	if (ignore_errors)
+		return LOOP_CONTINUE;
+
 	printf("We seem to be looping a lot on %s, do you want to keep going "
 	       "on ? (y/N/a): ", file);
 again:
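
The idea behind the patch above can be tried outside btrfs-progs with a small
standalone sketch. Only `ignore_errors`, `enum loop_response`, `LOOP_CONTINUE`
and the name `ask_to_continue` come from the patch context. The other enum
values, the extra `FILE *in` parameter (added so the prompt can be fed from any
stream) and the answer parsing are assumptions made for illustration, not the
actual btrfs-progs code.

```c
#include <ctype.h>
#include <stdio.h>

/* Possible answers to the prompt; only LOOP_CONTINUE appears in the patch,
 * the other two names are assumed for this sketch. */
enum loop_response { LOOP_STOP, LOOP_CONTINUE, LOOP_DONTASK };

/* Stand-in for the flag set by the -i option (assumed to be a global here). */
static int ignore_errors = 0;

/*
 * Ask whether to keep looping on `file`, reading the answer from `in`.
 * When ignore_errors is set, return LOOP_CONTINUE without prompting or
 * reading anything, which is what the patch adds.
 */
static enum loop_response ask_to_continue(const char *file, FILE *in)
{
	char line[16];

	if (ignore_errors)
		return LOOP_CONTINUE;

	for (;;) {
		printf("Looping a lot on %s, keep going? (y/N/a): ", file);
		if (!fgets(line, sizeof(line), in))
			return LOOP_STOP;	/* EOF or read error: treat as "no" */

		switch (tolower((unsigned char)line[0])) {
		case 'y':
			return LOOP_CONTINUE;
		case 'a':
			return LOOP_DONTASK;
		case 'n':
		case '\n':
			return LOOP_STOP;
		}
		/* Any other input: fall through and ask again. */
	}
}
```

With `ignore_errors` set, the function never touches its input stream, so a
run with "-i" no longer depends on anything being typed or piped in.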
