* Post ext3 conversion problems
@ 2016-09-16 19:25 Sean Greenslade
  2016-09-16 20:23 ` Chris Murphy
  2016-09-17  2:27 ` Liu Bo
  0 siblings, 2 replies; 17+ messages in thread
From: Sean Greenslade @ 2016-09-16 19:25 UTC
  To: linux-btrfs

Hi, all. I've been playing around with an old laptop of mine, and I
figured I'd use it as a learning / bugfinding opportunity. Its /home
partition was originally ext3. I have a full partition image of this
drive as a backup, so I can do (and have done) potentially destructive
things. The system disk is a ~6 year old SSD.

To start, I rebooted to a livedisk (Arch, kernel 4.7.2 w/progs 4.7.1)
and ran a simple btrfs-convert on it. After patching up the fstab and
rebooting, everything seemed fine. I deleted the recovery subvol, ran a
full balance, ran a full defrag, and rebooted again. I then decided to
try (as an experiment) using DUP mode for data and metadata. I ran that
balance without issue, then started using the machine. Sometime later, I
got the following remount ro:

[ 7316.764235] ------------[ cut here ]------------
[ 7316.764292] WARNING: CPU: 2 PID: 14196 at fs/btrfs/inode.c:2954 btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
[ 7316.764297] BTRFS: Transaction aborted (error -95)
[ 7316.764301] Modules linked in: fuse sha256_ssse3 sha256_generic hmac drbg ansi_cprng ctr ccm joydev mousedev uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media crc32c_generic iTCO_wdt btrfs iTCO_vendor_support arc4 xor ath9k raid6_pq ath9k_common ath9k_hw ath mac80211 snd_hda_codec_realtek snd_hda_codec_generic psmouse input_leds coretemp snd_hda_intel led_class pcspkr snd_hda_codec cfg80211 snd_hwdep snd_hda_core snd_pcm lpc_ich snd_timer atl1c rfkill snd soundcore shpchp intel_agp wmi thermal fjes battery evdev ac tpm_tis mac_hid tpm sch_fq_codel vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) loop sg acpi_cpufreq ip_tables x_tables ext4 crc16 jbd2 mbcache sd_mod serio_raw atkbd libps2 ahci libahci uhci_hcd libata scsi_mod ehci_pci ehci_hcd usbcore
[ 7316.764434]  usb_common i8042 serio i915 video button intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
[ 7316.764462] CPU: 2 PID: 14196 Comm: kworker/u8:11 Tainted: G O 4.7.3-5-ck #1
[ 7316.764467] Hardware name: ASUSTeK Computer INC. 1015PEM/1015PE, BIOS 0903 11/08/2010
[ 7316.764507] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 7316.764513]  0000000000000286 000000006101f47d ffff8800230dbc78 ffffffff812f0215
[ 7316.764522]  ffff8800230dbcc8 0000000000000000 ffff8800230dbcb8 ffffffff8107ae6f
[ 7316.764530]  00000b8a00000035 ffff88007791afa8 ffff8800751d9000 ffff880014101d40
[ 7316.764538] Call Trace:
[ 7316.764551]  [<ffffffff812f0215>] dump_stack+0x63/0x8e
[ 7316.764560]  [<ffffffff8107ae6f>] __warn+0xcf/0xf0
[ 7316.764567]  [<ffffffff8107aef1>] warn_slowpath_fmt+0x61/0x80
[ 7316.764605]  [<ffffffffa07aa362>] ? unpin_extent_cache+0xa2/0xf0 [btrfs]
[ 7316.764640]  [<ffffffffa07628e6>] ? btrfs_free_path+0x26/0x30 [btrfs]
[ 7316.764677]  [<ffffffffa079aaac>] btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
[ 7316.764715]  [<ffffffffa079adc5>] finish_ordered_fn+0x15/0x20 [btrfs]
[ 7316.764753]  [<ffffffffa07c5f8e>] btrfs_scrubparity_helper+0x7e/0x360 [btrfs]
[ 7316.764791]  [<ffffffffa07c62fe>] btrfs_endio_write_helper+0xe/0x10 [btrfs]
[ 7316.764799]  [<ffffffff810949bd>] process_one_work+0x1ed/0x490
[ 7316.764806]  [<ffffffff81094ca9>] worker_thread+0x49/0x500
[ 7316.764813]  [<ffffffff81094c60>] ? process_one_work+0x490/0x490
[ 7316.764820]  [<ffffffff8109ac3a>] kthread+0xda/0xf0
[ 7316.764830]  [<ffffffff815c553f>] ret_from_fork+0x1f/0x40
[ 7316.764838]  [<ffffffff8109ab60>] ? kthread_worker_fn+0x170/0x170
[ 7316.764843] ---[ end trace 90f54effc5e294b0 ]---
[ 7316.764851] BTRFS: error (device sda2) in btrfs_finish_ordered_io:2954: errno=-95 unknown
[ 7316.764859] BTRFS info (device sda2): forced readonly
[ 7316.765396] pending csums is 9437184

After seeing this, I decided to attempt a repair (confident that I could
restore from backup if it failed). At the time, I was unaware of the
issues with progs 4.7.1, so when I ran the check and saw all the
incorrect backrefs messages, I figured that was my problem and ran the
--repair. Of course, this didn't make the messages go away on subsequent
checks, so I looked further and found this bug:

https://bugzilla.kernel.org/show_bug.cgi?id=155791

I updated progs to 4.7.2 and re-ran the --repair (I didn't save any of
the logs from these, unfortunately). The repair seemed to work (I also
used --init-extent-tree), as current checks don't report any errors.

The system boots and mounts the FS just fine. I can read from it all
day, scrubs complete without failure, but just using the system for a
while will eventually trigger the same "Transaction aborted (error -95)"
error.

I realize this is something of a mess, and that I was less than
methodical with my actions so far. Given that I have a full backup that
can be restored if need be (and I certainly could try running the
convert again), what is my best course of action?

Thanks,

--Sean
* Re: Post ext3 conversion problems
From: Chris Murphy @ 2016-09-16 20:23 UTC
  To: Sean Greenslade, Qu Wenruo, David Sterba; +Cc: Btrfs BTRFS

On Fri, Sep 16, 2016 at 1:25 PM, Sean Greenslade
<sean@seangreenslade.com> wrote:
> <snip>
> I realize this is something of a mess, and that I was less than
> methodical with my actions so far. Given that I have a full backup that
> can be restored if need be (and I certainly could try running the
> convert again), what is my best course of action?

Not a mess; I think it's a good bug report. I think Qu and David know
more about the latest iteration of the convert code. If you can wait
until at least next week to see if they have questions, that'd be best.
If you need access to the computer sooner rather than later, I suggest
btrfs-image -c9 -t4 -s to make a filename-sanitized copy of the
filesystem metadata for them to look at, just in case. They might be
able to figure out the problem just from the stack trace, but it's better
to have the image before blowing away the file system, just in case
they want it.

-- 
Chris Murphy
* Re: Post ext3 conversion problems
From: Sean Greenslade @ 2016-09-16 23:25 UTC
  To: Chris Murphy; +Cc: Qu Wenruo, David Sterba, Btrfs BTRFS

On Fri, Sep 16, 2016 at 02:23:44PM -0600, Chris Murphy wrote:
> Not a mess; I think it's a good bug report. I think Qu and David know
> more about the latest iteration of the convert code. If you can wait
> until at least next week to see if they have questions, that'd be best.
> If you need access to the computer sooner rather than later, I suggest
> btrfs-image -c9 -t4 -s to make a filename-sanitized copy of the
> filesystem metadata for them to look at, just in case. They might be
> able to figure out the problem just from the stack trace, but it's better
> to have the image before blowing away the file system, just in case
> they want it.

I can hang on to the system in its current state; I don't particularly
need this machine fully operational. Just to be proactive, I ran
btrfs-image as follows:

btrfs-image -c9 -t4 -s -w /dev/sda2 dumpfile

http://phead.us/tmp/sgreenslade_home_sanitized_2016-09-16.btrfs

In the meantime, is there any way to make the kernel more verbose about
btrfs errors? It would be nice to see, for example, what was in the
transaction that failed, or at least what files / metadata it was
touching.

--Sean
* Re: Post ext3 conversion problems
From: Chris Murphy @ 2016-09-16 23:45 UTC
  To: Sean Greenslade; +Cc: Chris Murphy, Qu Wenruo, David Sterba, Btrfs BTRFS

On Fri, Sep 16, 2016 at 5:25 PM, Sean Greenslade
<sean@seangreenslade.com> wrote:

> In the meantime, is there any way to make the kernel more verbose about
> btrfs errors? It would be nice to see, for example, what was in the
> transaction that failed, or at least what files / metadata it was
> touching.

No idea. Maybe one of the compile-time options:

CONFIG_BTRFS_FS_CHECK_INTEGRITY=y
This also requires a mount option, either check_int or check_int_data
CONFIG_BTRFS_FS_RUN_SANITY_TESTS
CONFIG_BTRFS_DEBUG=y
https://patchwork.kernel.org/patch/846462/
CONFIG_BTRFS_ASSERT=y

Actually, even before that, maybe try 'btrfs-debug-tree /dev/sdX'.

That might explode in the vicinity of the problem. Thing is, btrfs
check doesn't see anything wrong with the metadata, so chances are
debug-tree won't either.

-- 
Chris Murphy
* Re: Post ext3 conversion problems
From: Sean Greenslade @ 2016-09-17  0:03 UTC
  To: Chris Murphy; +Cc: Qu Wenruo, David Sterba, Btrfs BTRFS

On Fri, Sep 16, 2016 at 05:45:59PM -0600, Chris Murphy wrote:
> On Fri, Sep 16, 2016 at 5:25 PM, Sean Greenslade
> <sean@seangreenslade.com> wrote:
>
> > In the meantime, is there any way to make the kernel more verbose about
> > btrfs errors? It would be nice to see, for example, what was in the
> > transaction that failed, or at least what files / metadata it was
> > touching.
>
> No idea. Maybe one of the compile-time options:
>
> CONFIG_BTRFS_FS_CHECK_INTEGRITY=y
> This also requires a mount option, either check_int or check_int_data
> CONFIG_BTRFS_FS_RUN_SANITY_TESTS
> CONFIG_BTRFS_DEBUG=y
> https://patchwork.kernel.org/patch/846462/
> CONFIG_BTRFS_ASSERT=y
>
> Actually, even before that, maybe try 'btrfs-debug-tree /dev/sdX'.
>
> That might explode in the vicinity of the problem. Thing is, btrfs
> check doesn't see anything wrong with the metadata, so chances are
> debug-tree won't either.

Hmm, I'll probably have a go at compiling the latest mainline kernel
with CONFIG_BTRFS_DEBUG enabled. It certainly can't hurt to try.

And as you suspected, btrfs-debug-tree didn't explode / error out on me.
I didn't thoroughly inspect the output (as I have very little
understanding of the btrfs internals), but it all seemed OK.

--Sean
* Re: Post ext3 conversion problems
From: Qu Wenruo @ 2016-09-19  2:20 UTC
  To: Chris Murphy, Sean Greenslade, David Sterba; +Cc: Btrfs BTRFS

At 09/17/2016 04:23 AM, Chris Murphy wrote:
> On Fri, Sep 16, 2016 at 1:25 PM, Sean Greenslade
> <sean@seangreenslade.com> wrote:
>> Hi, all. I've been playing around with an old laptop of mine, and I
>> figured I'd use it as a learning / bugfinding opportunity. Its /home
>> partition was originally ext3. I have a full partition image of this
>> drive as a backup, so I can do (and have done) potentially destructive
>> things. The system disk is a ~6 year old SSD.
>>
>> To start, I rebooted to a livedisk (Arch, kernel 4.7.2 w/progs 4.7.1)

Although there are reports of false btrfsck alerts with 4.7.1,
btrfs-convert is not related to those false alerts, and I assume it's OK.

>> and ran a simple btrfs-convert on it. After patching up the fstab and
>> rebooting, everything seemed fine. I deleted the recovery subvol, ran a
>> full balance, ran a full defrag, and rebooted again. I then decided to
>> try (as an experiment) using DUP mode for data and metadata. I ran that
>> balance without issue, then started using the machine. Sometime later, I
>> got the following remount ro:
>>
>> [ 7316.764235] ------------[ cut here ]------------
>> [ 7316.764292] WARNING: CPU: 2 PID: 14196 at fs/btrfs/inode.c:2954 btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
>> [ 7316.764297] BTRFS: Transaction aborted (error -95)
>> <snip>
>> [ 7316.764538] Call Trace:
>> [ 7316.764551]  [<ffffffff812f0215>] dump_stack+0x63/0x8e
>> [ 7316.764560]  [<ffffffff8107ae6f>] __warn+0xcf/0xf0
>> [ 7316.764567]  [<ffffffff8107aef1>] warn_slowpath_fmt+0x61/0x80
>> [ 7316.764605]  [<ffffffffa07aa362>] ? unpin_extent_cache+0xa2/0xf0 [btrfs]
>> [ 7316.764640]  [<ffffffffa07628e6>] ? btrfs_free_path+0x26/0x30 [btrfs]
>> [ 7316.764677]  [<ffffffffa079aaac>] btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]

This means btrfs_update_inode_fallback() failed.

>> [ 7316.764715]  [<ffffffffa079adc5>] finish_ordered_fn+0x15/0x20 [btrfs]
>> [ 7316.764753]  [<ffffffffa07c5f8e>] btrfs_scrubparity_helper+0x7e/0x360 [btrfs]

Scrub code, then. I'm not that familiar with it, though.

>> [ 7316.764791]  [<ffffffffa07c62fe>] btrfs_endio_write_helper+0xe/0x10 [btrfs]
>> [ 7316.764799]  [<ffffffff810949bd>] process_one_work+0x1ed/0x490
>> [ 7316.764806]  [<ffffffff81094ca9>] worker_thread+0x49/0x500
>> [ 7316.764813]  [<ffffffff81094c60>] ? process_one_work+0x490/0x490
>> [ 7316.764820]  [<ffffffff8109ac3a>] kthread+0xda/0xf0
>> [ 7316.764830]  [<ffffffff815c553f>] ret_from_fork+0x1f/0x40
>> [ 7316.764838]  [<ffffffff8109ab60>] ? kthread_worker_fn+0x170/0x170
>> [ 7316.764843] ---[ end trace 90f54effc5e294b0 ]---
>> [ 7316.764851] BTRFS: error (device sda2) in btrfs_finish_ordered_io:2954: errno=-95 unknown

-95 is -EOPNOTSUPP, which is not a common errno in btrfs.

Most EOPNOTSUPP errors are related to discard and to fallocate/drop
extents.

Are you using the discard mount option?
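The errno value in the log can be decoded mechanically. Below is a minimal Python sketch, assuming Linux with the generic errno numbering used on x86 and ARM (a few architectures number EOPNOTSUPP differently). decode_kernel_ret is a hypothetical helper, not a kernel or btrfs API. It negates the kernel's return value and looks up every matching errno symbol together with the C library's message.

```python
import errno
import os


def decode_kernel_ret(ret: int) -> tuple[list[str], str]:
    """Decode a negative kernel return value such as the -95 in 'errno=-95'.

    Returns every errno symbol defined for that value (Linux has aliases,
    e.g. ENOTSUP and EOPNOTSUPP share a number) and the C library's
    description of it.
    """
    code = -ret  # kernel functions return -ERRNO on failure
    names = sorted(
        name
        for name in dir(errno)
        if name.startswith("E") and getattr(errno, name) == code
    )
    return names, os.strerror(code)


if __name__ == "__main__":
    names, message = decode_kernel_ret(-95)
    print(names, message)
```

On a typical glibc-based Linux system this should list EOPNOTSUPP (possibly alongside its alias ENOTSUP) and a message along the lines of "Operation not supported".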
>> [ 7316.764859] BTRFS info (device sda2): forced readonly
>> [ 7316.765396] pending csums is 9437184
>>
>> After seeing this, I decided to attempt a repair (confident that I could
>> restore from backup if it failed). At the time, I was unaware of the
>> issues with progs 4.7.1, so when I ran the check and saw all the
>> incorrect backrefs messages, I figured that was my problem and ran the
>> --repair. Of course, this didn't make the messages go away on subsequent
>> checks, so I looked further and found this bug:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=155791
>>
>> I updated progs to 4.7.2 and re-ran the --repair (I didn't save any of
>> the logs from these, unfortunately). The repair seemed to work (I also
>> used --init-extent-tree), as current checks don't report any errors.

Personally, I trust btrfsck quite a lot, as it's based on the tons of
errors we have exposed, and it's much easier to write code that exposes
problems.

Unless there is something wrong that we have never met before, at least
your on-disk metadata should be OK.

>>
>> The system boots and mounts the FS just fine. I can read from it all
>> day, scrubs complete without failure.

Then at least your data matches its checksums.

And considering you have done a full balance, that mostly rules out the
special chunk layout introduced by convert.

>> but just using the system for a
>> while will eventually trigger the same "Transaction aborted (error -95)"
>> error.
>>
>> I realize this is something of a mess, and that I was less than
>> methodical with my actions so far. Given that I have a full backup that
>> can be restored if need be (and I certainly could try running the
>> convert again), what is my best course of action?

Normally btrfs-debug-tree would help in most cases, but this time it
seems to be a runtime scrub bug rather than on-disk metadata corruption.

What I can see here is that, after all your operations, your fs should
be a normal btrfs rather than a converted one.

To confirm my idea, would you please upload the following if your
filesystem is not too large?

# btrfs-debug-tree -t extent <your device>
# btrfs-debug-tree -t chunk <your device>
# btrfs-debug-tree -t dev <your device>

There are no file/dir names or data contained in the dump, just
chunk/extent allocation info, so you can upload them without worry.

>
>
> Not a mess; I think it's a good bug report. I think Qu and David know
> more about the latest iteration of the convert code. If you can wait
> until at least next week to see if they have questions, that'd be best.
> If you need access to the computer sooner rather than later, I suggest
> btrfs-image -c9 -t4 -s to make a filename-sanitized copy of the
> filesystem metadata for them to look at, just in case. They might be
> able to figure out the problem just from the stack trace, but it's better
> to have the image before blowing away the file system, just in case
> they want it.
>

Yes, a btrfs-image dump would be best, although sanitizing may take a
long time and the output may be too large.

Thanks,
Qu
* Re: Post ext3 conversion problems
From: Sean Greenslade @ 2016-09-19  4:12 UTC
  To: Qu Wenruo; +Cc: Chris Murphy, David Sterba, Btrfs BTRFS

On Mon, Sep 19, 2016 at 10:20:37AM +0800, Qu Wenruo wrote:
> <snip>
> -95 is -EOPNOTSUPP, which is not a common errno in btrfs.
>
> Most EOPNOTSUPP errors are related to discard and to fallocate/drop
> extents.
>
> Are you using the discard mount option?

I did indeed have the discard mount option enabled. I tried booting with
discard disabled, but the same problem appeared.

> <snip>
> Normally btrfs-debug-tree would help in most cases, but this time it
> seems to be a runtime scrub bug rather than on-disk metadata corruption.
>
> What I can see here is that, after all your operations, your fs should
> be a normal btrfs rather than a converted one.
>
> To confirm my idea, would you please upload the following if your
> filesystem is not too large?
>
> # btrfs-debug-tree -t extent <your device>
> # btrfs-debug-tree -t chunk <your device>
> # btrfs-debug-tree -t dev <your device>
>
> There are no file/dir names or data contained in the dump, just
> chunk/extent allocation info, so you can upload them without worry.
>
> > <snip>
>
> Yes, a btrfs-image dump would be best, although sanitizing may take a
> long time and the output may be too large.

I had posted a btrfs-image before. It was run with a single -s flag:

http://phead.us/tmp/sgreenslade_home_sanitized_2016-09-16.btrfs

Here's the debug tree data:

http://phead.us/tmp/wheatley_chunk_2016-09-18.dump.gz
http://phead.us/tmp/wheatley_extent_2016-09-18.dump.gz
http://phead.us/tmp/wheatley_dev_2016-09-18.dump.gz

Thanks,

--Sean
* Re: Post ext3 conversion problems
From: Qu Wenruo @ 2016-09-19  6:30 UTC
  To: Sean Greenslade; +Cc: Chris Murphy, David Sterba, Btrfs BTRFS

At 09/19/2016 12:12 PM, Sean Greenslade wrote:
> <snip>
> I had posted a btrfs-image before. It was run with a single -s flag:
>
> http://phead.us/tmp/sgreenslade_home_sanitized_2016-09-16.btrfs
>
> Here's the debug tree data:
>
> http://phead.us/tmp/wheatley_chunk_2016-09-18.dump.gz
> http://phead.us/tmp/wheatley_extent_2016-09-18.dump.gz
> http://phead.us/tmp/wheatley_dev_2016-09-18.dump.gz
>
> Thanks,
>
> --Sean
>

All chunks are completely converted to DUP, with no small chunks; all
are at their maximum chunk size.
So at the chunk level, nothing is related to convert yet.

But in the extent tree, I found several extents that are heavily
referred to, like extent 158173081600 or 183996522496.

If you're not using out-of-band dedupe, then it's quite possible that's
the remaining structure from convert.

I'm not quite sure if it's related to the bug, but did you do the
balance/defrag operation just after removing the ext_save subvolume?

If so, maybe the subvolume was not fully freed yet, and the later
balance/defrag just kept the old convert extent layout.

IIRC, to ensure btrfs completely frees a subvolume, one needs to call
"btrfs filesystem sync <mnt>" to ensure the subvolume is completely
deleted.

Thanks,
Qu
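Qu spotted the heavily referenced extents (such as 158173081600) by reading the extent-tree dump. The same scan can be scripted. Below is a minimal Python sketch, not part of btrfs-progs. It assumes the btrfs-debug-tree text layout of that era: each extent item prints a line containing "key (BYTENR EXTENT_ITEM LENGTH)" followed by a line such as "extent refs N gen G flags DATA" (newer progs drop the leading "extent"). That output is not a stable interface, so check the regular expressions against a real dump first. The function names and the min_refs threshold are hypothetical.

```python
import gzip
import re
import sys
from typing import Iterable

# Assumed btrfs-debug-tree layout (not a stable interface; verify on your dump):
#   item 0 key (BYTENR EXTENT_ITEM LENGTH) itemoff ... itemsize ...
#           extent refs N gen G flags DATA      (newer progs: "refs N gen G ...")
KEY_RE = re.compile(r"key \((\d+) (\w+) (\d+)\)")
REFS_RE = re.compile(r"\brefs (\d+) gen \d+ flags ([\w|]+)")


def heavily_referenced(lines: Iterable[str], min_refs: int = 2) -> list[tuple[int, int, int]]:
    """Return (bytenr, length, refs) for data extents with >= min_refs references,
    most-referenced first."""
    found = []
    pending = None  # (bytenr, length) of the EXTENT_ITEM whose refs line comes next
    for line in lines:
        key = KEY_RE.search(line)
        if key:
            # Only EXTENT_ITEM keys can be data extents; any other key resets state.
            if key.group(2) == "EXTENT_ITEM":
                pending = (int(key.group(1)), int(key.group(3)))
            else:
                pending = None
            continue
        refs = REFS_RE.search(line)
        if refs and pending is not None:
            count, flags = int(refs.group(1)), refs.group(2)
            # Skip tree blocks, which also use EXTENT_ITEM without skinny metadata.
            if "DATA" in flags and count >= min_refs:
                found.append((pending[0], pending[1], count))
            pending = None
    found.sort(key=lambda item: item[2], reverse=True)
    return found


def open_dump(path: str):
    # The dumps in this thread were gzip-compressed; plain text works too.
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open_dump(sys.argv[1]) as dump:
        for bytenr, length, refs in heavily_referenced(dump)[:20]:
            print(f"extent {bytenr} len {length} refs {refs}")
```

For example, running it as python3 heavy_extents.py wheatley_extent_2016-09-18.dump.gz (the script name is hypothetical) would list up to 20 data extents with two or more references, most-referenced first.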
* Re: Post ext3 conversion problems
From: Sean Greenslade @ 2016-09-19 15:13 UTC
  To: Qu Wenruo; +Cc: Chris Murphy, David Sterba, Btrfs BTRFS

On Mon, Sep 19, 2016 at 02:30:28PM +0800, Qu Wenruo wrote:
> All chunks are completely converted to DUP, with no small chunks; all
> are at their maximum chunk size.
> So at the chunk level, nothing is related to convert yet.
>
> But in the extent tree, I found several extents that are heavily
> referred to, like extent 158173081600 or 183996522496.
>
> If you're not using out-of-band dedupe, then it's quite possible that's
> the remaining structure from convert.

I never ran any sort of dedup on this partition.

> I'm not quite sure if it's related to the bug, but did you do the
> balance/defrag operation just after removing the ext_save subvolume?

That's quite possible. I did it in a live boot, so I don't have the bash
history to check. I checked it just now using "btrfs subvol list -d",
and there's nothing listed. I ran a full balance after that, but the
problem remains. So whatever the problem is, it can survive a full
balance after the ext_save subvol is completely deleted.

--Sean
* Re: Post ext3 conversion problems
From: Qu Wenruo @ 2016-09-20  2:49 UTC
  To: Sean Greenslade; +Cc: Chris Murphy, David Sterba, Btrfs BTRFS

At 09/19/2016 11:13 PM, Sean Greenslade wrote:
> On Mon, Sep 19, 2016 at 02:30:28PM +0800, Qu Wenruo wrote:
>> All chunks are completely converted to DUP, with no small chunks; all
>> are at their maximum chunk size.
>> So at the chunk level, nothing is related to convert yet.
>>
>> But in the extent tree, I found several extents that are heavily
>> referred to, like extent 158173081600 or 183996522496.
>>
>> If you're not using out-of-band dedupe, then it's quite possible that's
>> the remaining structure from convert.
>
> I never ran any sort of dedup on this partition.
>
>> I'm not quite sure if it's related to the bug, but did you do the
>> balance/defrag operation just after removing the ext_save subvolume?
>
> That's quite possible. I did it in a live boot, so I don't have the bash
> history to check. I checked it just now using "btrfs subvol list -d",
> and there's nothing listed. I ran a full balance after that, but the
> problem remains. So whatever the problem is, it can survive a full
> balance after the ext_save subvol is completely deleted.
>
> --Sean
>

OK, I see the problem now.

The new convert is designed to create the minimal number of extents, so
it results in the following file extent layout:

         Ext2_save/image
                |
/-------------------------------\
|           Extent A            |
|<---Old Ext3 Used space ------>|<--- Free space--->|
\---/\---/\---/
  |    |    |
  F1   F2   F3

This causes a large extent A, referred to by ext2_save/image, and files
like F1/2/3 just refer to parts of the large extent A.

After removing the whole ext2_save subvolume, extent A is still there,
since F1/2/3 are still referring to it.
|<---Old Ext3 Used space ------>|<--- Free space--->| | Extent A | \---/\---/\---/ | | | F1 F2 F3 In that case, not balance but defrag is responsible to "split" the large extent and free the unused space. However btrfs defrag doesn't work for case like subvolume or reflink for a long time, which leaves the extent layout unchanged. And balance doesn't handle it, as balance just relocated the large extent A and modify all referencers' pointer. Sigh, I just forgot the fact that defrag doesn't work for a long time when designing the new convert. So, we still need a working kernel defrag to make the fs to be a "native" btrfs. However, such layout is completely valid for btrfs, one can generate it quite easily which following script: ---- xfs_io -f -c "pwrite 0 16M" $mnt/orig for i in $(seq 0 15); do xfs_io -f -c "reflink $mnt/orig ${i}M 0 1M" $mnt/file${i} done ---- So there is still something wrong in your backtrace so that we need to dig further. Any idea of your load pattern to trigger the bug? Thanks, Qu ^ permalink raw reply [flat|nested] 17+ messages in thread
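The reference counting described above can be modelled in a few lines of C. This is a hypothetical sketch, not btrfs code: `struct extent_ref`, `referenced_bytes` and `pinned_unused_bytes` are made-up names, and the only assumption is that an on-disk extent stays allocated in full while any file extent item still points into it. Merging the referenced ranges shows how much of extent A stays allocated but unreachable once ext2_save/image is gone.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model: one file extent item referencing part of a larger
 * on-disk extent. 'offset' is relative to the start of that extent, like
 * the "extent data offset" field printed by btrfs-debug-tree. */
struct extent_ref {
	uint64_t offset;
	uint64_t len;
};

static int cmp_ref(const void *a, const void *b)
{
	const struct extent_ref *x = a, *y = b;

	if (x->offset < y->offset)
		return -1;
	return x->offset > y->offset;
}

/* Bytes of the extent covered by at least one reference (the union of
 * all referenced ranges). Sorts 'refs' in place. */
uint64_t referenced_bytes(struct extent_ref *refs, size_t n)
{
	uint64_t total = 0, cur_start, cur_end;
	size_t i;

	if (n == 0)
		return 0;
	qsort(refs, n, sizeof(*refs), cmp_ref);
	cur_start = refs[0].offset;
	cur_end = refs[0].offset + refs[0].len;
	for (i = 1; i < n; i++) {
		uint64_t s = refs[i].offset, e = s + refs[i].len;

		if (s <= cur_end) {
			/* Overlapping or adjacent: extend the current run. */
			if (e > cur_end)
				cur_end = e;
		} else {
			/* Gap: close the current run and start a new one. */
			total += cur_end - cur_start;
			cur_start = s;
			cur_end = e;
		}
	}
	return total + (cur_end - cur_start);
}

/* Space that stays allocated but is no longer reachable by any file.
 * In this model the extent is freed only when its last reference goes. */
uint64_t pinned_unused_bytes(uint64_t extent_len, struct extent_ref *refs,
			     size_t n)
{
	uint64_t used;

	if (n == 0)
		return 0; /* no referencers left: the whole extent is freed */
	used = referenced_bytes(refs, n);
	assert(used <= extent_len);
	return extent_len - used;
}
```

For example, three files referencing 1 MiB, 1 MiB and 2 MiB of a 16 MiB extent keep all 16 MiB allocated while only 4 MiB is reachable; since balance relocates the extent as a whole, the 12 MiB of unreachable space moves along with it.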
* Re: Post ext3 conversion problems
From: Sean Greenslade @ 2016-09-20 3:39 UTC
To: Qu Wenruo; +Cc: Chris Murphy, David Sterba, Btrfs BTRFS

On Tue, Sep 20, 2016 at 10:49:42AM +0800, Qu Wenruo wrote:
> OK, I see the problem now.
> [...]
> Any idea of your load pattern to trigger the bug?

Glad to hear you've found the core of the issue.

At this point, I can trigger it immediately. As soon as I log in and run
dmenu, it will attempt to rebuild its cache file (small text file that's
just a list of all executables in the PATH). Once that write happens,
the bug triggers and the fs goes read only.

--Sean

* Re: Post ext3 conversion problems
From: Qu Wenruo @ 2016-09-20 5:02 UTC
To: Sean Greenslade; +Cc: Chris Murphy, David Sterba, Btrfs BTRFS

At 09/20/2016 11:39 AM, Sean Greenslade wrote:
> Glad to hear you've found the core of the issue.
>
> At this point, I can trigger it immediately. As soon as I log in and run
> dmenu, it will attempt to rebuild its cache file (small text file that's
> just a list of all executables in the PATH). Once that write happens,
> the bug triggers and the fs goes read only.

Rewrite? Or write into new inode?

And is the same inode always causing the problem?

Thanks,
Qu

* Re: Post ext3 conversion problems
From: Sean Greenslade @ 2016-09-20 20:51 UTC
To: Qu Wenruo; +Cc: Chris Murphy, David Sterba, Btrfs BTRFS

On Tue, Sep 20, 2016 at 01:02:38PM +0800, Qu Wenruo wrote:
> Rewrite? Or write into new inode?
>
> And is the same inode always causing the problem?

It's not always the same. It seems like whatever triggers a write first
is what kills it. I went to test it, and this time it triggered on my
.bash_history file. I have bash set up with "history -a", so presumably
that was an append, not an overwrite.

To cut down on the number of variables, I booted my system with the
"rescue" systemd target, then su'd to my user. Simply running a few
commands (with the history -a writes that bash triggered) was enough to
trigger the bug. This is on 4.8.0-rc6, with the following compile time
options enabled:

CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y
CONFIG_BTRFS_DEBUG=y
CONFIG_BTRFS_ASSERT=y

If I run the stock Arch kernel (4.7.2 at the moment), the issue still
appears, but it takes longer. My most reliable trigger is Firefox, whose
constant DB writes will trigger it within minutes.

--Sean

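For a long-running reproducer, the small-append pattern described above (what bash does with "history -a") can be approximated as below. This is a hypothetical sketch under stated assumptions: `append_record`, `append_many` and `file_size` are illustrative names, and the `fsync()` call is an addition that bash itself may not perform. It is there so that writeback, and with it the ordered-extent completion seen in the trace, happens right away instead of at the next periodic flush. Only point it at a scratch copy of the filesystem, for example one restored from the partition image.

```c
#define _XOPEN_SOURCE 700 /* exposes fsync() and mkstemp() in strict modes */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Append one line to 'path' and flush it to disk.
 * Hypothetical helper; returns 0 on success, -1 on any error. */
int append_record(const char *path, const char *line)
{
	size_t len = strlen(line);
	ssize_t n;
	int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);

	if (fd < 0)
		return -1;
	n = write(fd, line, len);
	/* fsync() forces writeback now rather than at the next flush. */
	if (n < 0 || (size_t)n != len || fsync(fd) != 0) {
		close(fd);
		return -1;
	}
	return close(fd) == 0 ? 0 : -1;
}

/* Repeat the append 'count' times, as a long-running loop would. */
int append_many(const char *path, const char *line, int count)
{
	int i;

	assert(count >= 0);
	for (i = 0; i < count; i++)
		if (append_record(path, line) != 0)
			return -1;
	return 0;
}

/* Current size of 'path' in bytes, or -1 if stat() fails. */
long long file_size(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return (long long)st.st_size;
}
```

Because the file is reopened with O_APPEND for every record, each call exercises the same "extend a small existing file" path as a shell history append, rather than a rewrite of the whole file.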
* Re: Post ext3 conversion problems
From: Sean Greenslade @ 2016-09-26 2:16 UTC
To: Qu Wenruo; +Cc: Chris Murphy, David Sterba, Btrfs BTRFS

On Tue, Sep 20, 2016 at 04:51:21PM -0400, Sean Greenslade wrote:
> If I run the stock Arch kernel (4.7.2 at the moment), the issue still
> appears, but it takes longer. My most reliable trigger is Firefox, whose
> constant DB writes will trigger it within minutes.

Is there anything I can do to help this along? I can build experimental
patches, set up long running scripts, run tests, whatever is necessary.

Thanks,

--Sean

* Re: Post ext3 conversion problems
From: Qu Wenruo @ 2016-09-26 2:37 UTC
To: Sean Greenslade; +Cc: Chris Murphy, David Sterba, Btrfs BTRFS

At 09/26/2016 10:16 AM, Sean Greenslade wrote:
> Is there anything I can do to help this along? I can build experimental
> patches, set up long running scripts, run tests, whatever is necessary.

Unfortunately, I'm afraid we need to dig through the source to locate the
problem, so the root cause is unlikely to be found any time soon.

Thanks,
Qu

* Re: Post ext3 conversion problems
From: Liu Bo @ 2016-09-17 2:27 UTC
To: Sean Greenslade; +Cc: linux-btrfs

On Fri, Sep 16, 2016 at 03:25:00PM -0400, Sean Greenslade wrote:
> [...]
> balance without issue, then started using the machine. Sometime later, I
> got the following remount ro:
>
> [ 7316.764235] ------------[ cut here ]------------
> [ 7316.764292] WARNING: CPU: 2 PID: 14196 at fs/btrfs/inode.c:2954 btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
> [ 7316.764297] BTRFS: Transaction aborted (error -95)
> [ 7316.764301] Modules linked in: [...]
> [ 7316.764462] CPU: 2 PID: 14196 Comm: kworker/u8:11 Tainted: G O 4.7.3-5-ck #1
> [ 7316.764467] Hardware name: ASUSTeK Computer INC. 1015PEM/1015PE, BIOS 0903 11/08/2010
> [ 7316.764507] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> [...]
> [ 7316.764538] Call Trace:
> [ 7316.764551] [<ffffffff812f0215>] dump_stack+0x63/0x8e
> [ 7316.764560] [<ffffffff8107ae6f>] __warn+0xcf/0xf0
> [ 7316.764567] [<ffffffff8107aef1>] warn_slowpath_fmt+0x61/0x80
> [ 7316.764605] [<ffffffffa07aa362>] ? unpin_extent_cache+0xa2/0xf0 [btrfs]
> [ 7316.764640] [<ffffffffa07628e6>] ? btrfs_free_path+0x26/0x30 [btrfs]
> [ 7316.764677] [<ffffffffa079aaac>] btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
> [ 7316.764715] [<ffffffffa079adc5>] finish_ordered_fn+0x15/0x20 [btrfs]
> [ 7316.764753] [<ffffffffa07c5f8e>] btrfs_scrubparity_helper+0x7e/0x360 [btrfs]
> [ 7316.764791] [<ffffffffa07c62fe>] btrfs_endio_write_helper+0xe/0x10 [btrfs]
> [ 7316.764799] [<ffffffff810949bd>] process_one_work+0x1ed/0x490
> [ 7316.764806] [<ffffffff81094ca9>] worker_thread+0x49/0x500
> [ 7316.764813] [<ffffffff81094c60>] ? process_one_work+0x490/0x490
> [ 7316.764820] [<ffffffff8109ac3a>] kthread+0xda/0xf0
> [ 7316.764830] [<ffffffff815c553f>] ret_from_fork+0x1f/0x40
> [ 7316.764838] [<ffffffff8109ab60>] ? kthread_worker_fn+0x170/0x170
> [ 7316.764843] ---[ end trace 90f54effc5e294b0 ]---
> [ 7316.764851] BTRFS: error (device sda2) in btrfs_finish_ordered_io:2954: errno=-95 unknown
> [ 7316.764859] BTRFS info (device sda2): forced readonly
> [ 7316.765396] pending csums is 9437184
>
> [...]
>
> The system boots and mounts the FS just fine. I can read from it all
> day, scrubs complete without failure, but just using the system for a
> while will eventually trigger the same "Transaction aborted (error -95)"
> error.
>
> I realize this is something of a mess, and that I was less than
> methodical with my actions so far. Given that I have a full backup that
> can be restored if need be (and I certainly could try running the
> convert again), what is my best course of action?

Interesting, seems that we get errors from

btrfs_finish_ordered_io
  insert_reserved_file_extent
    __btrfs_drop_extents

And splitting an inline extent throws -95.

Thanks,

-liubo

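On x86 Linux, error -95 is -EOPNOTSUPP ("Operation not supported"). The decision pointed at above can be sketched as a pure function. This is a simplified, hypothetical model of the range checks in `__btrfs_drop_extents()`, not the kernel code: the enums and `drop_decision` are invented names, and details such as the sector alignment of inline extent ends are ignored. What it shows is that, in this model, an inline extent can be dropped whole but never trimmed or split, so any drop range that only partially overlaps one fails with -EOPNOTSUPP.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the kernel's data structures. */
enum ext_type { EXT_REGULAR, EXT_INLINE };

enum drop_action {
	DROP_NONE,          /* no overlap: keep the extent item as is */
	DROP_DELETE,        /* fully covered: delete the whole item */
	DROP_TRIM_OR_SPLIT, /* partial overlap: shrink or split the item */
};

/* Decide what happens to an existing file extent covering
 * [ext_start, ext_end) when the file range [start, end) is dropped.
 * Returns 0 and sets *action, or -EOPNOTSUPP when an inline extent
 * would have to be trimmed or split. */
int drop_decision(enum ext_type type, uint64_t ext_start, uint64_t ext_end,
		  uint64_t start, uint64_t end, enum drop_action *action)
{
	assert(ext_start <= ext_end && start <= end);

	if (ext_end <= start || ext_start >= end) {
		*action = DROP_NONE;
		return 0;
	}
	if (start <= ext_start && ext_end <= end) {
		*action = DROP_DELETE;
		return 0;
	}
	/* Partial overlap: only regular extents can be cut. */
	if (type == EXT_INLINE)
		return -EOPNOTSUPP;
	*action = DROP_TRIM_OR_SPLIT;
	return 0;
}
```

For example, an inline extent holding the first 2779 bytes of a file is simply deleted when [0, 4096) is dropped, while dropping [1024, 4096) overlaps it only partially and yields -EOPNOTSUPP, the same error that aborted the transaction.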
* Re: Post ext3 conversion problems
From: Sean Greenslade @ 2016-09-17 4:16 UTC
To: Liu Bo; +Cc: linux-btrfs

On Fri, Sep 16, 2016 at 07:27:58PM -0700, Liu Bo wrote:
> Interesting, seems that we get errors from
>
> btrfs_finish_ordered_io
>   insert_reserved_file_extent
>     __btrfs_drop_extents
>
> And splitting an inline extent throws -95.

Heh, you beat me to the draw. I was just coming to the same conclusion
myself from poking at the source code. What's interesting is that it
seems to be a quite explicit thing:

	if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
		ret = -EOPNOTSUPP;
		break;
	}

So now the question is why is this happening? Clearly the presence of
inline extents isn't an issue by itself, since another one of my btrfs
/home partitions has plenty of them.

I added some debug prints to my kernel to catch the inode that tripped
the error. Here's the relevant chunk (with filenames scrubbed) from
btrfs-debug-tree:

Inode 140345 triggered the transaction abort.

leaf 175131459584 items 51 free space 7227 generation 118521 owner 5
fs uuid 1d9ee7c7-d13a-4c3c-b730-256c70841c5b
chunk uuid b67a1a82-ff22-48b5-af1b-9d5f85ebee25
	item 0 key (140343 INODE_ITEM 0) itemoff 16123 itemsize 160
		inode generation 1 transid 1 size 180 nbytes 0
		block group 0 mode 40755 links 1 uid 1000 gid 1000
		rdev 0 flags 0x0(none)
	item 1 key (140343 INODE_REF 131327) itemoff 16107 itemsize 16
		inode ref index 199 namelen 6 name: <scrubbed>
	item 2 key (140343 DIR_ITEM 1073386496) itemoff 16072 itemsize 35
		location key (142600 INODE_ITEM 0) type SYMLINK
		namelen 5 datalen 0 name: <scrubbed>
	item 3 key (140343 DIR_ITEM 1148422723) itemoff 16037 itemsize 35
		location key (142601 INODE_ITEM 0) type SYMLINK
		namelen 5 datalen 0 name: <scrubbed>
	item 4 key (140343 DIR_ITEM 2415965623) itemoff 16004 itemsize 33
		location key (131550 INODE_ITEM 0) type SYMLINK
		namelen 3 datalen 0 name: <scrubbed>
	item 5 key (140343 DIR_ITEM 2448077466) itemoff 15965 itemsize 39
		location key (140565 INODE_ITEM 0) type FILE
		namelen 9 datalen 0 name: <scrubbed>
	item 6 key (140343 DIR_ITEM 2566671093) itemoff 15930 itemsize 35
		location key (140564 INODE_ITEM 0) type SYMLINK
		namelen 5 datalen 0 name: <scrubbed>
	item 7 key (140343 DIR_ITEM 3391512089) itemoff 15873 itemsize 57
		location key (142599 INODE_ITEM 0) type FILE
		namelen 27 datalen 0 name: <scrubbed>
	item 8 key (140343 DIR_ITEM 3621719155) itemoff 15838 itemsize 35
		location key (131627 INODE_ITEM 0) type SYMLINK
		namelen 5 datalen 0 name: <scrubbed>
	item 9 key (140343 DIR_ITEM 3701680574) itemoff 15798 itemsize 40
		location key (142603 INODE_ITEM 0) type FIFO
		namelen 10 datalen 0 name: <scrubbed>
	item 10 key (140343 DIR_ITEM 3816117430) itemoff 15763 itemsize 35
		location key (140563 INODE_ITEM 0) type SYMLINK
		namelen 5 datalen 0 name: <scrubbed>
	item 11 key (140343 DIR_ITEM 4214885080) itemoff 15729 itemsize 34
		location key (131544 INODE_ITEM 0) type SYMLINK
		namelen 4 datalen 0 name: <scrubbed>
	item 12 key (140343 DIR_ITEM 4253409616) itemoff 15687 itemsize 42
		location key (140352 INODE_ITEM 0) type FILE
		namelen 12 datalen 0 name: <scrubbed>
	item 13 key (140343 DIR_INDEX 2) itemoff 15653 itemsize 34
		location key (131544 INODE_ITEM 0) type SYMLINK
		namelen 4 datalen 0 name: <scrubbed>
	item 14 key (140343 DIR_INDEX 3) itemoff 15620 itemsize 33
		location key (131550 INODE_ITEM 0) type SYMLINK
		namelen 3 datalen 0 name: <scrubbed>
	item 15 key (140343 DIR_INDEX 4) itemoff 15585 itemsize 35
		location key (131627 INODE_ITEM 0) type SYMLINK
		namelen 5 datalen 0 name: <scrubbed>
	item 16 key (140343 DIR_INDEX 5) itemoff 15543 itemsize 42
		location key (140352 INODE_ITEM 0) type FILE
		namelen 12 datalen 0 name: <scrubbed>
	item 17 key (140343 DIR_INDEX 6) itemoff 15508 itemsize 35
		location key (140563 INODE_ITEM 0) type SYMLINK
		namelen 5 datalen 0 name: <scrubbed>
	item 18 key (140343 DIR_INDEX 7) itemoff 15473 itemsize 35
		location key (140564 INODE_ITEM 0) type SYMLINK
		namelen 5 datalen 0 name: <scrubbed>
	item 19 key (140343 DIR_INDEX 8) itemoff 15434 itemsize 39
		location key (140565 INODE_ITEM 0) type FILE
		namelen 9 datalen 0 name: <scrubbed>
	item 20 key (140343 DIR_INDEX 9) itemoff 15377 itemsize 57
		location key (142599 INODE_ITEM 0) type FILE
		namelen 27 datalen 0 name: <scrubbed>
	item 21 key (140343 DIR_INDEX 10) itemoff 15342 itemsize 35
		location key (142600 INODE_ITEM 0) type SYMLINK
		namelen 5 datalen 0 name: <scrubbed>
	item 22 key (140343 DIR_INDEX 11) itemoff 15307 itemsize 35
		location key (142601 INODE_ITEM 0) type SYMLINK
		namelen 5 datalen 0 name: <scrubbed>
	item 23 key (140343 DIR_INDEX 12) itemoff 15267 itemsize 40
		location key (142603 INODE_ITEM 0) type FIFO
		namelen 10 datalen 0 name: <scrubbed>
	item 24 key (140344 INODE_ITEM 0) itemoff 15107 itemsize 160
		inode generation 1 transid 3804 size 2779 nbytes 2779
		block group 0 mode 100644 links 1 uid 1000 gid 1000
		rdev 0 flags 0x0(none)
	item 25 key (140344 INODE_REF 131327) itemoff 15088 itemsize 19
		inode ref index 7 namelen 9 name: <scrubbed>
	item 26 key (140344 EXTENT_DATA 0) itemoff 12288 itemsize 2800
		inline extent data size 2779 ram 2779 compress(none)
	item 27 key (140345 INODE_ITEM 0) itemoff 12128 itemsize 160
		inode generation 1 transid 3812 size 53212 nbytes 53248
		block group 0 mode 100644 links 1 uid 1000 gid 1000
		rdev 0 flags 0x0(none)
	item 28 key (140345 INODE_REF 147957) itemoff 12109 itemsize 19
		inode ref index 17 namelen 9 name: <scrubbed>
	item 29 key (140345 EXTENT_DATA 0) itemoff 12056 itemsize 53
		extent data disk byte 189201358848 nr 53248
		extent data offset 0 nr 53248 ram 53248
		extent compression(none)
	item 30 key (140347 INODE_ITEM 0) itemoff 11896 itemsize 160
		inode generation 1 transid 1 size 89666 nbytes 90112
		block group 0 mode 100644 links 1 uid 1000 gid 1000
		rdev 0 flags 0x0(none)
	item 31 key (140347 INODE_REF 131327) itemoff 11878 itemsize 18
		inode ref index 140 namelen 8 name: <scrubbed>
	item 32 key (140347 EXTENT_DATA 0) itemoff 11825 itemsize 53
		extent data disk byte 154930053120 nr 90112
		extent data offset 0 nr 90112 ram 90112
		extent compression(none)
	item 33 key (140348 INODE_ITEM 0) itemoff 11665 itemsize 160
		inode generation 1 transid 1 size 27 nbytes 28
		block group 0 mode 120777 links 1 uid 1000 gid 1000
		rdev 0 flags 0x0(none)
	item 34 key (140348 INODE_REF 131327) itemoff 11646 itemsize 19
		inode ref index 197 namelen 9 name: <scrubbed>
	item 35 key (140348 EXTENT_DATA 0) itemoff 11597 itemsize 49
		inline extent data size 28 ram 28 compress(none)
	item 36 key (140349 INODE_ITEM 0) itemoff 11437 itemsize 160
		inode generation 1 transid 1 size 37963 nbytes 40960
		block group 0 mode 100644 links 1 uid 1000 gid 1000
		rdev 0 flags 0x0(none)
	item 37 key (140349 INODE_REF 180481) itemoff 11408 itemsize 29
		inode ref index 2 namelen 19 name: <scrubbed>
	item 38 key (140349 EXTENT_DATA 0) itemoff 11355 itemsize 53
		extent data disk byte 157136482304 nr 180224
		extent data offset 139264 nr 40960 ram 180224
		extent compression(none)
	item 39 key (140352 INODE_ITEM 0) itemoff 11195 itemsize 160
		inode generation 1 transid 1 size 826 nbytes 826
		block group 0 mode 100644 links 1 uid 1000 gid 1000
		rdev 0 flags 0x0(none)
	item 40 key (140352 INODE_REF 140343) itemoff 11173 itemsize 22
		inode ref index 5 namelen 12 name: <scrubbed>
	item 41 key (140352 EXTENT_DATA 0) itemoff 10326 itemsize 847
		inline extent data size 826 ram 826 compress(none)
	item 42 key (140354 INODE_ITEM 0) itemoff 10166 itemsize 160
		inode generation 1 transid 1 size 1032 nbytes 1032
		block group 0 mode 100644 links 1 uid 1000 gid 1000
		rdev 0 flags 0x0(none)
	item 43 key (140354 INODE_REF 246185) itemoff 10112 itemsize 54
		inode ref index 301 namelen 44 name: <scrubbed>
	item 44 key (140354 EXTENT_DATA 0) itemoff 9059 itemsize 1053
		inline extent data size 1032 ram 1032 compress(none)
	item 45 key (140357 INODE_ITEM 0) itemoff 8899 itemsize 160
		inode generation 1 transid 1 size 0 nbytes 0
		block group 0 mode 100644 links 1 uid 1000 gid 1000
		rdev 0 flags 0x0(none)
	item 46 key (140357 INODE_REF 247133) itemoff 8865 itemsize 34
		inode ref index 18 namelen 24 name: <scrubbed>
	item 47 key (140358 INODE_ITEM 0) itemoff 8705 itemsize 160
		inode generation 1 transid 3811 size 0 nbytes 0
		block group 0 mode 100644 links 1 uid 1000 gid 1000
		rdev 0 flags 0x0(none)
	item 48 key (140358 INODE_REF 140136) itemoff 8686 itemsize 19
		inode ref index 3 namelen 9 name: <scrubbed>
	item 49 key (140359 INODE_ITEM 0) itemoff 8526 itemsize 160
		inode generation 1 transid 3812 size 1150 nbytes 1150
		block group 0 mode 100644 links 1 uid 1000 gid 1000
		rdev 0 flags 0x0(none)
	item 50 key (140359 INODE_REF 147967) itemoff 8502 itemsize 24
		inode ref index 11 namelen 14 name: <scrubbed>

--Sean

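The item sizes in the dump above follow from the on-disk layout of `struct btrfs_file_extent_item`. The sketch below mirrors that layout from the kernel's ctree.h under its own name (`file_extent_item`, with fixed-width types standing in for the little-endian `__le` types) and assumes a GCC or Clang compiler for `__attribute__((packed))`. For an inline extent the file data begins where `disk_bytenr` would be, 21 bytes into the item, so the 2800-byte item 26 carries 2779 bytes of data, and a regular extent item is 53 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the on-disk btrfs_file_extent_item. On disk every field is
 * little-endian; the struct is packed, as in the kernel's definition. */
struct file_extent_item {
	uint64_t generation;
	uint64_t ram_bytes;
	uint8_t compression;
	uint8_t encryption;
	uint16_t other_encoding;
	uint8_t type;		/* 0 = inline, 1 = regular, 2 = prealloc */
	/* For inline extents, the file data starts here instead. */
	uint64_t disk_bytenr;
	uint64_t disk_num_bytes;
	uint64_t offset;	/* "extent data offset" in the dump */
	uint64_t num_bytes;	/* "nr" on the "extent data offset" line */
} __attribute__((packed));

/* Byte offset at which inline file data begins inside the item. */
#define INLINE_DATA_START offsetof(struct file_extent_item, disk_bytenr)

/* Inline payload length implied by an EXTENT_DATA item's itemsize. */
static inline size_t inline_payload_len(size_t itemsize)
{
	assert(itemsize >= INLINE_DATA_START);
	return itemsize - INLINE_DATA_START;
}
```

Read the same way, item 38 appears to be the partial-reference pattern described earlier in the thread: inode 140349 uses 40960 bytes starting at offset 139264 of a 180224-byte extent (139264 + 40960 = 180224).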
end of thread, other threads:[~2016-09-26 2:37 UTC | newest]

Thread overview: 17+ messages
2016-09-16 19:25 Post ext3 conversion problems Sean Greenslade
2016-09-16 20:23 ` Chris Murphy
2016-09-16 23:25 ` Sean Greenslade
2016-09-16 23:45 ` Chris Murphy
2016-09-17 0:03 ` Sean Greenslade
2016-09-19 2:20 ` Qu Wenruo
2016-09-19 4:12 ` Sean Greenslade
2016-09-19 6:30 ` Qu Wenruo
2016-09-19 15:13 ` Sean Greenslade
2016-09-20 2:49 ` Qu Wenruo
2016-09-20 3:39 ` Sean Greenslade
2016-09-20 5:02 ` Qu Wenruo
2016-09-20 20:51 ` Sean Greenslade
2016-09-26 2:16 ` Sean Greenslade
2016-09-26 2:37 ` Qu Wenruo
2016-09-17 2:27 ` Liu Bo
2016-09-17 4:16 ` Sean Greenslade