From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from aserp1040.oracle.com ([141.146.126.69]:21700 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750774AbcIQCWf (ORCPT ); Fri, 16 Sep 2016 22:22:35 -0400
Date: Fri, 16 Sep 2016 19:27:58 -0700
From: Liu Bo
To: Sean Greenslade
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Post ext3 conversion problems
Message-ID: <20160917022758.GA6389@localhost.localdomain>
Reply-To: bo.li.liu@oracle.com
References: <20160916192459.GA13514@coach.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20160916192459.GA13514@coach.home>
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On Fri, Sep 16, 2016 at 03:25:00PM -0400, Sean Greenslade wrote:
> Hi, all. I've been playing around with an old laptop of mine, and I
> figured I'd use it as a learning / bugfinding opportunity. Its /home
> partition was originally ext3. I have a full partition image of this
> drive as a backup, so I can do (and have done) potentially destructive
> things. The system disk is a ~6 year old SSD.
>
> To start, I rebooted to a livedisk (Arch, kernel 4.7.2 w/progs 4.7.1)
> and ran a simple btrfs-convert on it. After patching up the fstab and
> rebooting, everything seemed fine. I deleted the recovery subvol, ran a
> full balance, ran a full defrag, and rebooted again. I then decided to
> try (as an experiment) using DUP mode for data and metadata. I ran that
> balance without issue, then started using the machine. Sometime later, I
> got the following remount ro:
>
> [ 7316.764235] ------------[ cut here ]------------
> [ 7316.764292] WARNING: CPU: 2 PID: 14196 at fs/btrfs/inode.c:2954 btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
> [ 7316.764297] BTRFS: Transaction aborted (error -95)
> [ 7316.764301] Modules linked in: fuse sha256_ssse3 sha256_generic hmac drbg ansi_cprng ctr ccm joydev mousedev uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media crc32c_generic iTCO_wdt btrfs iTCO_vendor_support arc4 xor ath9k raid6_pq ath9k_common ath9k_hw ath mac80211 snd_hda_codec_realtek snd_hda_codec_generic psmouse input_leds coretemp snd_hda_intel led_class pcspkr snd_hda_codec cfg80211 snd_hwdep snd_hda_core snd_pcm lpc_ich snd_timer atl1c rfkill snd soundcore shpchp intel_agp wmi thermal fjes battery evdev ac tpm_tis mac_hid tpm sch_fq_codel vboxnetflt(O) vboxnetadp(O) pci_stub vboxpci(O) vboxdrv(O) loop sg acpi_cpufreq ip_tables x_tables ext4 crc16 jbd2 mbcache sd_mod serio_raw atkbd libps2 ahci libahci uhci_hcd libata scsi_mod ehci_pci ehci_hcd usbcore
> [ 7316.764434] usb_common i8042 serio i915 video button intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
> [ 7316.764462] CPU: 2 PID: 14196 Comm: kworker/u8:11 Tainted: G O 4.7.3-5-ck #1
> [ 7316.764467] Hardware name: ASUSTeK Computer INC. 1015PEM/1015PE, BIOS 0903 11/08/2010
> [ 7316.764507] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> [ 7316.764513] 0000000000000286 000000006101f47d ffff8800230dbc78 ffffffff812f0215
> [ 7316.764522] ffff8800230dbcc8 0000000000000000 ffff8800230dbcb8 ffffffff8107ae6f
> [ 7316.764530] 00000b8a00000035 ffff88007791afa8 ffff8800751d9000 ffff880014101d40
> [ 7316.764538] Call Trace:
> [ 7316.764551] [] dump_stack+0x63/0x8e
> [ 7316.764560] [] __warn+0xcf/0xf0
> [ 7316.764567] [] warn_slowpath_fmt+0x61/0x80
> [ 7316.764605] [] ? unpin_extent_cache+0xa2/0xf0 [btrfs]
> [ 7316.764640] [] ? btrfs_free_path+0x26/0x30 [btrfs]
> [ 7316.764677] [] btrfs_finish_ordered_io+0x6bc/0x6d0 [btrfs]
> [ 7316.764715] [] finish_ordered_fn+0x15/0x20 [btrfs]
> [ 7316.764753] [] btrfs_scrubparity_helper+0x7e/0x360 [btrfs]
> [ 7316.764791] [] btrfs_endio_write_helper+0xe/0x10 [btrfs]
> [ 7316.764799] [] process_one_work+0x1ed/0x490
> [ 7316.764806] [] worker_thread+0x49/0x500
> [ 7316.764813] [] ? process_one_work+0x490/0x490
> [ 7316.764820] [] kthread+0xda/0xf0
> [ 7316.764830] [] ret_from_fork+0x1f/0x40
> [ 7316.764838] [] ? kthread_worker_fn+0x170/0x170
> [ 7316.764843] ---[ end trace 90f54effc5e294b0 ]---
> [ 7316.764851] BTRFS: error (device sda2) in btrfs_finish_ordered_io:2954: errno=-95 unknown
> [ 7316.764859] BTRFS info (device sda2): forced readonly
> [ 7316.765396] pending csums is 9437184
>
> After seeing this, I decided to attempt a repair (confident that I could
> restore from backup if it failed). At the time, I was unaware of the
> issues with progs 4.7.1, so when I ran the check and saw all the
> incorrect backrefs messages, I figured that was my problem and ran the
> --repair. Of course, this didn't make the messages go away on subsequent
> checks, so I looked further and found this bug:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=155791
>
> I updated progs to 4.7.2 and re-ran the --repair (I didn't save any of
> the logs from these, unfortunately). The repair seemed to work (I also
> used --init-extent-tree), as current checks don't report any errors.
>
> The system boots and mounts the FS just fine. I can read from it all
> day, scrubs complete without failure, but just using the system for a
> while will eventually trigger the same "Transaction aborted (error -95)"
> error.
>
> I realize this is something of a mess, and that I was less than
> methodical with my actions so far. Given that I have a full backup that
> can be restored if need be (and I certainly could try running the
> convert again), what is my best course of action?

Interesting, seems that we get errors from

btrfs_finish_ordered_io
  insert_reserved_file_extent
    __btrfs_drop_extents

And splitting an inline extent throws -95.

Thanks,

-liubo
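
For anyone reading the abort line in the quoted log: on Linux, errno 95 is EOPNOTSUPP ("operation not supported"). Below is a minimal sketch for pulling the function, source line and errno out of such a message. The helper name decode_abort and the small errno table are illustrative assumptions, not part of the kernel or btrfs-progs; the numeric values are the generic Linux ones.

```python
import re

# Hypothetical lookup table (an assumption for this sketch): a few generic
# Linux errno values, hardcoded so the result does not depend on the host OS.
LINUX_ERRNO = {
    5: "EIO",
    12: "ENOMEM",
    17: "EEXIST",
    28: "ENOSPC",
    30: "EROFS",
    95: "EOPNOTSUPP",
}

# Matches the tail of a btrfs abort message such as
# "... in btrfs_finish_ordered_io:2954: errno=-95 unknown"
ABORT_RE = re.compile(r"in (?P<func>\w+):(?P<line>\d+): errno=-(?P<errno>\d+)")


def decode_abort(log_line):
    """Return (function, source line, errno, symbolic name), or None if no match."""
    m = ABORT_RE.search(log_line)
    if m is None:
        return None
    code = int(m.group("errno"))
    name = LINUX_ERRNO.get(code, "unknown")
    return m.group("func"), int(m.group("line")), code, name


line = ("[ 7316.764851] BTRFS: error (device sda2) in "
        "btrfs_finish_ordered_io:2954: errno=-95 unknown")
print(decode_abort(line))
# → ('btrfs_finish_ordered_io', 2954, 95, 'EOPNOTSUPP')
```

The table is deliberately tiny; on a Linux host the standard errno module could be used instead.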