From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from cn.fujitsu.com ([222.73.24.84]:44374 "EHLO song.cn.fujitsu.com"
    rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP
    id S933286Ab2GDFse (ORCPT ); Wed, 4 Jul 2012 01:48:34 -0400
Message-ID: <4FF3DB87.5090405@cn.fujitsu.com>
Date: Wed, 04 Jul 2012 13:58:31 +0800
From: Liu Bo
MIME-Version: 1.0
To: Marc MERLIN
CC: linux-btrfs@vger.kernel.org
Subject: Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction
References: <20120626193637.GA27856@merlins.org> <20120627013818.GA3556@merlins.org>
    <20120627052012.GA32533@merlins.org> <20120629123624.GS7472@merlins.org>
    <20120702195820.GA10655@merlins.org>
In-Reply-To: <20120702195820.GA10655@merlins.org>
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On 07/03/2012 03:58 AM, Marc MERLIN wrote:
> On Fri, Jun 29, 2012 at 05:36:24AM -0700, Marc MERLIN wrote:
>> On Tue, Jun 26, 2012 at 10:20:12PM -0700, Marc MERLIN wrote:
>>> On Tue, Jun 26, 2012 at 06:38:18PM -0700, Marc MERLIN wrote:
>>>> Now, I'm also seeing these below and I have this again (86% CPU):
>>>> 6076 root 20 0 0 0 0 R 86 0.0 29:40.11 btrfs-delalloc-
>>>>
>>>> How bad is it, doctor? I think I'll be going back to 3.2.16 for now though.
>>
>> I reverted to 3.2.16 and haven't had further problems after dropping the
>> current snapshot that was corrupted in various ways.
>>
>> Now, I'm not sure when I should upgrade anymore since I haven't heard of
>> any fixes for what I saw.
>> Assuming I go forward again, is there something else I could have
>> provided to help debug?
>
> Mmmh, ok. I understand that this code comes with no guarantees, and I have
> backups, but I'm reporting a problem that lead to corruption (I had multiple
> files that were corrupted in my latest snapshot and I had to drop it and
> revert to an older snapshot and then out of fear for 3.4.4, went back to
> 3.2.16).
>

Hi Marc,

Sorry for not replying to this earlier.

The dmesg log, sysrq log and stack dump info are usually very helpful. From
your report, we can see the csum errors and the hang in the log. 'no csum' is
not that bad, while the hang is serious and dangerous.

So can you please capture a 'sysrq + w' log while the hang is happening and
paste it here? That log may tell us who is blocking the other threads.

> I didn't see any problems with 3.2.16 (doesn't mean there weren't any, just
> that I didn't see any).

Feel free to use the latest btrfs upstream; it always contains some fixes.

thanks,
liubo

> Since my filesystem was a bit full, and that triggers problems with btrfs, I
> freed up 70GB
> gandalfthegreat:~# btrfs fi show
> Label: 'btrfs_pool1' uuid: 873d526c-e911-4234-af1b-239889cd143d
> Total devices 1 FS bytes used 163.01GB
> devid 1 size 231.02GB used 231.02GB path /dev/dm-0
>
> I rebooted with 3.4.4 and started copying data, and for now I've gotten this:
> kernel: [ 832.108558] btrfs no csum found for inode 3896855 start 0
> kernel: [ 832.108873] btrfs csum failed ino 3896855 off 0 csum 1150320628 private 0
>
> How bad is this?
>
> More generally, what was missing from my previous report (I gave all the
> sysrq I could output) that no one seemed to be able to use it?
>
> Thanks,
> Marc
>
>>> Back to 3.2.16, I'm now seeing this:
>>> [ 840.516733] INFO: task VirtualBox:6818 blocked for more than 120 seconds.
>>> [ 840.516735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [ 840.516736] VirtualBox D ffff8801fd134080 0 6818 6758 0x00000080
>>> [ 840.516740] ffff8801fd134080 0000000000000086 0000000000000050 ffff880202e7f100
>>> [ 840.516744] 0000000000013580 ffff8801c6f0dfd8 ffff8801c6f0dfd8 ffff8801fd134080
>>> [ 840.516748] ffff8801c6f0da68 ffff8801c6f0da68 ffff88020a4e22f0 ffff88023bc13e08
>>> [ 840.516752] Call Trace:
>>> [ 840.516755] [] ? __lock_page+0x66/0x66
>>> [ 840.516758] [] ? io_schedule+0x58/0x6f
>>> [ 840.516761] [] ? sleep_on_page+0x6/0xa
>>> [ 840.516764] [] ? __wait_on_bit_lock+0x3c/0x85
>>> [ 840.516767] [] ? __lock_page+0x61/0x66
>>> [ 840.516770] [] ? autoremove_wake_function+0x2a/0x2a
>>> [ 840.516785] [] ? extent_write_cache_pages.isra.13.constprop.22+0xf6/0x278 [btrfs]
>>> [ 840.516789] [] ? __cache_free.isra.40+0x19/0x1a7
>>> [ 840.516792] [] ? sub_preempt_count+0x83/0x94
>>> [ 840.516795] [] ? _raw_spin_unlock+0x24/0x30
>>> [ 840.516811] [] ? extent_writepages+0x40/0x57 [btrfs]
>>> [ 840.516826] [] ? __btrfs_buffered_write+0x2bb/0x2dc [btrfs]
>>> [ 840.516841] [] ? uncompress_inline.isra.44+0x116/0x116 [btrfs]
>>> [ 840.516844] [] ? __filemap_fdatawrite_range+0x4b/0x50
>>> [ 840.516847] [] ? filemap_write_and_wait_range+0x25/0x4d
>>> [ 840.516863] [] ? btrfs_file_aio_write+0x34e/0x490 [btrfs]
>>> [ 840.516866] [] ? get_parent_ip+0x9/0x1b
>>> [ 840.516882] [] ? __btrfs_buffered_write+0x2dc/0x2dc [btrfs]
>>> [ 840.516886] [] ? aio_rw_vect_retry+0x70/0x18e
>>> [ 840.516888] [] ? aio_fsync+0x22/0x22
>>> [ 840.516891] [] ? aio_run_iocb+0x72/0x11c
>>> [ 840.516894] [] ? do_io_submit+0x6a4/0x7f9
>>> [ 840.516898] [] ? system_call_fastpath+0x16/0x1b
>>> [ 1187.553635] btrfs: unlinked 8 orphans
>>> [ 3810.200064] e1000e 0000:00:19.0: BAR 0: set to [mem 0xfc000000-0xfc01ffff] (PCI address [0xfc000000-0xfc01ffff])
>>> [ 3810.200071] e1000e 0000:00:19.0: BAR 1: set to [mem 0xfc025000-0xfc025fff] (PCI address [0xfc025000-0xfc025fff])
>>> [ 3810.200076] e1000e 0000:00:19.0: BAR 2: set to [io 0x1840-0x185f] (PCI address [0x1840-0x185f])
>>> [ 3810.200093] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
>>> [ 3810.200115] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
>>> [ 3810.200147] e1000e 0000:00:19.0: PME# disabled
>>> [ 3810.200224] e1000e 0000:00:19.0: irq 45 for MSI/MSI-X
>>> [ 4671.144685] iwlwifi 0000:03:00.0: Tx aggregation enabled on ra = 2c:b0:5d:3c:7d:f1 tid = 1
>>> [ 4799.384107] btrfs: unlinked 8 orphans
>>> [ 8436.512513] btrfs: unlinked 7 orphans
>>> [11350.749850] btrfs no csum found for inode 3909426 start 0
>>> [11350.750697] btrfs csum failed ino 3909426 off 0 csum 1419704114 private 0
>>> [11652.088805] btrfs no csum found for inode 3910848 start 0
>>> [11652.089524] btrfs csum failed ino 3910848 off 0 csum 3145117582 private 0
>>>
>>> My firefox and chrome profiles were corrupted, so I had to restore them from an old snapshot.
>>>
>>> I can't prove it, but it looks like my corruption happened right at the same
>>> time than I rebooted to 3.4.4.
>>>
>>> Marc
>>> --
>>> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>>> Microsoft is to operating systems ....
>>> .... what McDonalds is to gourmet cooking
>>> Home page: http://marc.merlins.org/
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
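
For reference, a minimal sketch of one way to capture the 'sysrq + w' log
requested in the reply above. It assumes a root shell is still responsive
while the hang is in progress. The function name, the default output file
name and the SYSRQ_TRIGGER / DMESG overrides are illustrative assumptions,
not something taken from this thread.

```shell
#!/bin/sh
# Sketch: ask the kernel to dump all blocked (uninterruptible, "D" state)
# tasks, then save the kernel log so it can be pasted to the list.
# SYSRQ_TRIGGER and DMESG are hypothetical overrides used only so the
# function can be exercised without root; by default the real paths are used.

capture_sysrq_w() {
    sysrq_trigger="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
    sysrq_out="${1:-sysrq-w.txt}"

    # Writing 'w' is the same request as pressing Alt+SysRq+w.
    echo w > "$sysrq_trigger" || return 1

    # The stack dumps land in the kernel ring buffer; save a copy of it.
    ${DMESG:-dmesg} > "$sysrq_out"
}
```

Run as root during the hang, for example `capture_sysrq_w hang.txt`, then
paste the "blocked" task stacks from the resulting file. Writing to
/proc/sysrq-trigger should work for root even when the keyboard SysRq
combination is disabled via /proc/sys/kernel/sysrq.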