All of lore.kernel.org
 help / color / mirror / Atom feed
From: Liu Bo <bo.li.liu@oracle.com>
To: Anatol Pomozov <anatol.pomozov@gmail.com>
Cc: linux-btrfs@vger.kernel.org, Zach Brown <zab@zabbo.net>
Subject: Re: Linux Arch: kernel BUG at fs/btrfs/inode.c:873!
Date: Tue, 8 Oct 2013 15:12:33 +0800	[thread overview]
Message-ID: <20131008071232.GA1402@localhost.localdomain> (raw)
In-Reply-To: <CAOMFOmX95ebV=myVO1yDf5yPs7DfmvHDKDFwNNfT0CajDQ=W4Q@mail.gmail.com>

On Mon, Oct 07, 2013 at 11:36:30PM -0700, Anatol Pomozov wrote:
> Hi, Btrfs developers
> 
> 
> On Fri, Oct 4, 2013 at 9:03 PM, Anatol Pomozov <anatol.pomozov@gmail.com> wrote:
> > Hi,
> >
> > I have a home server on Linux Arch (kernel 3.11.2) that uses
> > multi-device btrfs on root filesystem.
> >
> > Until recently it worked completely fine. And yesterday I rebooted it
> > and the machine did not wake up.
> >
> > I booted from a USB (kernel 3.10) and tried to mount the filesystem.
> > Here is OOPs I see
> >
> > [   41.676217] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid
> > 8 transid 164237 /dev/sda
> > [   41.684161] btrfs: disk space caching is enabled
> > [   67.266742] BTRFS error (device sdd3): block group 1141416919040
> > has wrong amount of free space
> > [   67.266796] BTRFS error (device sdd3): failed to load free space
> > cache for block group 1141416919040
> > [   68.126102] ------------[ cut here ]------------
> > [   68.126138] kernel BUG at fs/btrfs/inode.c:873!
> > [   68.126164] invalid opcode: 0000 [#1] PREEMPT SMP
> > [   68.126203] Modules linked in: intel_powerclamp coretemp kvm_intel
> > kvm crc32_pclmul ghash_clmulni_intel cryptd iTCO_wdt
> > iTCO_vendor_support ppdev microcode snd_hda_codec_hdmi psmouse
> > snd_hda_codec_realtek serio_raw i2c_i801 snd_hda_intel pcspkr
> > snd_hda_codec lpc_ich snd_hwdep parport_pc parport snd_pcm mperf
> > snd_page_alloc snd_timer snd mei_me soundcore evdev mei processor nfs
> > lockd sunrpc fscache ext4 crc16 mbcache jbd2 dm_snapshot dm_mod
> > squashfs loop isofs btrfs raid6_pq libcrc32c zlib_deflate xor
> > hid_generic usbhid hid usb_storage sd_mod i915 intel_agp intel_gtt
> > ahci libahci crc32c_intel i2c_algo_bit xhci_hcd libata ehci_pci
> > ehci_hcd scsi_mod atl1c drm_kms_helper usbcore usb_common drm i2c_core
> > button video
> > [   68.126754] CPU: 1 PID: 386 Comm: mount Not tainted 3.10.10-1-ARCH #1
> > [   68.126787] Hardware name: To Be Filled By O.E.M. To Be Filled By
> > O.E.M./H61M/U3S3, BIOS P2.20 07/30/2012
> > [   68.126834] task: ffff880118869950 ti: ffff88011377e000 task.ti:
> > ffff88011377e000
> > [   68.126871] RIP: 0010:[<ffffffffa0471223>]  [<ffffffffa0471223>]
> > __cow_file_range+0x3e3/0x460 [btrfs]
> > [   68.126933] RSP: 0018:ffff88011377f328  EFLAGS: 00010206
> > [   68.126961] RAX: 00000000000004d2 RBX: 0000000000000000 RCX: 0000000000001000
> > [   68.126996] RDX: 00000000000004d2 RSI: ffff88001f438608 RDI: ffff880115eb3000
> > [   68.127032] RBP: ffff88011377f3c8 R08: 0000000000000000 R09: 000000000003ffff
> > [   68.127068] R10: 0000000000040000 R11: 0000000000000000 R12: 0000000000000000
> > [   68.127103] R13: ffff880115f88630 R14: ffff88001f438608 R15: 0000000000000000
> > [   68.127140] FS:  00007fac17768780(0000) GS:ffff88011f300000(0000)
> > knlGS:0000000000000000
> > [   68.127180] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   68.127209] CR2: 00007f518d994000 CR3: 0000000117ab4000 CR4: 00000000000407e0
> > [   68.127246] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [   68.127281] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [   68.127317] Stack:
> > [   68.127331]  00000109ffe26000 ffff880115f88c60 ffff88001f438428
> > 000000000003ffff
> > [   68.127381]  ffff88011700c010 ffffea0003231b40 ffff880115eb3000
> > f600000109ffd870
> > [   68.127430]  ffffffffa0482f29 ffff880118a31000 ffff880115f88638
> > ffff88001f438448
> > [   68.127480] Call Trace:
> > [   68.127508]  [<ffffffffa0482f29>] ? release_extent_buffer+0xa9/0xd0 [btrfs]
> > [   68.127553]  [<ffffffffa048862f>] ? free_extent_buffer+0x4f/0xa0 [btrfs]
> > [   68.127598]  [<ffffffffa04716d6>] run_delalloc_nocow+0x436/0xaf0 [btrfs]
> > [   68.127641]  [<ffffffffa0472180>] run_delalloc_range+0x320/0x390 [btrfs]
> > [   68.127685]  [<ffffffffa04854c1>] ?
> > find_lock_delalloc_range.constprop.44+0x1d1/0x1f0 [btrfs]
> > [   68.127735]  [<ffffffffa0487044>] __extent_writepage+0x354/0x7b0 [btrfs]
> > [   68.127772]  [<ffffffff81122645>] ? find_get_pages_tag+0x105/0x180
> > [   68.127813]  [<ffffffffa0487722>]
> > extent_write_cache_pages.isra.32.constprop.48+0x282/0x3e0 [btrfs]
> > [   68.127867]  [<ffffffffa0487b7d>] extent_writepages+0x4d/0x70 [btrfs]
> > [   68.127909]  [<ffffffffa046e080>] ? can_nocow_odirect+0x2f0/0x2f0 [btrfs]
> > [   68.127951]  [<ffffffffa046cf28>] btrfs_writepages+0x28/0x30 [btrfs]
> > [   68.127985]  [<ffffffff8112e28e>] do_writepages+0x1e/0x40
> > [   68.128014]  [<ffffffff81123669>] __filemap_fdatawrite_range+0x59/0x60
> > [   68.128048]  [<ffffffff81123733>] filemap_fdatawrite_range+0x13/0x20
> > [   68.128090]  [<ffffffffa0481c99>] btrfs_wait_ordered_range+0x49/0x110 [btrfs]
> > [   68.128135]  [<ffffffffa04a64c0>] __btrfs_write_out_cache+0x6d0/0x8f0 [btrfs]
> > [   68.128180]  [<ffffffffa04a774d>] btrfs_write_out_cache+0x8d/0xe0 [btrfs]
> > [   68.128224]  [<ffffffffa0459983>]
> > btrfs_write_dirty_block_groups+0x533/0x620 [btrfs]
> > [   68.128271]  [<ffffffffa04676e2>] commit_cowonly_roots+0x172/0x260 [btrfs]
> > [   68.128314]  [<ffffffffa04695ad>]
> > btrfs_commit_transaction+0x5bd/0xaf0 [btrfs]
> > [   68.128353]  [<ffffffff8107b460>] ? wake_up_bit+0x30/0x30
> > [   68.128391]  [<ffffffffa04a4edd>] btrfs_recover_log_trees+0x3bd/0x490 [btrfs]
> > [   68.128434]  [<ffffffffa04a3270>] ? replay_one_dir_item+0xf0/0xf0 [btrfs]
> > [   68.128477]  [<ffffffffa0466689>] open_ctree+0x17b9/0x1e80 [btrfs]
> > [   68.128513]  [<ffffffff813555d3>] ? proc_comm_connector+0x33/0x120
> > [   68.128551]  [<ffffffffa043f456>] btrfs_mount+0x636/0x830 [btrfs]
> > [   68.128584]  [<ffffffff81141cd2>] ? pcpu_alloc+0x7d2/0x9e0
> > [   68.128616]  [<ffffffff8118f5d9>] mount_fs+0x39/0x1b0
> > [   68.128643]  [<ffffffff81141ef0>] ? __alloc_percpu+0x10/0x20
> > [   68.128676]  [<ffffffff811a8c27>] vfs_kern_mount+0x67/0x100
> > [   68.128706]  [<ffffffff811ab24e>] do_mount+0x23e/0xa20
> > [   68.128737]  [<ffffffff8113d73b>] ? strndup_user+0x4b/0xf0
> > [   68.128766]  [<ffffffff811abab3>] SyS_mount+0x83/0xc0
> > [   68.128795]  [<ffffffff814cfe1d>] system_call_fastpath+0x1a/0x1f
> > [   68.128826] Code: 8b 7d 90 4c 89 f6 e8 ad 9e 00 00 e9 dc fc ff ff
> > 48 85 d2 74 40 80 be 30 fe ff ff 84 48 89 d0 74 34 48 83 f8 01 0f 84
> > 87 fc ff ff <0f> 0b 48 8b 75 a8 48 8b 7d 90 41 89 c0 b9 9b 03 00 00 48
> > c7 c2
> > [   68.129145] RIP  [<ffffffffa0471223>] __cow_file_range+0x3e3/0x460 [btrfs]
> > [   68.129192]  RSP <ffff88011377f328>
> > [   68.129230] ---[ end trace 7992880786c40076 ]---
> >
> >
> >
> > Hm.... it looks like it crashed when it tries to restore logs. Ok, I
> > ran 'btrfschk /dev/sda' and here is its output:
> >
> > [  181.281546] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid
> > 8 transid 164237 /dev/sda
> > [  181.318148] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid
> > 3 transid 164237 /dev/sdb
> > [  181.408490] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid
> > 2 transid 164237 /dev/sdc1
> > [  181.763300] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid
> > 9 transid 164237 /dev/sdd3
> > [  181.782414] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid
> > 8 transid 164237 /dev/sda
> > [  181.784634] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid
> > 3 transid 164237 /dev/sdb
> > [  181.788715] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid
> > 2 transid 164237 /dev/sdc1
> > [  181.803161] device fsid 25e6a6fa-fe1f-4be5-a638-eeac948f8c21 devid
> > 9 transid 164237 /dev/sdd3
> > [  337.445525] systemd-journald[160]: Failed to write entry, ignoring:
> > Argument list too long
> >
> >
> > Then I tried to mount the filesystem again and it stuck. I see several
> > processes in UNINTERRUPTIBLE state:
> >
> > [  828.034908] SysRq : Show Blocked State
> > [  828.036150]   task                        PC stack   pid father
> > [  828.037418] btrfs-transacti D ffff880118a311e8     0   407      2 0x00000000
> > [  828.038709]  ffff880115051dc8 0000000000000046 0000000000014340
> > ffff880115051fd8
> > [  828.040030]  ffff880115051fd8 0000000000014340 ffff88011886f620
> > ffff88011886f620
> > [  828.041364]  000000000000082f ffff880115051d30 ffffffff8109d141
> > ffff880118868048
> > [  828.042724] Call Trace:
> > [  828.044071]  [<ffffffff8109d141>] ? cpuacct_charge+0x61/0x70
> > [  828.045445]  [<ffffffff8108dd88>] ? __enqueue_entity+0x78/0x80
> > [  828.046829]  [<ffffffff810920f6>] ? enqueue_entity+0x286/0xa20
> > [  828.048227]  [<ffffffff81065bbb>] ? lock_timer_base.isra.35+0x2b/0x50
> > [  828.049648]  [<ffffffff814c6f09>] schedule+0x29/0x70
> > [  828.051076]  [<ffffffffa0468695>]
> > wait_current_trans.isra.14+0xa5/0xf0 [btrfs]
> > [  828.052515]  [<ffffffff8107b460>] ? wake_up_bit+0x30/0x30
> > [  828.053970]  [<ffffffffa0469e18>] start_transaction+0x338/0x530 [btrfs]
> > [  828.055453]  [<ffffffffa046a0c7>] btrfs_attach_transaction+0x17/0x20 [btrfs]
> > [  828.056943]  [<ffffffffa0460ca1>] transaction_kthread+0x141/0x230 [btrfs]
> > [  828.058455]  [<ffffffffa0460b60>] ? free_fs_root+0x90/0x90 [btrfs]
> > [  828.059989]  [<ffffffff8107a670>] kthread+0xc0/0xd0
> > [  828.061508]  [<ffffffff8107a5b0>] ? kthread_create_on_node+0x120/0x120
> > [  828.063036]  [<ffffffff814cfd6c>] ret_from_fork+0x7c/0xb0
> > [  828.064568]  [<ffffffff8107a5b0>] ? kthread_create_on_node+0x120/0x120
> > [  828.066109] mount           D ffff880118a31878     0   427    343 0x00000000
> > [  828.067680]  ffff880117bdbc18 0000000000000086 0000000000014340
> > ffff880117bdbfd8
> > [  828.069278]  ffff880117bdbfd8 0000000000014340 ffff880118facbf0
> > ffff8801169cb800
> > [  828.070869]  ffff88011f214340 ffffffff81085aa9 ffff8801169cb800
> > ffff88011f214340
> > [  828.072430] Call Trace:
> > [  828.073966]  [<ffffffff81085aa9>] ? finish_task_switch+0x49/0xe0
> > [  828.075524]  [<ffffffff814c6996>] ? __schedule+0x3f6/0x940
> > [  828.077093]  [<ffffffff81071194>] ? wake_up_worker+0x24/0x30
> > [  828.078671]  [<ffffffff814c5a34>] ? __mutex_lock_slowpath+0x284/0x3b0
> > [  828.080275]  [<ffffffff814c6f09>] schedule+0x29/0x70
> > [  828.081867]  [<ffffffff814c7d95>] rwsem_down_write_failed+0xf5/0x1c3
> > [  828.083473]  [<ffffffffa043a000>] ? 0xffffffffa0439fff
> > [  828.085083]  [<ffffffff81279e33>] call_rwsem_down_write_failed+0x13/0x20
> > [  828.086717]  [<ffffffff814c5ec4>] ? down_write+0x24/0x26
> > [  828.088366]  [<ffffffff8118dffe>] grab_super+0x2e/0xa0
> > [  828.090021]  [<ffffffff8118e6f0>] sget+0x320/0x580
> > [  828.091677]  [<ffffffffa043e130>] ?
> > btrfs_parse_early_options+0x190/0x190 [btrfs]
> > [  828.093368]  [<ffffffff814c57ae>] ? mutex_unlock+0xe/0x10
> > [  828.095068]  [<ffffffffa043f228>] btrfs_mount+0x408/0x830 [btrfs]
> > [  828.096786]  [<ffffffff81141cd2>] ? pcpu_alloc+0x7d2/0x9e0
> > [  828.098516]  [<ffffffff8118f5d9>] mount_fs+0x39/0x1b0
> > [  828.100267]  [<ffffffff81141ef0>] ? __alloc_percpu+0x10/0x20
> > [  828.102017]  [<ffffffff811a8c27>] vfs_kern_mount+0x67/0x100
> > [  828.103775]  [<ffffffff811ab24e>] do_mount+0x23e/0xa20
> > [  828.105547]  [<ffffffff8113d73b>] ? strndup_user+0x4b/0xf0
> > [  828.107312]  [<ffffffff811abab3>] SyS_mount+0x83/0xc0
> > [  828.109096]  [<ffffffff814cfe1d>] system_call_fastpath+0x1a/0x1f
> >
> > The only thing that I did recently is defrag /var/log/journal files
> > (journalctl is very slow because of btrfs COW). Something like this
> > http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg24878.html
> >
> > How to fix this problem? And restore the data...
> 
> 
> Dear btrfs developers, I still have my server in broken state. Is
> there anything I can do to restore it? I tried to mount the filesystem
> with different flags (e.g. notreelog) but all of them cause deadlock
> like above. Have you see such issue before? Any ideas what the problem
> can be?
> 
> I do not mind to spend some time on debugging this kernel issue it but
> I really need some pointers from people who know this code very well
> (I am mostly familiar with block layer).

Getting Stuck while sget() doesn't seem to be a problem caused by zeroing the
log.

Is it possible to boot it with a USB(recent 3.12-rc3) and mount it again?

-liubo

  reply	other threads:[~2013-10-08  7:12 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-05  4:03 Linux Arch: kernel BUG at fs/btrfs/inode.c:873! Anatol Pomozov
2013-10-05  4:42 ` Duncan
2013-10-05 11:51   ` Anatol Pomozov
2013-10-05 14:44     ` Duncan
2013-10-06  5:14       ` Anatol Pomozov
2013-10-06 10:10         ` Duncan
2013-10-08  6:36 ` Anatol Pomozov
2013-10-08  7:12   ` Liu Bo [this message]
2013-10-12  2:22     ` Anatol Pomozov
2013-10-12 21:20       ` Chris Murphy
2013-10-15 19:56       ` Chris Murphy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131008071232.GA1402@localhost.localdomain \
    --to=bo.li.liu@oracle.com \
    --cc=anatol.pomozov@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=zab@zabbo.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.