From: Yaroslav Halchenko <yoh@onerussian.com>
To: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: recent complete stalls of btrfs (4.7.0-rc2+) -- any advice?
Date: Tue, 9 Aug 2016 18:19:52 -0400
Message-ID: <20160809221951.GA26923@onerussian.com>
In-Reply-To: <20160612151531.GA28826@hopa.kiewit.dartmouth.edu>


On Sun, 12 Jun 2016, Yaroslav Halchenko wrote:
> On Fri, 10 Jun 2016, Chris Murphy wrote:

> > > Are those issues something which was fixed since 4.6.0-rc4+ or I should
> > > be on look out for them to come back?  What other information should I
> > > provide if I run into them again to help you troubleshoot/fix it?

> > > P.S. Please CC me the replies


> > 4.6.2 is current and it's a lot easier to just use that and see if it
> > still happens than for someone to track down whether it's been fixed
> > since a six week old RC.

> Dear Chris,

> Thank you for the reply!  Now running v4.7-rc2-300-g3d0f0b6

> The thing is that this issue doesn't happen right away, and it takes a
> while for it to develop, and seems to be only after an intensive load.
> So the version I run will always be "X weeks old" if I just keep hopping
> the recent release of master, and it would be an indefinite goose
> chase if left un-analyzed.  That is why I would still appreciate an
> advice on what specifics to report/attempt if such crash happens next
> time, or may be if someone is having an idea of what could have lead to
> this crash to start with.

The beast died on me this morning :-/  The last kern.log message was

    (Fixing recursive fault but reboot is needed!)

One of the tracebacks is the same as before (ending in
btrfs_commit_transaction), so I guess it could be the same issue as
before.  Most probably I will perform the same kernel build/upgrade dance
again, BUT I still hope that someone might either spot a sign of an issue
fixed recently (since v4.7-rc2-300-g3d0f0b6) or, if nothing is spotted,
look in detail at what is possibly a new issue that has not been addressed
yet.  I would be "happy" to provide more information, or to enable any
additional monitoring needed to capture more details in case of the next
crash.

I rebooted the box around 11am.  It had been completely unresponsive since
some time earlier, but I think it still "somewhat functioned" after the last
traceback reported in the kern.log, which I shared at
http://www.onerussian.com/tmp/kern-smaug-20160809.log ; otherwise
journalctl -b -1 doesn't show any other grave errors.  I also cite the very
last oops from the kern.log here.  Out of academic interest: why does there
seem to be ext4 functionality within the stack for btrfs_commit_transaction?
Is some logic common/reused between the two file systems?  Or is it merely
that some partitions are on ext4 and something in btrfs triggered them as
well?

Aug  9 07:46:15 smaug kernel: [5132590.362689] Oops: 0000 [#3] SMP
Aug  9 07:46:15 smaug kernel: [5132590.367913] Modules linked in: uas usb_storage vboxdrv(O) nls_utf8 ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs veth xt_addrtype ipt_MASQUERADE nf_nat_masquerade_ipv4 bridge stp llc cpufreq_stats cpufreq_userspace cpufreq_conservative cpufreq_powersave xt_pkttype nf_log_ipv4 nf_log_common xt_tcpudp ip6table_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_TCPMSS xt_LOG ipt_REJECT nf_reject_ipv4 iptable_mangle xt_multiport xt_state xt_limit xt_conntrack nfsd nf_conntrack_ftp auth_rpcgss oid_registry nfs_acl nfs lockd grace nf_conntrack ip6table_filter ip6_tables iptable_filter ip_tables x_tables fscache sunrpc binfmt_misc intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_watchdog ipmi_poweroff ipmi_devintf kvm_intel iTCO_wdt iTCO_vendor_support kvm irqbypass fuse crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drbg ansi_cprng aesni_intel aes_x86_64 lrw gf128mul snd_pcm glue_helper ablk_helper cryptd snd_timer snd soundcore pcspkr evdev joydev ast ttm drm_kms_helper i2c_i801 drm i2c_algo_bit mei_me lpc_ich mfd_core mei ipmi_si ioatdma shpchp wmi ipmi_msghandler ecryptfs cbc tpm_tis tpm acpi_power_meter acpi_pad button sha256_ssse3 sha256_generic hmac encrypted_keys autofs4 ext4 crc16 jbd2 mbcache btrfs dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 md_mod ses enclosure sg sd_mod hid_generic usbhid hid crc32c_intel mpt3sas raid_class scsi_transport_sas xhci_pci xhci_hcd ehci_pci ahci ehci_hcd libahci libata usbcore ixgbe scsi_mod usb_common dca ptp pps_core mdio fjes
Aug  9 07:46:15 smaug kernel: [5132590.538375] CPU: 6 PID: 2878531 Comm: git Tainted: G      D W IO    4.7.0-rc2+ #1
Aug  9 07:46:15 smaug kernel: [5132590.547950] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.0b 09/17/2014
Aug  9 07:46:15 smaug kernel: [5132590.557009] task: ffff8817b855b0c0 ti: ffff88000e0dc000 task.ti: ffff88000e0dc000
Aug  9 07:46:15 smaug kernel: [5132590.566572] RIP: 0010:[<ffffffffa0444be3>]  [<ffffffffa0444be3>] jbd2__journal_start+0x33/0x1e0 [jbd2]
Aug  9 07:46:15 smaug kernel: [5132590.578009] RSP: 0018:ffff88000e0df8f0  EFLAGS: 00010282
Aug  9 07:46:15 smaug kernel: [5132590.585427] RAX: ffff88155eae8140 RBX: ffff881ed5a9d128 RCX: 0000000002400040
Aug  9 07:46:15 smaug kernel: [5132590.594678] RDX: 00000000000fd0e4 RSI: 0000000000000002 RDI: ffff882034d0f000
Aug  9 07:46:15 smaug kernel: [5132590.603929] RBP: ffff882034d0f000 R08: 0000000000000001 R09: 0000000000001569
Aug  9 07:46:15 smaug kernel: [5132590.613264] R10: 00000000107aa8b7 R11: fffffffffffffff0 R12: ffff881ed5a9d128
Aug  9 07:46:15 smaug kernel: [5132590.622566] R13: ffff882033909000 R14: ffff881816302a00 R15: ffff881ed5a9d128
Aug  9 07:46:15 smaug kernel: [5132590.631846] FS:  0000000000000000(0000) GS:ffff88207fc80000(0000) knlGS:0000000000000000
Aug  9 07:46:15 smaug kernel: [5132590.642060] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug  9 07:46:15 smaug kernel: [5132590.649898] CR2: 00000000000fd0e4 CR3: 0000000001a06000 CR4: 00000000001406e0
Aug  9 07:46:15 smaug kernel: [5132590.659130] Stack:
Aug  9 07:46:15 smaug kernel: [5132590.663228]  ffffffffa049cc54 0000156902020200 ffff881ed5a9d128 0000000000000801
Aug  9 07:46:15 smaug kernel: [5132590.672811]  ffff881ed5a9d128 ffff882033909000 ffff881816302a00 ffff881ed5a9d128
Aug  9 07:46:15 smaug kernel: [5132590.682392]  ffffffffa0470b9d ffff881ed5a9d128 0000000000000801 ffffffff8121fe67
Aug  9 07:46:15 smaug kernel: [5132590.691981] Call Trace:
Aug  9 07:46:15 smaug kernel: [5132590.696597]  [<ffffffffa049cc54>] ? __ext4_journal_start_sb+0x34/0xf0 [ext4]
Aug  9 07:46:15 smaug kernel: [5132590.705791]  [<ffffffffa0470b9d>] ? ext4_dirty_inode+0x2d/0x60 [ext4]
Aug  9 07:46:15 smaug kernel: [5132590.714340]  [<ffffffff8121fe67>] ? __mark_inode_dirty+0x177/0x360
Aug  9 07:46:15 smaug kernel: [5132590.722623]  [<ffffffff8120e389>] ? generic_update_time+0x79/0xd0
Aug  9 07:46:15 smaug kernel: [5132590.730814]  [<ffffffff8120da8d>] ? file_update_time+0xbd/0x110
Aug  9 07:46:15 smaug kernel: [5132590.738845]  [<ffffffff81175f69>] ? __generic_file_write_iter+0x99/0x1e0
Aug  9 07:46:15 smaug kernel: [5132590.747708]  [<ffffffffa04631b6>] ? ext4_file_write_iter+0x196/0x3d0 [ext4]
Aug  9 07:46:15 smaug kernel: [5132590.756756]  [<ffffffff811f170b>] ? __vfs_write+0xeb/0x160
Aug  9 07:46:15 smaug kernel: [5132590.764301]  [<ffffffff811f2103>] ? __kernel_write+0x53/0x100
Aug  9 07:46:15 smaug kernel: [5132590.772081]  [<ffffffff810ff672>] ? do_acct_process+0x462/0x4e0
Aug  9 07:46:15 smaug kernel: [5132590.780035]  [<ffffffff810ffd4c>] ? acct_process+0xdc/0x100
Aug  9 07:46:15 smaug kernel: [5132590.787648]  [<ffffffff8107e403>] ? do_exit+0x7f3/0xb80
Aug  9 07:46:15 smaug kernel: [5132590.794894]  [<ffffffff8102fa5c>] ? oops_end+0x9c/0xd0
Aug  9 07:46:15 smaug kernel: [5132590.802027]  [<ffffffff81062d35>] ? no_context+0x135/0x390
Aug  9 07:46:15 smaug kernel: [5132590.809496]  [<ffffffff815ca1f8>] ? page_fault+0x28/0x30
Aug  9 07:46:15 smaug kernel: [5132590.816808]  [<ffffffffa0381af0>] ? btrfs_commit_transaction+0x350/0xa30 [btrfs]
Aug  9 07:46:15 smaug kernel: [5132590.826213]  [<ffffffff810ba590>] ? wait_woken+0x90/0x90
Aug  9 07:46:15 smaug kernel: [5132590.833501]  [<ffffffffa039a11b>] ? btrfs_sync_file+0x2fb/0x3e0 [btrfs]
Aug  9 07:46:15 smaug kernel: [5132590.842074]  [<ffffffff81225318>] ? do_fsync+0x38/0x60
Aug  9 07:46:15 smaug kernel: [5132590.849114]  [<ffffffff8122558c>] ? SyS_fsync+0xc/0x10
Aug  9 07:46:15 smaug kernel: [5132590.856096]  [<ffffffff815c81f6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
Aug  9 07:46:15 smaug kernel: [5132590.864522] Code: 56 41 55 41 54 55 53 48 89 fd 65 48 8b 04 25 c0 d4 00 00 48 83 ec 10 48 85 ff 48 8b 80 90 06 00 00 74 20 48 85 c0 74 33 48 8b 10 <48> 3b 3a 75 29 83 40 14 01 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e
Aug  9 07:46:15 smaug kernel: [5132590.888065] RIP  [<ffffffffa0444be3>] jbd2__journal_start+0x33/0x1e0 [jbd2]
Aug  9 07:46:15 smaug kernel: [5132590.896830]  RSP <ffff88000e0df8f0>
Aug  9 07:46:15 smaug kernel: [5132590.902039] CR2: 00000000000fd0e4
Aug  9 07:46:15 smaug kernel: [5132590.907032] ---[ end trace 3b9450d000ed06b4 ]---
Aug  9 07:46:15 smaug kernel: [5132590.914612] Fixing recursive fault but reboot is needed!
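
One way to look at the ext4 question is to group the call-trace frames by
the module they belong to.  The sketch below is only an illustration and is
not part of the original report: the function names (frames, runs) and the
"vmlinux" label for symbols without a module tag are made up for this
example, and the sample is a shortened excerpt of the trace above.  Frames
are listed most recent first, and the "?" markers mean the unwinder was not
certain about them, so the result is a reading aid rather than a diagnosis.

```python
import re
from itertools import groupby

# Matches frames such as
#   "[<ffffffffa0470b9d>] ? ext4_dirty_inode+0x2d/0x60 [ext4]"
# The trailing "[module]" tag is absent for symbols in the kernel image.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)

def frames(log_lines):
    """Return (symbol, module) pairs that follow the first 'Call Trace:' line."""
    out, in_trace = [], False
    for line in log_lines:
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        m = FRAME_RE.search(line)
        if m is None:
            break  # first non-frame line (e.g. "Code: ...") ends the trace
        # "vmlinux" is a label chosen here for frames with no module tag.
        out.append((m.group("sym"), m.group("mod") or "vmlinux"))
    return out

def runs(frame_list):
    """Collapse consecutive frames from the same module into one run."""
    return [(mod, [sym for sym, _ in grp])
            for mod, grp in groupby(frame_list, key=lambda f: f[1])]

# Shortened excerpt of the oops above (most recent frame first).
SAMPLE = """\
Aug  9 07:46:15 smaug kernel: [5132590.691981] Call Trace:
Aug  9 07:46:15 smaug kernel: [5132590.696597]  [<ffffffffa049cc54>] ? __ext4_journal_start_sb+0x34/0xf0 [ext4]
Aug  9 07:46:15 smaug kernel: [5132590.780035]  [<ffffffff810ffd4c>] ? acct_process+0xdc/0x100
Aug  9 07:46:15 smaug kernel: [5132590.787648]  [<ffffffff8107e403>] ? do_exit+0x7f3/0xb80
Aug  9 07:46:15 smaug kernel: [5132590.816808]  [<ffffffffa0381af0>] ? btrfs_commit_transaction+0x350/0xa30 [btrfs]
Aug  9 07:46:15 smaug kernel: [5132590.849114]  [<ffffffff8122558c>] ? SyS_fsync+0xc/0x10
Aug  9 07:46:15 smaug kernel: [5132590.864522] Code: 56 41 55 41 54 55 53
"""

if __name__ == "__main__":
    for mod, syms in runs(frames(SAMPLE.splitlines())):
        print(f"{mod:8s} {', '.join(syms)}")
```

Read this way, the ext4 frames sit above do_exit and acct_process, that is,
they appear after the fault in btrfs_commit_transaction rather than
underneath it.  That would be consistent with the dying task writing to a
file that lives on ext4 during exit, and not with the two file systems
sharing commit logic, but given the "?" markers this remains a guess.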

Thank you very much in advance for any ideas/feedback.  

Please CC me on the responses
-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        
