Re: [BUG] xfs/305 hangs 4.10-rc4 kernel

linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Eryu Guan <eguan@redhat.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [BUG] xfs/305 hangs 4.10-rc4 kernel
Date: Thu, 26 Jan 2017 11:29:50 +0800	[thread overview]
Message-ID: <20170126032950.GM1859@eguan.usersys.redhat.com> (raw)
In-Reply-To: <20170125145215.GC28388@bfoster.bfoster>

[-- Attachment #1: Type: text/plain, Size: 3908 bytes --]

On Wed, Jan 25, 2017 at 09:52:16AM -0500, Brian Foster wrote:
> On Wed, Jan 25, 2017 at 02:39:43PM +0800, Eryu Guan wrote:
> > Hi all,
> > 
> > I hit another test hang triggered by running xfs/305 in a loop, it
> > usually reproduces within 50 iterations. It also involves disabling
> > quota, which looks similar to the generic/232 hang[1]. Perhaps they
> > share the same root cause?
> > 
> > [333853.198159] xfs_quota       D    0 13880  13441 0x00000080
> > [333853.227479] Call Trace:
> > [333853.239429]  __schedule+0x21c/0x6b0
> > [333853.255410]  schedule+0x36/0x80
> > [333853.269932]  schedule_timeout+0x1d1/0x3a0
> > [333853.288354]  ? lock_timer_base+0xa0/0xa0
> > [333853.306296]  ? xfs_qm_need_dqattach+0x80/0x80 [xfs]
> > [333853.329015]  schedule_timeout_uninterruptible+0x1f/0x30
> > [333853.353344]  xfs_qm_dquot_walk.isra.10+0x172/0x190 [xfs]
> > [333853.377747]  xfs_qm_dqpurge_all+0x5c/0x80 [xfs]
> > [333853.398433]  xfs_qm_scall_quotaoff+0x127/0x380 [xfs]
> > [333853.421062]  xfs_quota_disable+0x3d/0x50 [xfs]
> > [333853.441319]  SyS_quotactl+0x391/0x850
> > [333853.458143]  ? __audit_syscall_entry+0xaf/0x100
> > [333853.478903]  ? syscall_trace_enter+0x1d0/0x2b0
> > [333853.499262]  ? __audit_syscall_exit+0x209/0x290
> > [333853.519938]  do_syscall_64+0x67/0x180
> > [333853.536771]  entry_SYSCALL64_slow_path+0x25/0x25
> > [333853.558105] RIP: 0033:0x7f95be064fea
> > [333853.574521] RSP: 002b:00007ffc988b6348 EFLAGS: 00000206 ORIG_RAX: 00000000000000b3
> > [333853.608922] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f95be064fea
> > [333853.641478] RDX: 0000000000000000 RSI: 00000000027bb2f0 RDI: 0000000000580201
> > [333853.674091] RBP: 00000000027bb320 R08: 00007ffc988b635c R09: 0000000000000001
> > [333853.706913] R10: 00007ffc988b635c R11: 0000000000000206 R12: 00000000027bb2d0
> > [333853.744953] R13: 00000000027ba0a0 R14: 0000000000000000 R15: 00000000027ba080
> > 
> > I've seen this hang on multiple test hosts with different test configs,
> > e.g. xfs_2k_crc xfs_4k_reflink. I also tried with 4.9 kernel, but I kept
> > hitting the BUG_ON that this kernel commit[2] fixed in 4.10-rc kernel.
> > 
> 
> Interesting... I'm not sure this is necessarily locked up given the call
> to schedule_timeout(). That suggests we could be in the
> xfs_qm_dquot_walk()->delay() call and possibly spinning on skipped
> entries. It looks like the only way we skip an entry is if it is already
> being freed or has a reference count. The only other dquot free path
> looks like the shrinker, but that doesn't appear in the blocked task
> list at all. We do have a dqread call, but that path hasn't even added
> the dquot to the radix tree yet.
> 
> > I attached full sysrq-w output. If you need more info please let me
> > know.
> >
> 
> It looks like pretty much everything is hung up on log reservation,
> including writeback, which is more indicative of a potential log
> reservation problem. xfsaild appears to be alive, however (a
> schedule_timeout() in that path suggests the thread is working to push
> log items and free up log space), so maybe something else is going on.
> It might be interesting to see if we can tell if we're spinning
> somewhere by enabling tracepoints once in this state. Also, have you
> checked for any unexpected syslog messages when this occurs?

There's nothing interesting in dmesg with stock 4.10-rc4 kernel, but if
I test with Darrick's xfs-linux tree for-next branch (which has a few
patches on top of 4.10-rc4), I can see this message:

XFS (dm-5): xlog_verify_grant_tail: space > BBTOB(tail_blocks)

But I can see this message in every xfs/305 iteration, and other tests
too. I attached dmesg log prior to sysrq-w output, gathered from
"for-next" kernel.

I also attached trace log while we're in this state (also from
"for-next" kernel):

cd /mnt/ext4
trace-cmd record -e xfs_*
sleep 10
trace-cmd report > log

Thanks,
Eryu

[-- Attachment #2: upstream-4.10-rc4-xfs-305-hang-sysrq-w.log.gz --]
[-- Type: application/gzip, Size: 29529 bytes --]

[-- Attachment #3: upstream-4.10-rc4-xfs-305-hang-trace.log.gz --]
[-- Type: application/gzip, Size: 28432 bytes --]

next prev parent reply	other threads:[~2017-01-26  3:29 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-25  6:39 [BUG] xfs/305 hangs 4.10-rc4 kernel Eryu Guan
2017-01-25 14:52 ` Brian Foster
2017-01-26  3:29   ` Eryu Guan [this message]
2017-01-26 18:44     ` Brian Foster
2017-01-27  2:52       ` Eryu Guan
2017-01-27 17:15         ` Brian Foster
2017-01-30 18:12         ` Brian Foster
2017-01-30 21:59           ` Brian Foster
2017-02-04 11:47             ` Eryu Guan
2017-02-06 17:59               ` Brian Foster
2017-02-13 16:43                 ` Brian Foster
2017-07-26  5:09                   ` Eryu Guan
2017-07-26 10:35                     ` Brian Foster
2017-07-26 10:43                       ` Eryu Guan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170126032950.GM1859@eguan.usersys.redhat.com \
    --to=eguan@redhat.com \
    --cc=bfoster@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).