From: Dave Chinner <david@fromorbit.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-xfs@vger.kernel.org
Subject: Re: xfs: log recovery hang fixes
Date: Tue, 8 Mar 2022 08:18:37 +1100
Message-ID: <20220307211837.GP59715@dread.disaster.area>
In-Reply-To: <YiZENvZ1CncSyoYX@casper.infradead.org>
On Mon, Mar 07, 2022 at 05:43:18PM +0000, Matthew Wilcox wrote:
> On Mon, Mar 07, 2022 at 04:32:49PM +1100, Dave Chinner wrote:
> > Willy reported generic/530 had started hanging on his test machines
> > and I've tried to reproduce the problem he reported. While I haven't
> > reproduced the exact hang he's been having, I've found a couple of
> > others while running g/530 in a tight loop on a couple of test
> > machines.
> [...]
> >
> > Willy, can you see if these patches fix the problem you are seeing?
> > If not, I still think they stand alone as necessary fixes, but I'll
> > have to keep digging to find out why you are seeing hangs in g/530.
>
> I no longer see hangs, but I do see an interesting pattern in runtime
> of g/530. I was seeing hangs after only a few minutes of running g/530,
> and I was using 15 minutes of success to say "git bisect good". Now at 45
> minutes of runtime with no hangs. Specifically, I'm testing 0020a190cf3e
> ("xfs: AIL needs asynchronous CIL forcing"), plus these three patches.
> If you're interested, I can see which of these three patches actually
> fixes my hang. I should also test these three patches on top of current
> 5.17-rc, but I wanted to check they were backportable to current stable
> first.
>
> Of the 120 times g/530 has run, I see 30 occurrences of the test taking
> 32-35 seconds. I see one occurrence of the test taking 63 seconds.
> Usually it takes 2-3s. This smacks to me of a 30s timeout expiring.
> Let me know if you want me to try to track down which one it is.

That'll be the log worker triggering a log force after 30s, and that
gets it unstuck. So you're still seeing the problem, only now the
watchdog kicks everything back into life.

Can you run a trace for me that captures one of those 30-60s runs
so I can see what might be happening? Something like:

# trace-cmd record -e xlog\* -e xfs_ail\* -e xfs_log\* -e xfs_inodegc\* -e printk ./check generic/530

I don't need all the XFS tracepoints - I'm mainly interested in log
and AIL interactions and what is stuck on them and when...

Cheers,

Dave.

--
Dave Chinner
david@fromorbit.com
Thread overview: 9+ messages
2022-03-07 5:32 xfs: log recovery hang fixes Dave Chinner
2022-03-07 5:32 ` [PATCH 1/3] xfs: log worker needs to start before intent/unlink recovery Dave Chinner
2022-03-07 5:32 ` [PATCH 2/3] xfs: check buffer pin state after locking in delwri_submit Dave Chinner
2022-03-07 5:32 ` [PATCH 3/3] xfs: xfs_ail_push_all_sync() stalls when racing with updates Dave Chinner
2022-03-07 17:43 ` xfs: log recovery hang fixes Matthew Wilcox
2022-03-07 21:18 ` Dave Chinner [this message]
2022-03-07 23:18 ` [PATCH 4/3] xfs: async CIL flushes need pending pushes to be made stable Dave Chinner
2022-03-08 6:12 ` [PATCH 4/3 v2] " Dave Chinner
2022-03-08 13:52 ` Matthew Wilcox