From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 3/4] xfs: drain the buffer LRU on mount
Date: Tue, 24 Oct 2017 15:47:54 -0400
Message-ID: <20171024194753.GE56184@bfoster.bfoster>
In-Reply-To: <20171024140606.GD56184@bfoster.bfoster>
On Tue, Oct 24, 2017 at 10:06:06AM -0400, Brian Foster wrote:
> On Mon, Oct 23, 2017 at 05:23:34PM -0700, Darrick J. Wong wrote:
> > On Mon, Oct 23, 2017 at 12:54:17PM -0400, Brian Foster wrote:
> > > On Mon, Oct 23, 2017 at 09:39:40AM -0700, Darrick J. Wong wrote:
> > > > On Mon, Oct 23, 2017 at 10:46:45AM -0400, Brian Foster wrote:
> > > > > Log recovery of v4 filesystems does not use buffer verifiers because
> > > > > log recovery historically can result in transient buffer corruption
> > > > > when target buffers might be ahead of the log after a crash. v5
> > > > > filesystems work around this problem with metadata LSN ordering.
> > > > >
> > > > > While the log recovery behavior is necessary on v4 supers, it
> > > > > currently can result in leaving buffers around in the LRU without
> > > > > verifiers attached for a significant amount of time. This can lead
> > > > > to use of unverified buffers while the filesystem is in active use,
> > > > > long after recovery has completed.
> > > > >
> > > > > To address this problem and provide a more consistent clean,
> > > > > post-mount buffer cache state, update the log mount sequence to
> > > > > unconditionally drain all buffers from the LRU as a final step.
> > > > >
> > > > > Reported-by: Darrick Wong <darrick.wong@oracle.com>
> > > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > > ---
> > > > > fs/xfs/xfs_log.c | 13 +++++++++++++
> > > > > 1 file changed, 13 insertions(+)
> > > > >
> > > > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > > > index dc95a49..e282fd8 100644
> > > > > --- a/fs/xfs/xfs_log.c
> > > > > +++ b/fs/xfs/xfs_log.c
> > > > > @@ -744,6 +744,7 @@ xfs_log_mount_finish(
> > > > >  {
> > > > >  	int error = 0;
> > > > >  	bool readonly = (mp->m_flags & XFS_MOUNT_RDONLY);
> > > > > +	bool recovered = mp->m_log->l_flags & XLOG_RECOVERY_NEEDED;
> > > > >
> > > > >  	if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
> > > > >  		ASSERT(mp->m_flags & XFS_MOUNT_RDONLY);
> > > > > @@ -780,6 +781,18 @@ xfs_log_mount_finish(
> > > > >  	mp->m_super->s_flags &= ~MS_ACTIVE;
> > > > >  	evict_inodes(mp->m_super);
> > > > >
> > > > > +	/*
> > > > > +	 * Drain the buffer LRU after log recovery. This is required for v4
> > > > > +	 * filesystems to avoid leaving around buffers with NULL verifier ops,
> > > > > +	 * but we do it unconditionally to make sure we're always in a clean
> > > > > +	 * cache state after mount.
> > > > > +	 */
> > > > > +	if (recovered) {
> > > >
> > > > if (recovered && !error) { ?
> > > >
> > > > I observed that running xfs/376 on an rmap filesystem fails when it
> > > > tries to fuzz the high bit of u3.bmx[0].startoff. That triggers an
> > > > incorrect freeing of what is now a post-eof extent. The corresponding
> > > > rmap free operation fails after the RUI has been logged and shuts down
> > > > the filesystem, so a subsequent log recovery attempt also fails when it
> > > > tries to remove an rmap that doesn't exist. If we then try to force the
> > > > log we end up deadlocked somewhere... though if we /don't/ then memory
> > > > gets corrupted and the kernel blows up anyway. :(
> > > >
> > >
> > > Interesting... do you have a stack trace? I'm curious why forcing the
> > > log would hang here and not in the subsequent log force in
> > > xfs_log_unmount() -> xfs_log_quiesce().
> >
> ...
> > ...after which I see this once before the system totally hangs:
> >
> > [ 5405.641103] INFO: task mount:28033 blocked for more than 60 seconds.
> > [ 5405.643982] Tainted: G W 4.14.0-rc6-xfsx #1
> > [ 5405.646067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 5405.650068] mount D12768 28033 27973 0x80000000
> > [ 5405.655505] Call Trace:
> > [ 5405.657210] ? __schedule+0x40d/0xb20
> > [ 5405.658865] schedule+0x40/0x90
> > [ 5405.661340] xfs_ail_push_all_sync+0xa4/0xe0 [xfs]
> > [ 5405.664348] ? remove_wait_queue+0x60/0x60
> > [ 5405.668815] xfs_log_mount_finish+0xc7/0x110 [xfs]
> > [ 5405.671736] xfs_mountfs+0x68f/0xaf0 [xfs]
> > [ 5405.674645] xfs_fs_fill_super+0x4d7/0x630 [xfs]
> > [ 5405.677673] ? xfs_finish_flags+0x150/0x150 [xfs]
> > [ 5405.679966] mount_bdev+0x178/0x1b0
> > [ 5405.681300] mount_fs+0xf/0x80
> > [ 5405.682426] vfs_kern_mount+0x62/0x160
> > [ 5405.683712] do_mount+0x1b1/0xd50
> > [ 5405.687634] ? rcu_read_lock_sched_held+0x3f/0x70
> > [ 5405.688453] SyS_mount+0x85/0xd0
> > [ 5405.689033] entry_SYSCALL_64_fastpath+0x1f/0xbe
> > [ 5405.689955] RIP: 0033:0x7f0226c72b5a
> > [ 5405.690533] RSP: 002b:00007fff9ddea318 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
> > [ 5405.692074] RAX: ffffffffffffffda RBX: 00007f0226f6b63a RCX: 00007f0226c72b5a
> > [ 5405.695604] RDX: 0000000001a3b2d0 RSI: 0000000001a3b310 RDI: 0000000001a3b2f0
> > [ 5405.696566] RBP: 0000000001a3b120 R08: 0000000001a3b270 R09: 0000000000000012
> > [ 5405.699583] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00007f022717b83c
> > [ 5405.700465] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000005
> > [ 5405.702569]
> > [ 5405.702569] Showing all locks held in the system:
> > [ 5405.703485] 1 lock held by khungtaskd/37:
> > [ 5405.704095] #0: (tasklist_lock){.+.+}, at: [<ffffffff810dfbf7>] debug_show_all_locks+0x37/0x190
> > [ 5405.705342] 2 locks held by agetty/810:
> > [ 5405.705845] #0: (&tty->ldisc_sem){++++}, at: [<ffffffff813fecf4>] tty_ldisc_ref_wait+0x24/0x50
> > [ 5405.706934] #1: (&ldata->atomic_read_lock){+.+.}, at: [<ffffffff813fd034>] n_tty_read+0xc4/0x880
> > [ 5405.708030] 1 lock held by in:imklog/864:
> > [ 5405.708534] #0: (&f->f_pos_lock){+.+.}, at: [<ffffffff8125d8d8>] __fdget_pos+0x48/0x60
> > [ 5405.709572] 1 lock held by mount/28033:
> > [ 5405.710112] #0: (&type->s_umount_key#45/1){+.+.}, at: [<ffffffff8123c993>] sget_userns+0x303/0x4f0
> > [ 5405.711210]
> > [ 5405.711408] =============================================
> > [ 5405.711408]
> >
>
> Thanks. Hmm, when I run xfs/376 on the djwong-devel xfs I see a bunch of
> mm/usercopy.c BUG()s and what looks like a mount hang during quotacheck,
> but it's blocked down in the block layer in io_schedule() attempting a
> readahead. I'm not terribly surprised by the lockup after all the BUG()
> invocations, but what I'm seeing doesn't appear to be similar to what
> you reproduce or related to the AIL in any way.
>
> I assume this mount behavior repeats after the test completes. If so,
> could you grab a metadump of the fs?
>
Ok, I can reproduce the problem now that I have the metadump. The AIL
push hangs because a pinned EFI is sitting on the AIL. The subsequent
push from the unmount path succeeds because recovery has already been
cancelled (which cancels the intents) by the time we get there. Given
that, I agree that it makes sense to check for an error here. Thanks!
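
Roughly, what I have in mind for v2 is something like the following
(just a sketch with your suggested error check folded in; the comment
wording may still change):

	/*
	 * Drain the buffer LRU after log recovery. This is required for v4
	 * filesystems to avoid leaving around buffers with NULL verifier
	 * ops, but we do it unconditionally to make sure we're always in a
	 * clean cache state after mount.
	 *
	 * Skip the log force and AIL push if recovery failed, because a
	 * cancelled recovery can leave a pinned intent (e.g., the EFI
	 * above) on the AIL and xfs_ail_push_all_sync() would wait on it
	 * indefinitely.
	 */
	if (recovered && !error) {
		xfs_log_force(mp, XFS_LOG_SYNC);
		xfs_ail_push_all_sync(mp->m_ail);
	}
	xfs_wait_buftarg(mp->m_ddev_targp);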
Brian
> Brian
>
> > --D
> >
> > >
> > > Brian
> > >
> > > > --D
> > > >
> > > > > +		xfs_log_force(mp, XFS_LOG_SYNC);
> > > > > +		xfs_ail_push_all_sync(mp->m_ail);
> > > > > +	}
> > > > > +	xfs_wait_buftarg(mp->m_ddev_targp);
> > > > > +
> > > > >  	if (readonly)
> > > > >  		mp->m_flags |= XFS_MOUNT_RDONLY;
> > > > >
> > > > > --
> > > > > 2.9.5
> > > > >