From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org, Eryu Guan <eguan@redhat.com>,
Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH 2/5] xfs: allocate quotaoff transactions up front to avoid log deadlock
Date: Thu, 27 Apr 2017 08:47:43 -0700
Message-ID: <20170427154743.GE23371@birch.djwong.org>
In-Reply-To: <20170427120324.GA46981@bfoster.bfoster>

On Thu, Apr 27, 2017 at 08:03:27AM -0400, Brian Foster wrote:
> On Wed, Apr 26, 2017 at 02:23:33PM -0700, Darrick J. Wong wrote:
> > On Wed, Feb 15, 2017 at 10:40:44AM -0500, Brian Foster wrote:
> > > The quotaoff operation commits two explicitly synchronous
> > > transactions to correctly handle log recovery of dquots being
> > > modified at the time the quotaoff occurs. The first quotaoff
> > > transaction pins the tail of the log with a qoff logitem in the AIL
> > > to ensure further logged dquots stay ahead of the quotaoff operation
> > > in the log. The second quotaoff_end transaction is committed after
> > > the quotaoff operation completes; it releases the original log item
> > > and thus unpins the log.
> > >
> > > Since the first transaction pins the tail of the log, only a
> > > finite amount of log space is available between the time a
> > > quotaoff starts and the time it completes before transaction
> > > reservations have to block on the log. While the quotaoff
> > > operation itself consumes no further log space, it is possible
> > > for other operations in the filesystem to consume the remaining
> > > log space before the quotaoff completes. If this occurs, the
> > > quotaoff_end transaction also blocks on the log, which prevents
> > > the release of the log item, and the filesystem deadlocks. This
> > > has been reproduced via repeated xfs/305 iterations on a VM with
> > > fairly limited resources.
> > >
> > > To avoid a deadlock due to a particularly slow quotaoff operation,
> > > allocate the quotaoff_end transaction immediately after the initial
> > > quotaoff transaction is committed. Carry a reference to the
> > > transaction through xfs_qm_scall_quotaoff() rather than the qoff log
> > > item and commit it once the quotaoff completes.
> >
> > Hmm... so instead of holding a pinned log item for however long it
> > takes to finish the quotaoff, we now hold an open transaction. That
> > does fix the problem where we pin the log tail and the head crashes
> > into it, but it comes with its own problems, doesn't it?
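> >
> > If I'm reading the patch right, the flow becomes roughly this (a
> > simplified pseudo-C sketch; arguments and error handling elided):
> >
> > 	xfs_qm_scall_quotaoff()
> > 	{
> > 		/*
> > 		 * Commits the qoff start item and now also allocates
> > 		 * the quotaoff_end transaction up front.
> > 		 */
> > 		xfs_qm_log_quotaoff(mp, &qend_tp, flags);
> >
> > 		/* everything in here runs with qend_tp held open */
> > 		xfs_qm_dqrele_all_inodes(...);
> >
> > 		/* finally commit the preallocated quotaoff_end */
> > 		xfs_trans_commit(qend_tp);
> > 	}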
> >
> > In xfs_qm_scall_quotaoff, we call xfs_qm_dqrele_all_inodes, which
> > uses xfs_dqrele_inode to call xfs_qm_dqrele on every inode in the
> > system. However, if any of those inodes are unlinked and inactive,
> > the iput -> xfs_fs_destroy_inode -> xfs_inactive path truncates the
> > inode, which itself requires a transaction, so now we're running
> > with nested transactions.
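> >
> > i.e., pulling the path out of the lockdep trace below (simplified):
> >
> > 	xfs_qm_scall_quotaoff
> > 	  xfs_qm_log_quotaoff          <- quotaoff_end tp allocated, held
> > 	  xfs_qm_dqrele_all_inodes
> > 	    xfs_inode_ag_walk
> > 	      iput
> > 	        xfs_fs_destroy_inode
> > 	          xfs_inactive
> > 	            xfs_inactive_truncate
> > 	              xfs_trans_alloc  <- second, nested transaction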
> >
> > I'm not sure that's necessarily a problem, because the outer
> > transaction contains only a single qoffend item. AFAICT there's
> > nowhere else in XFS where a single thread runs nested transactions,
> > but it still seems suspect to me.
> >
>
> Yeah, good spot. I wasn't expecting to be logging anything else in this
> window of time, but it looks like that is possible. We definitely don't
> want to be doing nested transactions. I'll have to think about this one
> some more.
>
> Note that this was a one-off fix for something I reproduced while
> trying to reproduce the livelock addressed by the subsequent patches
> in the series. It's not a dependency for those three patches.

Oh, ok. Thank you for pointing that out, as I didn't see a strong
connection between this patch and the next three, and was wondering if
there was one.

> > Heh, lockdep just spat this at me:
> >
>
> I'm guessing I just didn't have lockdep enabled when I tested this. To
> be sure... what is the reproducer for this? Thanks.

Running xfs/305 in a loop.
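
i.e., something like this with the xfstests harness (a rough sketch of
my loop; substitute your own xfstests checkout path):

	# cd /path/to/xfstests
	# while ./check xfs/305; do :; done
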
--D

>
> Brian
>
> > [19767.048839] =============================================
> > [19767.049699] [ INFO: possible recursive locking detected ]
> > [19767.050511] 4.11.0-rc4-djw #3 Not tainted
> > [19767.051122] ---------------------------------------------
> > [19767.051907] xfs_quota/4152 is trying to acquire lock:
> > [19767.052617] (sb_internal){.+.+.+}, at: [<ffffffffa015ee13>] xfs_trans_alloc+0xe3/0x130 [xfs]
> > [19767.054151]
> > [19767.054151] but task is already holding lock:
> > [19767.055246] (sb_internal){.+.+.+}, at: [<ffffffffa015ee13>] xfs_trans_alloc+0xe3/0x130 [xfs]
> > [19767.056897]
> > [19767.056897] other info that might help us debug this:
> > [19767.057980] Possible unsafe locking scenario:
> > [19767.057980]
> > [19767.058967] CPU0
> > [19767.059385] ----
> > [19767.060339] lock(sb_internal);
> > [19767.061187] lock(sb_internal);
> > [19767.061771]
> > [19767.061771] *** DEADLOCK ***
> > [19767.061771]
> > [19767.062818] May be due to missing lock nesting notation
> > [19767.062818]
> > [19767.063967] 3 locks held by xfs_quota/4152:
> > [19767.064521] #0: (&type->s_umount_key#34){++++++}, at: [<ffffffff812287f0>] __get_super.part.18+0xc0/0xe0
> > [19767.065805] #1: (&qinf->qi_quotaofflock){+.+.+.}, at: [<ffffffffa0182a53>] xfs_qm_scall_quotaoff+0x53/0x6c0 [xfs]
> > [19767.067226] #2: (sb_internal){.+.+.+}, at: [<ffffffffa015ee13>] xfs_trans_alloc+0xe3/0x130 [xfs]
> > [19767.068543]
> > [19767.068543] stack backtrace:
> > [19767.069191] CPU: 2 PID: 4152 Comm: xfs_quota Not tainted 4.11.0-rc4-djw #3
> > [19767.070124] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > [19767.071405] Call Trace:
> > [19767.071798] dump_stack+0x85/0xc2
> > [19767.072298] __lock_acquire+0x164e/0x16a0
> > [19767.072866] ? kvm_sched_clock_read+0x9/0x20
> > [19767.073460] ? sched_clock+0x9/0x10
> > [19767.073979] lock_acquire+0xbf/0x210
> > [19767.074532] ? xfs_trans_alloc+0xe3/0x130 [xfs]
> > [19767.075166] __sb_start_write+0xc0/0x220
> > [19767.075782] ? xfs_trans_alloc+0xe3/0x130 [xfs]
> > [19767.076475] xfs_trans_alloc+0xe3/0x130 [xfs]
> > [19767.077138] xfs_inactive_truncate+0x31/0x130 [xfs]
> > [19767.077862] ? xfs_qm_dqattach+0x16/0x50 [xfs]
> > [19767.078524] xfs_inactive+0x17d/0x270 [xfs]
> > [19767.079141] xfs_fs_destroy_inode+0xb3/0x350 [xfs]
> > [19767.079964] destroy_inode+0x3b/0x60
> > [19767.080679] evict+0x132/0x190
> > [19767.081301] iput+0x243/0x320
> > [19767.081961] xfs_inode_ag_walk+0x2a8/0x690 [xfs]
> > [19767.082757] ? xfs_inode_ag_walk+0x92/0x690 [xfs]
> > [19767.083545] ? xfs_trans_free_dqinfo+0x30/0x30 [xfs]
> > [19767.084279] ? xfs_perag_get+0xa8/0x2b0 [xfs]
> > [19767.084753] ? xfs_perag_get+0xa8/0x2b0 [xfs]
> > [19767.085234] xfs_inode_ag_iterator_flags+0x69/0xa0 [xfs]
> > [19767.086251] ? xfs_trans_free_dqinfo+0x30/0x30 [xfs]
> > [19767.087219] xfs_qm_dqrele_all_inodes+0x36/0x60 [xfs]
> > [19767.088431] xfs_qm_scall_quotaoff+0x100/0x6c0 [xfs]
> > [19767.089285] xfs_quota_disable+0x3d/0x50 [xfs]
> > [19767.090138] SyS_quotactl+0x3ff/0x820
> > [19767.090853] ? SYSC_newstat+0x3b/0x50
> > [19767.091501] ? trace_hardirqs_on_caller+0x129/0x1b0
> > [19767.092398] ? trace_hardirqs_on_thunk+0x1a/0x1c
> > [19767.093238] entry_SYSCALL_64_fastpath+0x1f/0xc2
> > [19767.094150] RIP: 0033:0x7f43530a70ca
> > [19767.094851] RSP: 002b:00007ffe7570fbe8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b3
> > [19767.096078] RAX: ffffffffffffffda RBX: 00007ffe7570fde8 RCX: 00007f43530a70ca
> > [19767.097242] RDX: 0000000000000000 RSI: 00000000012d26f0 RDI: 0000000000580202
> > [19767.098402] RBP: 0000000000000005 R08: 00007ffe7570fbfc R09: 0000000000005802
> > [19767.099645] R10: 00007ffe7570fbfc R11: 0000000000000206 R12: 0000000000401e50
> > [19767.100841] R13: 00007ffe7570fde0 R14: 0000000000000000 R15: 0000000000000000
> >
> > --D
> >
> > >
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > > fs/xfs/xfs_qm_syscalls.c | 42 ++++++++++++++++++++++++------------------
> > > 1 file changed, 24 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c
> > > index 475a388..dbb6802 100644
> > > --- a/fs/xfs/xfs_qm_syscalls.c
> > > +++ b/fs/xfs/xfs_qm_syscalls.c
> > > @@ -35,9 +35,11 @@
> > > #include "xfs_trace.h"
> > > #include "xfs_icache.h"
> > >
> > > -STATIC int xfs_qm_log_quotaoff(xfs_mount_t *, xfs_qoff_logitem_t **, uint);
> > > -STATIC int xfs_qm_log_quotaoff_end(xfs_mount_t *, xfs_qoff_logitem_t *,
> > > - uint);
> > > +STATIC int xfs_qm_log_quotaoff(struct xfs_mount *, struct xfs_trans **,
> > > + uint);
> > > +STATIC int xfs_qm_log_quotaoff_end(struct xfs_mount *,
> > > + struct xfs_qoff_logitem *,
> > > + struct xfs_trans **, uint);
> > >
> > > /*
> > > * Turn off quota accounting and/or enforcement for all udquots and/or
> > > @@ -56,7 +58,7 @@ xfs_qm_scall_quotaoff(
> > > uint dqtype;
> > > int error;
> > > uint inactivate_flags;
> > > - xfs_qoff_logitem_t *qoffstart;
> > > + struct xfs_trans *qend_tp;
> > >
> > > /*
> > > * No file system can have quotas enabled on disk but not in core.
> > > @@ -128,7 +130,7 @@ xfs_qm_scall_quotaoff(
> > > * and synchronously. If we fail to write, we should abort the
> > > * operation as it cannot be recovered safely if we crash.
> > > */
> > > - error = xfs_qm_log_quotaoff(mp, &qoffstart, flags);
> > > + error = xfs_qm_log_quotaoff(mp, &qend_tp, flags);
> > > if (error)
> > > goto out_unlock;
> > >
> > > @@ -181,7 +183,7 @@ xfs_qm_scall_quotaoff(
> > > * So, we have QUOTAOFF start and end logitems; the start
> > > * logitem won't get overwritten until the end logitem appears...
> > > */
> > > - error = xfs_qm_log_quotaoff_end(mp, qoffstart, flags);
> > > + error = xfs_trans_commit(qend_tp);
> > > if (error) {
> > > /* We're screwed now. Shutdown is the only option. */
> > > xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > > @@ -556,13 +558,14 @@ xfs_qm_scall_setqlim(
> > >
> > > STATIC int
> > > xfs_qm_log_quotaoff_end(
> > > - xfs_mount_t *mp,
> > > - xfs_qoff_logitem_t *startqoff,
> > > + struct xfs_mount *mp,
> > > + struct xfs_qoff_logitem *startqoff,
> > > + struct xfs_trans **tpp,
> > > uint flags)
> > > {
> > > - xfs_trans_t *tp;
> > > + struct xfs_trans *tp;
> > > int error;
> > > - xfs_qoff_logitem_t *qoffi;
> > > + struct xfs_qoff_logitem *qoffi;
> > >
> > > error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_equotaoff, 0, 0, 0, &tp);
> > > if (error)
> > > @@ -578,21 +581,22 @@ xfs_qm_log_quotaoff_end(
> > > 	 * We don't care about quotaoff's performance.
> > > */
> > > xfs_trans_set_sync(tp);
> > > - return xfs_trans_commit(tp);
> > > + *tpp = tp;
> > > + return 0;
> > > }
> > >
> > >
> > > STATIC int
> > > xfs_qm_log_quotaoff(
> > > - xfs_mount_t *mp,
> > > - xfs_qoff_logitem_t **qoffstartp,
> > > - uint flags)
> > > + struct xfs_mount *mp,
> > > + struct xfs_trans **end_tp,
> > > + uint flags)
> > > {
> > > - xfs_trans_t *tp;
> > > + struct xfs_trans *tp;
> > > int error;
> > > - xfs_qoff_logitem_t *qoffi;
> > > + struct xfs_qoff_logitem *qoffi;
> > >
> > > - *qoffstartp = NULL;
> > > + *end_tp = NULL;
> > >
> > > error = xfs_trans_alloc(mp, &M_RES(mp)->tr_qm_quotaoff, 0, 0, 0, &tp);
> > > if (error)
> > > @@ -617,7 +621,9 @@ xfs_qm_log_quotaoff(
> > > if (error)
> > > goto out;
> > >
> > > - *qoffstartp = qoffi;
> > > + error = xfs_qm_log_quotaoff_end(mp, qoffi, end_tp, flags);
> > > + if (error)
> > > + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > > out:
> > > return error;
> > > }
> > > --
> > > 2.7.4
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html