From: Dave Chinner <david@fromorbit.com>
To: Chandra Seetharaman <sekharan@us.ibm.com>
Cc: XFS mailing list <xfs@oss.sgi.com>
Subject: Re: deadlock with &log->l_cilp->xc_ctx_lock semaphone
Date: Fri, 21 Jun 2013 07:06:09 +1000 [thread overview]
Message-ID: <20130620210609.GR29376@dastard> (raw)
In-Reply-To: <1371597515.22504.51.camel@chandra-dt.ibm.com>
On Tue, Jun 18, 2013 at 06:18:35PM -0500, Chandra Seetharaman wrote:
> On Thu, 2013-05-23 at 09:41 +1000, Dave Chinner wrote:
> > On Wed, May 22, 2013 at 06:12:43PM -0500, Chandra Seetharaman wrote:
> > > Hello,
> > >
> > > While testing and rearranging my pquota/gquota code, I stumbled on a
> > > xfs_shutdown() during a mount. But the mount just hung.
> > >
> > > I debugged and found that it is in a code path where
> > > &log->l_cilp->xc_ctx_lock is first acquired in read mode and some levels
> > > down the same semaphore is being acquired in write mode causing a
> > > deadlock.
> > >
> > > This is the stack:
> > > xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode
> > > xlog_print_tic_res
> > > xfs_force_shutdown
> > > xfs_log_force_umount
> > > xlog_cil_force
> > > xlog_cil_force_lsn
> > > xlog_cil_push_foreground
> > > xlog_cil_push - tries to acquire same semaphore in write mode
> >
> > Which means you had a transaction reservation overrun. Is it
> > reproducable? iDo you have the output from xlog_print_tic_res()?
> > Because:
> >
> > > xfs_trans_commit+0x79/0x270 [xfs]
> > > xfs_qm_write_sb_changes+0x61/0x90 [xfs]
> > > xfs_qm_mount_quotas+0x82/0x180 [xfs]
> > > xfs_mountfs+0x5f6/0x6b0 [xfs]
> >
> > This transaction only modifies the superblock, and it has a buffer
> > reservation for a superblock sized buffer, and hence should never
> > overrun.
> >
> > IOWs, I'm ifar more concerned about the fact there was a
> > transaction overrun than they was a hang in the path that handles
> > the overrun. The fact this hang has been there since 2.6.35 tells
> > you how rare transactions overruns are....
> >
> > FWIW, the fix for the hang is to make xlog_print_tic_res() return an
> > error and have the caller handle the shutdown.
>
> Dave,
>
> There are few ways this can be done, but each of them seem to change the
> code behavior. Wanted to get your opinion on which is the correct way.
>
> (1) - don't shutdown in xlog_print_tic_res();
> - upon return from xlog_print_tic_res(), do a up_read, and shutdown
> - at the end of fucntion, up_read only if we haven't done it already
> Behavior change: The protection offered by the semaphore is not
> available to the code block from shutdown to end of function
We don't want to do the rest of the work in the function if the
ticket check fails, so:
> (2) - don't shoudown in xlog_print_tic_res()
> - upon return just set a flag
> - at the end of function, after up_read, do shutdown
> Behavior change: Shutdown is delayed to a later point.
A behaviour change is fine because we don't care about races here.
> I prefer (2) since (1) drops the protection. But, do not know the
> ramifications of delaying the shutdown. Can you comment ?
Basically, I think you can have xlog_print_tic_res() return
EFSCORRUPTED to the xfs_log_commit_cil(), then have it drop the
lock and return that error to xfs_trans_commit() immediately.
However, xfs_trans_commit() only catches the ENOMEM error, but that
was a temporary situation to handle the allocation error from
xfs_log_commit_cil() specially. Since we've removed the old old
logging code, this is the only error that can be returned so the
shutdown on error doesn't need to be specific to ENOMEM. hence if
you convert this to just a plain if (error) check, the shutdown will
happen from xfs_trans_commit() and the deadlock won't occur.
FWIW, to be on the safe side, I'd move the ticket check to after we
attach the busy extents to the CIL so that we can be certain that
the error path in xfs_trans_commit() does the right thing with them.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
prev parent reply other threads:[~2013-06-20 21:06 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-05-22 23:12 deadlock with &log->l_cilp->xc_ctx_lock semaphone Chandra Seetharaman
2013-05-22 23:41 ` Dave Chinner
2013-05-23 18:09 ` Chandra Seetharaman
2013-05-23 23:42 ` Dave Chinner
2013-05-24 21:41 ` Chandra Seetharaman
2013-05-24 21:44 ` Chandra Seetharaman
2013-06-18 23:18 ` Chandra Seetharaman
2013-06-20 21:06 ` Dave Chinner [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130620210609.GR29376@dastard \
--to=david@fromorbit.com \
--cc=sekharan@us.ibm.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox