From: Dave Chinner <david@fromorbit.com>
To: Mark Tinguely <tinguely@sgi.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 2/3] xfs: fix direct IO nested transaction deadlock.
Date: Wed, 28 Nov 2012 11:14:11 +1100 [thread overview]
Message-ID: <20121128001411.GP6434@dastard> (raw)
In-Reply-To: <50B542B3.4030402@sgi.com>
On Tue, Nov 27, 2012 at 04:46:11PM -0600, Mark Tinguely wrote:
> On 11/26/12 15:45, Dave Chinner wrote:
> >>but with a bit more filesystem traffic (I
> >>used your filesystem, fsstress and added the rm/copies from bug 922
> >>because I knew it stressed the log), we can easily (couple hours) hang
> >
> >Script, please.
>
> I started with your filesystem, fsstress and added the programs from
> Bug 922:
>
> http://oss.sgi.com/bugzilla/attachment.cgi?id=304
> http://oss.sgi.com/bugzilla/attachment.cgi?id=305
>
> #!/bin/sh
> perl ./copy.pl &
> sleep 180
> while true
> do
> # ./check -g auto
> /root/xfstests/ltp/fsstress -d /test2 -n 1000 -p 1000
> done
Ok, I'll try it.
> >>the current hangs happen because
> >>the sync worker is responsible to force the writing of the CIL but the
> >>sync worker needs log space for the dummy record.
> >
> >The sync worker is responsible for idling the log, not much else.
> >
>
> My mistake. The sync worker will only push the AIL. I am thinking of
> the xfsaild_push() will force the CIL if there is a pinned item -
> not appropriate if the AIL is empty. It is xfs_fs_sync_fs() can
> force the CIL as can other commands.
>
> Side question:
> What keeps the CIL from consuming more than half the log space?
See the comments in xfs_log_priv.h above these two lines:
#define XLOG_CIL_SPACE_LIMIT(log) (log->l_logsize >> 3)
#define XLOG_CIL_HARD_SPACE_LIMIT(log) (3 * (log->l_logsize >> 4))
> Here are the data structures. This is strange, normally hangs with
> an empty AIL, the space could be accounted for in the remain grant
> space and CIL.
> ===============
> AIL at ffff88034c798640:
> struct xfs_ail {
....
> xa_target = 0xd000004000,
> xa_target_prev = 0xd000004000,
....
> xa_last_pushed_lsn = 0x0,
> xa_log_flush = 0x0,
....
> =====================
> xlog at ffff88034ec0a400
> struct xlog {
....
> l_logsize = 0xa00000,
Which means the log is 10485760 bytes in size
...
> l_curr_cycle = 0xd0,
> l_prev_cycle = 0xd0,
> l_curr_block = 0x4200,
> l_prev_block = 0x4000,
> l_last_sync_lsn = {
> counter = 0xd000004000
> },
> l_tail_lsn = {
> counter = 0xd000004000
> },
> l_reserve_head = {
> lock = {
> },
> waiters = {
> next = 0xffff8802ba98fe30,
> prev = 0xffff8802b7c312b0
> },
> grant = {
> counter = 0xd1006656f0
> }
> },
>
> 0x800000-0x6656F0 == 1681680 reserve space left
BTW, when I ask for the full structure, I'd like to see the full
structure so that as i walk through the code I can refer back to the
structure to see that the values are what I think they should be...
> =====================
> Top waiter ticket at ffff8802ba98fe30
> struct xlog_ticket {
....
> t_curr_res = 0xa8b6c,
> t_unit_res = 0xa8b6c,
> t_ocnt = 0x3,
> t_cnt = 0x3,
> t_clientid = 0x69,
> t_flags = 0x3,
>
> wants 0xa8b6c * 3 == 2073156 bytes.
And that is close to 20% of the log size but not an invalid size.
The question is "what is holding all the grant space"? That's why I
need the whole struct xlog output - I suspect that there are
permanent transactions waiting on the write head and so can't move
forward and hence can't release grant space.
> =====================
> CIL at ffff88034c159ec0
>
> struct xfs_cil {
.....
> xc_current_sequence = 0x1de0,
.....
> xc_push_seq = 0x1ddf
> }
Which indicates that there is an active sequence that hasn't been
pushed.
> =====================
> struct xfs_cil_ctx {
....
> space_used = 0x3d30, <- only 15664 bytes
So very little.
> The ticket held by the CIL CTX at ffff8802b80c0e30:
> struct xlog_ticket {
....
> t_unit_res = 0x82034, <- 532532 bytes
Which means it has already stolen 532532 bytes from the existing
committed transactions in the CIL. This is accounted as used space
in the log as it is stolen from committed transactions that reserved
the space in the first place.
However, this is such a small log, I'm wondering whether this
overhead is the source of the issue. It can't be freed up by pushing
on the AIL, and at 5% of the log space it increases the CIL
background push threshold by close to 50% (i.e. from 12.5% of the
log space used to 17.5%). And it's larger than the difference
between the reserve grant head and the ticket reservation size.
I think I need to reproduce this locally to be able to debug it...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2012-11-28 0:11 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-11-23 3:24 [PATCH 0/3, V2] xfs: regression fixes for the 3.8 cycle Dave Chinner
2012-11-23 3:24 ` [PATCH 1/3] xfs: inode allocation should use unmapped buffers Dave Chinner
2012-11-26 19:07 ` Mark Tinguely
2012-11-26 22:12 ` Ben Myers
2012-11-26 23:00 ` Dave Chinner
2012-11-23 3:24 ` [PATCH 2/3] xfs: fix direct IO nested transaction deadlock Dave Chinner
2012-11-23 8:31 ` Christoph Hellwig
2012-11-26 15:54 ` Mark Tinguely
2012-11-26 21:45 ` Dave Chinner
2012-11-27 22:46 ` Mark Tinguely
2012-11-28 0:14 ` Dave Chinner [this message]
2012-11-23 3:24 ` [PATCH 3/3] xfs: byte range granularity for XFS_IOC_ZERO_RANGE Dave Chinner
2012-11-23 8:34 ` Christoph Hellwig
2012-11-23 8:59 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121128001411.GP6434@dastard \
--to=david@fromorbit.com \
--cc=tinguely@sgi.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox