From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: xfs@oss.sgi.com
Subject: Re: Still seeing hangs in xlog_grant_log_space
Date: Thu, 7 Jun 2012 11:35:31 +1000
Message-ID: <20120607013531.GP22848@dastard>
In-Reply-To: <4FCF5DB9.2000808@redhat.com>
On Wed, Jun 06, 2012 at 09:40:09AM -0400, Brian Foster wrote:
> On 06/05/2012 07:54 PM, Dave Chinner wrote:
> > On Fri, May 25, 2012 at 01:03:04PM -0400, Peter Watkins wrote:
> >> On Fri, May 25, 2012 at 2:28 AM, Juerg Haefliger <juergh@gmail.com> wrote:
>
> snip
>
> > At this point, running on a 3.5-rc1 kernel is what we need to get
> > working reliably. Once we have the problems solved there, we can
> > work out what set of patches need to be backported to 3.0-stable and
> > other kernels to fix the problems in those supported kernels...
> >
>
> Hi guys,
>
> I've been reproducing a similar stall in my testing of the 're-enable
> xfsaild idle mode' patch/thread that only occurs for me in the xfs tree.
> I was able to do a bisect from rc2 down to commit 43ff2122, though the
> history of this issue makes me wonder if this commit just makes the
> problem more reproducible as opposed to introducing it. Anyways, the
> characteristics I observe so far:

More reproducible. See below.

> - Task blocked for more than 120s message in xlog_grant_head_wait(). I
> see xfs_sync_worker() in my current bt, but I'm pretty sure I've seen
> the same issue without it involved.
> - The AIL is not empty/idle. It spins with a relatively small and
> constant number of entries (I've seen ~8-40). These items are all always
> marked as "flushing."
> - Via crash, all the inodes in the ail appear to be marked as stale
> (i.e. li_cb == xfs_istale_done). The inode flags are
> XFS_ISTALE|XFS_IRECLAIMABLE|XFS_IFLOCK.
> - The iflock in particular is why the ail marks these items 'flushing'
> and why nothing seems to proceed any further (xfsaild just waits for
> these to complete). I can kick the fs back into action with a 'sync.'

Right, I've seen this as well. What I analysed in the case I saw was
that the underlying buffer is also stale - correctly - and it is
pinned in memory so cannot be flushed. Hence all the inodes are
in the same state. The reason they are pinned in memory is that the
items were still active in the CIL, and a log force was needed to
checkpoint the CIL and cause the checkpoint to be committed. Once
the CIL checkpoint is committed, the stale items are freed from the
AIL, and everything goes onward. The problem is that with the
xfs_sync_worker stalled, nothing triggers a log force because the
inode is returning "flushing" to the AIL pushes.
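
A toy model of that stall, as a sketch only. This is not kernel code:
every name below is made up for illustration, and it assumes a pusher
that forces the log only when at least one item reports itself as
pinned, which is a heavy simplification of what xfsaild does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes, loosely modelled on what an item push reports. */
enum toy_push_result { TOY_SUCCESS, TOY_PINNED, TOY_FLUSHING };

/* One scan of the toy AIL: returns 1 if the pusher would force the log. */
static int toy_scan_wants_log_force(const enum toy_push_result *results,
				    size_t n)
{
	size_t i, pinned = 0;

	assert(results != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (results[i] == TOY_PINNED)
			pinned++;
	}
	return pinned > 0;
}

/*
 * Run up to max_rounds scans in which every item keeps reporting `reported`.
 * Returns the round in which a log force is issued (which, in this model,
 * is what commits the checkpoint and frees the stale items), or -1 if no
 * scan ever asks for one.
 */
static int toy_rounds_until_log_force(enum toy_push_result reported,
				      size_t nitems, int max_rounds)
{
	enum toy_push_result results[64];
	size_t i;
	int round;

	assert(nitems <= 64);
	for (i = 0; i < nitems; i++)
		results[i] = reported;

	for (round = 1; round <= max_rounds; round++) {
		if (toy_scan_wants_log_force(results, nitems))
			return round;
	}
	return -1;
}
```

In this model, toy_rounds_until_log_force(TOY_FLUSHING, 8, 1000) never
sees a log force, while the same items reporting TOY_PINNED get one on
the first scan.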

However, your analysis has allowed me to find what I think is the
bug causing your problem - what I missed when I last saw this was
the significance of the order of checks in xfs_inode_item_push().
That is, we check for whether the inode is flush locked before we
check if it is stale.

By definition, a dirty stale inode must be attached to the
underlying stale buffer and that requires it to be flush locked, as
can be seen in xfs_ifree_cluster:

>>>>>>  xfs_iflock(ip);
>>>>>>  xfs_iflags_set(ip, XFS_ISTALE);

        /*
         * we don't need to attach clean inodes or those only
         * with unlogged changes (which we throw away, anyway).
         */
        iip = ip->i_itemp;
        if (!iip || xfs_inode_clean(ip)) {
                ASSERT(ip != free_ip);
                xfs_ifunlock(ip);
                xfs_iunlock(ip, XFS_ILOCK_EXCL);
                continue;
        }

        iip->ili_last_fields = iip->ili_fields;
        iip->ili_fields = 0;
        iip->ili_logged = 1;
        xfs_trans_ail_copy_lsn(mp->m_ail, &iip->ili_flush_lsn,
                                &iip->ili_item.li_lsn);

>>>>>>  xfs_buf_attach_iodone(bp, xfs_istale_done,
>>>>>>                        &iip->ili_item);

So basically, the problem is that we should be checking for stale
before flushing in xfs_inode_item_push(). I'll send out a patch that
fixes this in a few minutes.
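
The effect of the check order can be sketched in isolation. This is a
toy model, not the patch: the flag bits, result codes and function
names are hypothetical, and what the stale branch should return is an
assumption (the sketch reports "pinned", on the premise that this is
what gets a log force issued).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits and result codes for this sketch only. */
#define TOY_ISTALE	(1u << 0)
#define TOY_IFLOCK	(1u << 1)

enum toy_push_result { TOY_SUCCESS, TOY_PINNED, TOY_FLUSHING };

struct toy_inode {
	unsigned int flags;
};

/* Order described as the bug: flush lock is tested before staleness. */
static enum toy_push_result toy_push_flock_first(const struct toy_inode *ip)
{
	assert(ip != NULL);
	if (ip->flags & TOY_IFLOCK)
		return TOY_FLUSHING;	/* dirty stale inodes always stop here */
	if (ip->flags & TOY_ISTALE)
		return TOY_PINNED;	/* never reached for them */
	return TOY_SUCCESS;
}

/* Reordered: staleness is tested first. */
static enum toy_push_result toy_push_stale_first(const struct toy_inode *ip)
{
	assert(ip != NULL);
	if (ip->flags & TOY_ISTALE)
		return TOY_PINNED;	/* assumption: stale is reported as pinned */
	if (ip->flags & TOY_IFLOCK)
		return TOY_FLUSHING;
	return TOY_SUCCESS;
}
```

For an inode with both TOY_ISTALE and TOY_IFLOCK set, the first
function reports TOY_FLUSHING and the second reports TOY_PINNED; an
inode that is only flush locked reports TOY_FLUSHING either way.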

Good analysis work, Brian!

BTW, I think the underlying cause might be a different manifestation
of the race described in the comment above
xfs_inode_item_committed(), only this time with inodes that are
already in the AIL....

And FWIW, it doesn't explain the CIL stalls that seem to be the other
cause of the problem when the AIL is empty...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com