public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Mark Tinguely <tinguely@sgi.com>
Cc: linux-xfs@oss.sgi.com, Ben Myers <bpm@sgi.com>,
	Chris J Arges <chris.j.arges@canonical.com>
Subject: Re: Still seeing hangs in xlog_grant_log_space
Date: Wed, 23 May 2012 08:59:12 +1000
Message-ID: <20120522225912.GI25351@dastard>
In-Reply-To: <4FB65FDD.3000500@sgi.com>

On Fri, May 18, 2012 at 09:42:37AM -0500, Mark Tinguely wrote:
> On 05/18/12 05:10, Dave Chinner wrote:
> >Still, this doesn't explain the hang at all - the CIL forms a new
> >list every time a checkpoint occurs, and this corruption would cause
> >a crash trying to walk the li_lv list when pushed. So it comes back
> >to why hasn't the CIL been pushed? what does the CIL context
> >structure look like?
> 
> The CIL context on the machine that was running 3+ days before hanging.
> 
> struct xfs_cil_ctx {
>   cil = 0xffff88034a8c5240,
>   sequence = 1241833,
>   start_lsn = 0,
>   commit_lsn = 0,
>   ticket = 0xffff88034e0ebc08,
>   nvecs = 237,
>   space_used = 39964,
>   busy_extents = {
>     next = 0xffff88034b287958,
>     prev = 0xffff88034d10c698
>   },
>   lv_chain = 0x0,
>   log_cb = {
>     cb_next = 0x0,
>     cb_func = 0,
>     cb_arg = 0x0
>   },
>   committing = {
>     next = 0xffff88034c84d120,
>     prev = 0xffff88034c84d120
>   }
> }

And the struct xfs_cil itself?

> Start the cleaning of the log when still full after last clean.
> ---
>  fs/xfs/xfs_log.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> Index: b/fs/xfs/xfs_log.c
> ===================================================================
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -191,8 +191,10 @@ xlog_grant_head_wake(
>  
>  	list_for_each_entry(tic, &head->waiters, t_queue) {
>  		need_bytes = xlog_ticket_reservation(log, head, tic);
> -		if (*free_bytes < need_bytes)
> +		if (*free_bytes < need_bytes) {
> +			xlog_grant_push_ail(log, need_bytes);

Ok, so that means every time the log tail is moved or a transaction
completes and returns unused space to the grant head, it pushes the
AIL target along.  But if we are hanging with an empty AIL, this is
not actually doing anything of note, just changing timing to make
whatever problem we have less common.  I'd remove this patch to make
reproducing the problem easier....
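To make the effect of the patch concrete, here is a minimal user-space sketch of the patched wake loop. It assumes, as in the unpatched xlog_grant_head_wake() of that era, that waiters are served in FIFO order, each woken waiter's reservation is deducted from *free_bytes, and the walk stops at the first waiter that does not fit. The names struct waiter, push_ail() and grant_head_wake() are hypothetical stand-ins, not XFS code; push_ail() only records the request, which mirrors the point above that pushing an empty AIL does nothing useful.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an xlog_ticket on the waiter list. */
struct waiter {
	int need_bytes;   /* reservation this waiter is waiting for */
	bool woken;       /* set when the waiter would be woken */
};

/* Counters that let us observe what the stand-in AIL push was asked to do. */
static int ail_push_calls;
static int ail_push_last_need;

/*
 * Hypothetical stand-in for xlog_grant_push_ail(): it only records the
 * request. With an empty AIL there is nothing to write back, so a real
 * push would not free any log space either.
 */
static void push_ail(int need_bytes)
{
	ail_push_calls++;
	ail_push_last_need = need_bytes;
}

/*
 * Model of the patched wake loop: waiters are served in FIFO order; each
 * one that fits is woken and its reservation deducted from *free_bytes.
 * The first waiter that does not fit triggers an AIL push and ends the walk.
 */
static bool grant_head_wake(struct waiter *waiters, int nr, int *free_bytes)
{
	assert(free_bytes != NULL);
	assert(waiters != NULL || nr == 0);

	for (int i = 0; i < nr; i++) {
		int need = waiters[i].need_bytes;

		if (*free_bytes < need) {
			push_ail(need);   /* the call added by the patch */
			return false;
		}
		*free_bytes -= need;
		waiters[i].woken = true;
	}
	return true;
}
```

With 100 bytes free and waiters needing 40, 50 and 30 bytes, the first two are woken, 10 bytes remain, and the third waiter triggers a single push before the walk stops. The push only changes when things happen; it does not create free space if the AIL holds nothing.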

We've almost certainly got a CIL hang, and it looks like it is being
caused by an accounting leak. That is, if the CIL hasn't reached its
push threshold (12.5% of the log space) and the AIL is empty, yet the
grant heads indicate that less than 25% of the log space is free, then
we are slowly leaking log space somewhere in the CIL commit or
checkpoint path. Given that we've done 1.24 million checkpoints in the
above example, the leak is not triggered often. Given the size of the
log, it may be related to log wrap commits. It is also worth noting
that if this is an accounting leak, it will eventually result in a
hard hang.
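The leak argument above is a simple piece of accounting, and it can be sketched as a check. This is a minimal sketch under stated assumptions: struct log_snapshot and its fields are hypothetical (none of these names exist in XFS), the CIL push threshold is taken as one eighth (12.5%) of the log as stated above, and the check ignores reservations held by transactions that are still running, so a positive "unaccounted" figure is a hint rather than proof of a leak.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the quantities discussed above; not XFS fields. */
struct log_snapshot {
	int64_t log_size;    /* total log space, in bytes */
	int64_t cil_used;    /* bytes accumulated in the current CIL context */
	bool    ail_empty;   /* true when the AIL holds no items */
	int64_t grant_free;  /* free space according to the grant heads */
};

/* CIL push threshold: 12.5% of the log, i.e. one eighth. */
static int64_t cil_push_threshold(int64_t log_size)
{
	return log_size / 8;
}

/*
 * Space the grant heads report as used that neither the CIL nor the
 * (empty) AIL accounts for. In-flight transaction reservations are
 * ignored here, so this overestimates any real leak.
 */
static int64_t unaccounted_bytes(const struct log_snapshot *s)
{
	int64_t used = s->log_size - s->grant_free;

	return used - s->cil_used;
}

/*
 * The suspicious state described above: CIL below its push threshold,
 * an empty AIL, and yet less than 25% of the log free.
 */
static bool looks_like_leak(const struct log_snapshot *s)
{
	assert(s->log_size > 0);
	return s->cil_used < cil_push_threshold(s->log_size) &&
	       s->ail_empty &&
	       s->grant_free < s->log_size / 4;
}
```

For illustration only, take an assumed 1 MiB log, the space_used value of 39964 from the CIL context dump quoted above, an empty AIL, and a hypothetical 200000 bytes free on the grant heads. The CIL is well under its 131072-byte threshold, the check fires, and about 808612 bytes of granted space have no visible owner.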

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
