From: Mark Tinguely <tinguely@sgi.com>
To: Chris J Arges <chris.j.arges@canonical.com>
Cc: linux-xfs@oss.sgi.com, Ben Myers <bpm@sgi.com>
Subject: Re: Still seeing hangs in xlog_grant_log_space
Date: Wed, 16 May 2012 16:29:01 -0500
Message-ID: <4FB41C1D.8000808@sgi.com>
In-Reply-To: <4FB3FA1D.6050102@canonical.com>

On 05/16/12 14:03, Chris J Arges wrote:
> On 05/16/2012 01:42 PM, Ben Myers wrote:
>> Hey Chris,
>>
>> On Thu, May 10, 2012 at 04:11:27PM +0000, Chris J Arges wrote:
>>> <snip>
>>>> Canonical attached them to the bug report that they filed yesterday:
>>>> http://oss.sgi.com/bugzilla/show_bug.cgi?id=922
>>>>
>>>> ...Juerg
>>>>
>>>
>>> Hello,
>>> I am able to reproduce this bug with the instructions posted in this bug. Let me
>>> know what I can do to help.
>>
>> The bug shows:
>>
>> |This has been tested on the following kernels which all exhibit the same
>> |failures:
>> |- 3.2.0-24 (Ubuntu Precise)
>> |- 3.4.0-rc4
>> |- 3.0.29
>> |- 3.1.10
>> |- 3.2.15
>> |- 3.3.2
>>
>> Can you find an older kernel that isn't broken?
>>
>
> Sure, I can start digging further back.
> Also 2.6.38-8-server was the original version that this bug was reported
> on. So I can try testing circa 2.6.32 to see if that also fails.
> --chris
>
>> -Ben
>>
>

What I know so far:
I have added a log cleaner kicker to xlog_grant_head_wake(). At best, 
this kicker avoids waiting for the next sync before the log cleaner 
starts.

I have one machine that has been running for 2 days without hanging. 
Actually, now I would prefer it to hurry up and hang.

Here is what I see on the machine that is hung:

A few processes (4-5) are hung waiting to get space on the log. There 
isn't enough free space on the log for the first transaction, so it 
waits, and all other processes have to wait behind it. If my hand 
calculations of free space are correct, 251,811 bytes of the original 
589,842 bytes should still be free.

The AIL is empty. There is nothing to clean. Any new transaction at this 
point will kick the cleaner, but the first waiter still can't be started, 
so the new transaction joins the wait list too.
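
To illustrate why everything queues behind the first waiter, here is a 
minimal Python sketch of a FIFO wake-up. It is not the kernel code. The 
function name and the reservation sizes are made up for illustration; 
only the 251,811 figure comes from the numbers above.

```python
from collections import deque


def wake_waiters(free_bytes, waiters):
    """Wake queued reservations in FIFO order.

    Stops at the first waiter whose request does not fit, even if
    later (smaller) requests would fit in the remaining space.
    """
    woken = []
    while waiters and waiters[0] <= free_bytes:
        need = waiters.popleft()
        free_bytes -= need
        woken.append(need)
    return free_bytes, woken


LOG_FREE = 251_811  # free bytes, from the hand calculation above

# Hypothetical reservation sizes: the real sizes are not known.
waiters = deque([300_000, 10_000, 20_000])

free_after, woken = wake_waiters(LOG_FREE, waiters)
print(free_after, woken, list(waiters))
```

With these assumed sizes nothing is woken: the head of the queue does 
not fit, so the smaller requests behind it stay queued although they 
would fit on their own.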

The only XFS traffic at this point is the inode reclaim worker. This is 
to be expected.

The CIL has entries, but nothing is waiting on the CIL. 
xc_current_sequence = 117511 and xc_push_seq = 117510, so there is 
nothing for the CIL worker to do.

117511 is the largest sequence number that I have found so far in the 
xfs_log_item list. There are a few entries with smaller sequence numbers 
and the following strange entry:

77th entry in the xfs_log_item list:

crash> struct xfs_log_item ffff88083222b5b8
struct xfs_log_item {
   li_ail = {
     next = 0xffff88083222b5b0,
     prev = 0x0
   },
   li_lsn = 0,
   li_desc = 0x9f5d9f5d,
   li_mountp = 0xffff88083116e300,
   li_ailp = 0x0,
   li_type = 0,
   li_flags = 0,
   li_bio_list = 0x0,
   li_cb = 0,
   li_ops = 0xffff88083105de00,
   li_cil = {
     next = 0xffff880832ad9f08,
     prev = 0xffff880831751448
   },
   li_lv = 0xc788c788,
   li_seq = -131906182637504
}

Everything in this entry is bad except the li_cil.next and li_cil.prev. 
It looks like li_ail.next is really part of a list that starts at 
0xffff88083222b5b0. The best explanation is that a junk address was 
inserted into the li_cil chain.
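
As a cross-check of the dump, the Python sketch below reinterprets the 
signed li_seq value as an unsigned 64-bit quantity and computes the 
distance between the item address and li_ail.next. All constants are 
copied from the crash output above; the variable names are only for 
illustration.

```python
MASK64 = (1 << 64) - 1

item_addr = 0xffff88083222b5b8    # address given to crash
li_ail_next = 0xffff88083222b5b0  # li_ail.next from the dump
li_seq = -131906182637504         # printed by crash as signed 64-bit

# View the signed value as raw 64-bit contents.
li_seq_hex = hex(li_seq & MASK64)

# How far below the item does li_ail.next point?
offset = item_addr - li_ail_next

print(li_seq_hex)
print(offset)
```

The li_seq field comes out as 0xffff880832d71840, which looks like a 
kernel address in the same range as the other pointers in the dump 
rather than a sequence number. li_ail.next points 8 bytes below the 
item itself.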

This is a single data point, which could be anything, including bad 
hardware. I will continue to traverse this list until I can get the 
other box to hang. If someone wants to traverse their xfs_log_item list ...

--Mark Tinguely.
