public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Juerg Haefliger <juergh@gmail.com>
Cc: xfs@oss.sgi.com
Subject: Re: Still seeing hangs in xlog_grant_log_space
Date: Tue, 24 Apr 2012 00:38:43 +1000
Message-ID: <20120423143843.GN9541@dastard>
In-Reply-To: <CADLDEKsP4DsXf_G07ub+a-ODbrJbsiprRJUX1fJdaQ41TB7+Xg@mail.gmail.com>

On Mon, Apr 23, 2012 at 02:09:53PM +0200, Juerg Haefliger wrote:
> Hi,
> 
> I have a test system that I'm using to try to force an XFS filesystem
> hang since we're encountering that problem sporadically in production
> running a 2.6.38-8 Natty kernel. The original idea was to use this
> system to find the patches that fix the issue but I've tried a whole
> bunch of kernels and they all hang eventually (anywhere from 5 to 45
> mins) with the stack trace shown below.

If you kill the workload, does the file system recover normally?

> Only an emergency flush will
> bring the filesystem back. I tried kernels 3.0.29, 3.1.10, 3.2.15,
> 3.3.2. From reading through the mail archives, I get the impression
> that this should be fixed in 3.1.

What you see is not necessarily a hang. It may just be that you've
caused your IO subsystem to have so much IO queued up that it's completely
overwhelmed. How much RAM do you have in the machine?
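One way to tell a true hang from an overwhelmed IO subsystem is to sample the block device's write counter over time: if the number of sectors written keeps increasing, IO is still completing, just slowly. The sketch below assumes the standard Linux /proc/diskstats layout (sectors written is the tenth whitespace-separated field); the helper names and the device name are hypothetical. A device-level counter only shows that IO is completing; it cannot by itself prove that the filesystem is making progress.

```python
import time


def sectors_written(diskstats_text, device):
    """Return the cumulative sectors-written counter for `device`.

    Assumes the /proc/diskstats layout: major, minor, device name, four
    read fields, then writes completed, writes merged, sectors written.
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) > 9 and fields[2] == device:
            return int(fields[9])
    raise KeyError(device)


def classify(samples):
    """Label counter samples taken at a fixed interval."""
    if len(samples) < 2:
        return "unknown"
    deltas = [later - earlier for earlier, later in zip(samples, samples[1:])]
    if all(d == 0 for d in deltas):
        return "stalled"      # no sectors written in any interval
    return "progressing"      # writes are completing, perhaps very slowly


def sample_device(device, interval=5.0, count=6, path="/proc/diskstats"):
    """Read the sectors-written counter `count` times, `interval` s apart."""
    samples = []
    for i in range(count):
        with open(path) as f:
            samples.append(sectors_written(f.read(), device))
        if i + 1 < count:
            time.sleep(interval)
    return samples
```

For example, classify(sample_device("sdb")), with "sdb" replaced by the device backing the test partition, run while the workload appears hung.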

> What makes the test system special is:
> 1) The test partition uses 1024 block size and 576b log size.

So you've made the log as physically small as possible on a tiny
(9GB) filesystem. Why?
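To put the log size in perspective, here is the arithmetic, assuming "576b" means 576 filesystem blocks (the mkfs.xfs "b" suffix) and that "9GB" means 9 GiB. The reservation size used below is purely hypothetical; real XFS transaction reservations depend on the transaction type and filesystem geometry. The point of the toy model is that once the outstanding reservations fill the log, new ones have to wait for space, which is the kind of wait reported in xlog_grant_log_space.

```python
KIB = 1024
GIB = 1024 ** 3


def log_size_bytes(log_blocks, block_size):
    """Physical log size for a log specified in filesystem blocks."""
    return log_blocks * block_size


def reservations_that_fit(log_bytes, reservation_bytes):
    """Toy model: equal-sized reservations that fit in an empty log.

    Any further reservation must wait until log space is freed, which in
    XFS means waiting for the log tail to move forward as logged
    metadata is written back.
    """
    return log_bytes // reservation_bytes


log = log_size_bytes(576, 1024)              # 589824 bytes, i.e. 576 KiB
fraction = log / (9 * GIB)                   # 2**-14, about 0.006% of 9 GiB
fits = reservations_that_fit(log, 64 * KIB)  # 9, with a hypothetical 64 KiB
```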

> 2) The RAID controller cache is disabled.

And you've made the storage subsystem as slow as possible. What type
of RAID are you using, how many disks in the RAID volume, which type
of disks, etc?

> I can't seem to hit the problem without the above modifications.

How on earth did you come up with this configuration?

> For the IO workload I pre-create 8000 files with random content and
> sizes between 1k and 128k on the test partition. Then I run a tool
> that spawns a bunch of threads which just copy these files to a
> different directory on the same partition.

So, your workload also has a significant amount of parallelism and
concurrency on a filesystem with only 4 AGs?
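A rough way to see why the AG count matters: allocations within one allocation group serialize on that AG's header buffers, so with only 4 AGs, many concurrent writers must share them. The sketch below is a simple pigeonhole bound, not a model of the XFS allocator; the thread count of 16 is a hypothetical value, since the original report only says "a bunch of threads", and "9GB" is taken as 9 GiB.

```python
import math

GIB = 1024 ** 3


def ag_size_bytes(fs_bytes, ag_count):
    """Size of each allocation group if the space is split evenly."""
    return fs_bytes // ag_count


def busiest_ag_lower_bound(threads, ag_count):
    """Pigeonhole bound: if `threads` allocate at once across `ag_count`
    AGs, at least one AG is serving this many of them."""
    return math.ceil(threads / ag_count)


per_ag = ag_size_bytes(9 * GIB, 4)          # 2.25 GiB per AG
contenders = busiest_ag_lower_bound(16, 4)  # hypothetical 16 threads -> 4
```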

> At the same time there are
> other threads that rename, remove and overwrite random files in the
> destination directory keeping the file count at around 500.

And you've added as much concurrent metadata modification as
possible, too, which makes me wonder.....
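The reproducer described in the report can be sketched as follows. This is a hypothetical reconstruction, not the reporter's actual tool: the names run_workload and make_source_files, the numbers of copier and mutator threads, and the 50/50 split between rename and overwrite are assumptions. Only the 8000 pre-created source files, the 1k to 128k size range, and the target of roughly 500 destination files come from the message.

```python
import itertools
import os
import random
import shutil
import threading
import time


def make_source_files(src, n_files, rng, min_size=1024, max_size=128 * 1024):
    """Pre-create n_files files of random content, 1 KiB to 128 KiB each."""
    os.makedirs(src, exist_ok=True)
    for i in range(n_files):
        with open(os.path.join(src, f"s{i}"), "wb") as f:
            f.write(os.urandom(rng.randint(min_size, max_size)))


def run_workload(root, n_files=8000, target=500, copiers=8, mutators=4,
                 seconds=60.0, seed=0):
    """Copy files into dst while other threads rename, remove and
    overwrite them, keeping dst at roughly `target` files."""
    src, dst = os.path.join(root, "src"), os.path.join(root, "dst")
    make_source_files(src, n_files, random.Random(seed))
    os.makedirs(dst, exist_ok=True)
    names = sorted(os.listdir(src))
    ids, id_lock, stop = itertools.count(), threading.Lock(), threading.Event()

    def new_path(prefix):
        with id_lock:  # unique destination names across threads
            return os.path.join(dst, f"{prefix}{next(ids)}")

    def copier(rng):
        while not stop.is_set():
            if len(os.listdir(dst)) >= target:
                time.sleep(0.001)  # let the mutators trim dst first
                continue
            shutil.copyfile(os.path.join(src, rng.choice(names)), new_path("c"))

    def mutator(rng):
        while not stop.is_set():
            existing = os.listdir(dst)
            if not existing:
                time.sleep(0.001)
                continue
            path = os.path.join(dst, rng.choice(existing))
            try:
                if len(existing) >= target:
                    os.remove(path)
                elif rng.random() < 0.5:
                    os.rename(path, new_path("r"))
                else:
                    # O_TRUNC without O_CREAT: fails if the file vanished
                    fd = os.open(path, os.O_WRONLY | os.O_TRUNC)
                    with os.fdopen(fd, "wb") as f:
                        f.write(os.urandom(rng.randint(1024, 128 * 1024)))
            except FileNotFoundError:
                pass  # another thread removed or renamed it first

    threads = [threading.Thread(target=copier, args=(random.Random(seed + 1 + i),))
               for i in range(copiers)]
    threads += [threading.Thread(target=mutator, args=(random.Random(seed + 1000 + i),))
                for i in range(mutators)]
    for t in threads:
        t.start()
    time.sleep(seconds)
    stop.set()
    for t in threads:
        t.join()
    return len(os.listdir(dst))
```

Calling run_workload with root set to a directory on the test partition runs the full-size configuration; smaller values of n_files, target and seconds give a quick smoke run on any filesystem.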

> Let me know what other information I can provide to pin this down.

.... exactly what are you trying to achieve with this test?  From my
point of view, you're doing something completely and utterly insane.
Your filesystem config and workload are so far outside normal
configurations and workloads that I'm not surprised you're seeing
some kind of problem.....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
