public inbox for linux-xfs@vger.kernel.org
* high load and xfsaild in d
@ 2012-11-23  2:13 Keith Keller
  2012-11-23  2:56 ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Keller @ 2012-11-23  2:13 UTC (permalink / raw)
  To: linux-xfs

Hello all,

I recently deployed a new CentOS 6.3 file server, and soon after I
noticed that the load was consistently at around 4, even with no obvious
activity.  After checking around a bit, the only unusual thing I could
find is that the xfsaild threads are all consistently in D state:

root      1744  0.0  0.0      0     0 ?        D    Nov16   7:46 [xfsaild/dm-2]
root      1756  0.0  0.0      0     0 ?        D    Nov16   7:44 [xfsaild/dm-1]
root      1759  0.0  0.0      0     0 ?        D    Nov16   7:57 [xfsaild/dm-3]
root      1762  0.0  0.0      0     0 ?        D    Nov16   5:59 [xfsaild/dm-0]

On another CentOS 6.2 box, I don't see this symptom; the xfsaild threads
are all in S state.  I checked around for many other things that I'd
normally expect to be pegging the load, but didn't find anything out of
the ordinary.  (See
https://groups.google.com/group/comp.os.linux.misc/browse_thread/thread/c9b2a1d60eb25fac
for more details.)

The new box is an up to date CentOS 6.3 box with the stock kernel:

Linux xxxxxxxx 2.6.32-279.14.1.el6.x86_64 #1 SMP Tue Nov 6 23:43:09 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

The storage backends are a software RAID1 (three of the filesystems) and
a 3ware 9750-backed 11-disk RAID6.  I don't have any unusual mount
options, and the filesystems were created with mkfs.xfs with no options.

I searched the web and the list archives, and did find some issues with
xfsaild, but they all either seemed out of date (one thread mentioned
that an issue would be fixed in 2.6.30) or with more complicated
symptoms (other processes hanging in D state).  But this server seems
perfectly fine from what I can tell; I've noticed no performance issues
either in informal observations or with actual measured read/write
speeds, and the smbd and nfsd processes are all generally in S, so they
are not waiting on IO.

The big question is, is this actually a problem, or is it nothing to
worry about?  If it's a problem, is it in XFS, or if not, what other
steps can I take to try to determine the root cause?

Thanks,

--keith

-- 
kkeller@wombat.san-francisco.ca.us


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: high load and xfsaild in d
  2012-11-23  2:13 high load and xfsaild in d Keith Keller
@ 2012-11-23  2:56 ` Dave Chinner
  2012-11-23  4:34   ` Keith Keller
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2012-11-23  2:56 UTC (permalink / raw)
  To: Keith Keller; +Cc: linux-xfs

On Thu, Nov 22, 2012 at 06:13:48PM -0800, Keith Keller wrote:
> Hello all,
> 
> I recently deployed a new CentOS 6.3 file server, and soon after I
> noticed that the load was consistently at around 4, even with no obvious
> activity.  After checking around a bit, the only unusual thing I could
> find is that the xfsaild threads are all consistently in D state:
> 
> root      1744  0.0  0.0      0     0 ?        D    Nov16   7:46 [xfsaild/dm-2]
> root      1756  0.0  0.0      0     0 ?        D    Nov16   7:44 [xfsaild/dm-1]
> root      1759  0.0  0.0      0     0 ?        D    Nov16   7:57 [xfsaild/dm-3]
> root      1762  0.0  0.0      0     0 ?        D    Nov16   5:59 [xfsaild/dm-0]

That's a side effect of a minimal set of bug fixes that were needed
to avoid a load related log space hang. Those fixes disabled the
aild idling logic so the aild acts as a watchdog, so they wake up
every 50ms to check if there's anything to do. You'll find that
3.0.x stable kernels have the same behaviour.

The aild idling logic was re-enabled in mainstream kernels after the
root cause of the log space hangs was diagnosed and fixed, but I
can't see it ever being re-enabled in a CentOS 6.3 kernel....
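
For readers following along: Linux counts tasks in uninterruptible sleep
(D) toward the load average, alongside runnable (R) tasks, which is why
four otherwise idle aild threads are enough to hold the load near 4. A
minimal Python sketch, assuming the `ps aux` column layout (STAT in the
eighth field); the function names are illustrative only, and the sample
lines are the ones posted at the top of this thread:

```python
from collections import Counter

# Sample lines copied from the ps output posted earlier in this thread.
SAMPLE_PS = """\
root      1744  0.0  0.0      0     0 ?        D    Nov16   7:46 [xfsaild/dm-2]
root      1756  0.0  0.0      0     0 ?        D    Nov16   7:44 [xfsaild/dm-1]
root      1759  0.0  0.0      0     0 ?        D    Nov16   7:57 [xfsaild/dm-3]
root      1762  0.0  0.0      0     0 ?        D    Nov16   5:59 [xfsaild/dm-0]
"""


def state_counts(ps_text):
    """Count processes per state letter from `ps aux`-style lines.

    Assumes 11 whitespace-separated columns with STAT in column 8.
    """
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip the header row and malformed lines
        counts[fields[7][0]] += 1  # first letter of STAT: R, S, D, ...
    return counts


def load_contributors(ps_text):
    """Tasks that count toward the load average: runnable plus uninterruptible."""
    counts = state_counts(ps_text)
    return counts["R"] + counts["D"]


if __name__ == "__main__":
    print(state_counts(SAMPLE_PS))
    print(load_contributors(SAMPLE_PS))
```

On a live system the same functions could be fed the text output of
`ps aux` instead of the pasted sample.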

Cheers,

Dave
-- 
Dave Chinner
david@fromorbit.com

* Re: high load and xfsaild in d
  2012-11-23  2:56 ` Dave Chinner
@ 2012-11-23  4:34   ` Keith Keller
  2012-11-23  5:39     ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Keller @ 2012-11-23  4:34 UTC (permalink / raw)
  To: linux-xfs

On 2012-11-23, Dave Chinner <david@fromorbit.com> wrote:
> That's a side effect of a minimal set of bug fixes that were needed
> to avoid a load related log space hang. Those fixes disabled the
> aild idling logic so the aild acts as a watchdog, so they wake up
> every 50ms to check if there's anything to do. You'll find that
> 3.0.x stable kernels have the same behaviour.
>
> The aild idling logic was re-enabled in mainstream kernels after the
> root cause of the log space hangs was diagnosed and fixed, but I
> can't see it ever being re-enabled in a CentOS 6.3 kernel....

Thanks for the clarification, Dave.  It sounds like I would need to
either wait for this fix to hit the CentOS release kernel, or compile my
own.  Is this likely to be an actual problem short-term, or should it be
safe to put off any fixes for a few weeks?  If it's a potential problem
beyond just impacting the system load, I'd want to try to fix it sooner.

Do you know why I might not see this behavior on a different CentOS 6.x
kernel?

Linux xxxxxx 2.6.32-279.5.2.el6.x86_64 #1 SMP Fri Aug 24 01:07:11 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

This kernel has its one aild thread in state S.  Is the bug state an
issue if there's more than one XFS filesystem mounted?  Aside from nfsd
and smbd (and the storage backend), that's the only obvious difference
I can find between the two boxes.  Seems unlikely, but IIRC the original
bug was a very rare case.

--keith

-- 
kkeller@wombat.san-francisco.ca.us


* Re: high load and xfsaild in d
  2012-11-23  4:34   ` Keith Keller
@ 2012-11-23  5:39     ` Dave Chinner
  2012-11-23 20:19       ` Keith Keller
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2012-11-23  5:39 UTC (permalink / raw)
  To: Keith Keller; +Cc: linux-xfs

On Thu, Nov 22, 2012 at 08:34:06PM -0800, Keith Keller wrote:
> On 2012-11-23, Dave Chinner <david@fromorbit.com> wrote:
> > That's a side effect of a minimal set of bug fixes that were needed
> > to avoid a load related log space hang. Those fixes disabled the
> > aild idling logic so the aild acts as a watchdog, so they wake up
> > every 50ms to check if there's anything to do. You'll find that
> > 3.0.x stable kernels have the same behaviour.
> >
> > The aild idling logic was re-enabled in mainstream kernels after the
> > root cause of the log space hangs was diagnosed and fixed, but I
> > can't see it ever being re-enabled in a CentOS 6.3 kernel....
> 
> Thanks for the clarification, Dave.  It sounds like I would need to
> either wait for this fix to hit the CentOS release kernel, or compile my
> own.

Do whatever you want - you might be waiting a while for CentOS to
fix it, though, because they don't fix user reported bugs. They just
repackage whatever Red Hat releases as RHEL.

TANSTAAFL.

> Do you know why I might not see this behavior on a different CentOS 6.x
> kernel?
> 
> Linux xxxxxx 2.6.32-279.5.2.el6.x86_64 #1 SMP Fri Aug 24 01:07:11 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Because the log hang bug hadn't been fixed in that kernel.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: high load and xfsaild in d
  2012-11-23  5:39     ` Dave Chinner
@ 2012-11-23 20:19       ` Keith Keller
  2012-11-23 21:33         ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Keith Keller @ 2012-11-23 20:19 UTC (permalink / raw)
  To: linux-xfs

On 2012-11-23, Dave Chinner <david@fromorbit.com> wrote:
>
> Do whatever you want - you might be waiting a while for CentOS to
> fix it, though, because they don't fix user reported bugs. They just
> repackage whatever Red Hat releases as RHEL.

Yes, that's why I was asking--I was wondering whether it is safe to wait
for what could be some time for a) RHEL to decide to patch (if they do
so at all), b) RHEL to patch, and c) CentOS to patch.  IOW, is the high
load the only likely symptom of the original aild patch, or are there
potentially other problems, such as performance degradation that I
haven't seen yet, that would make waiting for CentOS unwise?

>> Do you know why I might not see this behavior on a different CentOS 6.x
>> kernel?
>> 
>> Linux xxxxxx 2.6.32-279.5.2.el6.x86_64 #1 SMP Fri Aug 24 01:07:11 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> Because the log hang bug hadn't been fixed in that kernel.

This actually gives me some optimism that RHEL might introduce a new
kernel sooner rather than later--that kernel wasn't all that long ago,
and there have been quite a few (mostly unrelated) patches since.
(That's why I was so surprised--I'm not used to the RHEL kernel moving
so quickly!)

--keith

-- 
kkeller-usenet@wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://www.therockgarden.ca/aolsfaq.txt
see X- headers for PGP signature information


* Re: high load and xfsaild in d
  2012-11-23 20:19       ` Keith Keller
@ 2012-11-23 21:33         ` Dave Chinner
  0 siblings, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2012-11-23 21:33 UTC (permalink / raw)
  To: Keith Keller; +Cc: linux-xfs

On Fri, Nov 23, 2012 at 12:19:59PM -0800, Keith Keller wrote:
> On 2012-11-23, Dave Chinner <david@fromorbit.com> wrote:
> >
> > Do whatever you want - you might be waiting a while for CentOS to
> > fix it, though, because they don't fix user reported bugs. They just
> > repackage whatever Red Hat releases as RHEL.
> 
> Yes, that's why I was asking--I was wondering whether it is safe to wait
> for what could be some time for a) RHEL to decide to patch (if they do
> so at all), b) RHEL to patch, and c) CentOS to patch.  IOW, is the high
> load the only likely symptom of the original aild patch, or are there
> potentially other problems, such as performance degradation that I
> haven't seen yet, that would make waiting for CentOS unwise?

There is no side effect other than the load. There are no
performance issues with the ailds behaving like this.
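
As a rough illustration of why the reported load settles at about 4
without any real work being done: the kernel samples the number of
runnable plus uninterruptible tasks roughly every 5 seconds and folds it
into an exponentially decaying average. The Python sketch below is a
floating-point approximation of that calculation (the kernel itself uses
fixed-point arithmetic); the function name and the assumption that all
four ailds are seen in D state at every sample are illustrative only:

```python
import math


def simulate_loadavg(active_tasks, seconds, window=60.0, interval=5.0, start=0.0):
    """Approximate a Linux-style load average for a constant task count.

    active_tasks: tasks in R or D state at each sample (assumed constant)
    seconds:      how long to simulate
    window:       averaging window in seconds (60 for the 1-minute figure)
    interval:     sampling interval in seconds
    """
    decay = math.exp(-interval / window)
    load = start
    for _ in range(int(seconds / interval)):
        # Old value decays; the current sample fills in the remainder.
        load = load * decay + active_tasks * (1.0 - decay)
    return load


if __name__ == "__main__":
    for elapsed in (60, 300, 600):
        print(elapsed, round(simulate_loadavg(4, elapsed), 2))
```

Under these assumptions the 1-minute figure climbs toward 4 within a few
minutes of the filesystems being mounted and stays there, which matches
the symptom reported at the start of the thread.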

> >> Do you know why I might not see this behavior on a different CentOS 6.x
> >> kernel?
> >> 
> >> Linux xxxxxx 2.6.32-279.5.2.el6.x86_64 #1 SMP Fri Aug 24 01:07:11 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
> >
> > Because the log hang bug hadn't been fixed in that kernel.
> 
> This actually gives me some optimism that RHEL might introduce a new
> kernel sooner rather than later--that kernel wasn't all that long ago,
> and there have been quite a few (mostly unrelated) patches since.

Doubt it. Given that I'm the RHEL XFS maintainer....

> (That's why I was so surprised--I'm not used to the RHEL kernel moving
> so quickly!)

What, you're not used to having serious bugs fixed quickly? That's
why people pay for RHEL...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

