From: Eric Sandeen <sandeen@sandeen.net>
To: Justin Piszcz <jpiszcz@lucidpixels.com>
Cc: linux-raid@vger.kernel.org, Alan Piszcz <ap@solarrain.com>,
linux-kernel@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: Which kernel options should be enabled to find the root cause of this bug?
Date: Tue, 24 Nov 2009 09:14:46 -0600 [thread overview]
Message-ID: <4B0BF866.7040004@sandeen.net> (raw)
In-Reply-To: <alpine.DEB.2.00.0911240805490.25676@p34.internal.lan>
Justin Piszcz wrote:
>
>
> On Sat, 17 Oct 2009, Justin Piszcz wrote:
>
>> Hello,
>>
>> I have a system I recently upgraded from 2.6.30.x and after
>> approximately 24-48 hours--sometimes longer, the system cannot write
>> any more files to disk (luckily though I can still write to /dev/shm)
>> -- to which I have
>> saved the sysrq-t and sysrq-w output:
>>
>> http://home.comcast.net/~jpiszcz/20091017/sysrq-w.txt
>> http://home.comcast.net/~jpiszcz/20091017/sysrq-t.txt
Unfortunately it looks like a lot of the sysrq-t, at least, was lost.
The sysrq-w trace has the "show blocked state" start a ways down the file,
for anyone playing along at home ;)
Other things you might try are a sysrq-m to get memory state...
>> Configuration:
>>
>> $ cat /proc/mdstat Personalities : [raid1] [raid6] [raid5] [raid4] md1
>> : active raid1 sdb2[1] sda2[0]
>> 136448 blocks [2/2] [UU]
>>
>> md2 : active raid1 sdb3[1] sda3[0]
>> 129596288 blocks [2/2] [UU]
>>
>> md3 : active raid5 sdj1[7] sdi1[6] sdh1[5] sdf1[3] sdg1[4] sde1[2]
>> sdd1[1] sdc1[0]
>> 5128001536 blocks level 5, 1024k chunk, algorithm 2 [8/8] [UUUUUUUU]
>>
>> md0 : active raid1 sdb1[1] sda1[0]
>> 16787776 blocks [2/2] [UU]
>>
>> $ mount
>> /dev/md2 on / type xfs (rw,noatime,nobarrier,logbufs=8,logbsize=262144)
>> tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
>> proc on /proc type proc (rw,noexec,nosuid,nodev)
>> sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
>> udev on /dev type tmpfs (rw,mode=0755)
>> tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
>> devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
>> /dev/md1 on /boot type ext3 (rw,noatime)
>> /dev/md3 on /r/1 type xfs
>> (rw,noatime,nobarrier,logbufs=8,logbsize=262144)
>> rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
>> nfsd on /proc/fs/nfsd type nfsd (rw)
Do you get the same behavior if you don't add the log options at mount time?
Kind of grasping at straws here for now ...
>> Distribution: Debian Testing
>> Arch: x86_64
>>
>> The problem occurs with 2.6.31 and I upgraded to 2.6.31.4 and the problem
>> persists.
>>
...
> In addition to using netconsole, which kernel options should be enabled
> to better diagnose this issue?
>
> Should I enable these to help track down this bug?
>
> [ ] XFS Debugging support (EXPERIMENTAL)
> [ ] Compile the kernel with frame pointers
The former probably won't hurt; the latter might gibe us better backtraces.
> Are there any other options that will help determine the root cause of this
> bug that are recommended?
Not that I can think of off hand ...
-Eric
> Justin.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
WARNING: multiple messages have this Message-ID (diff)
From: Eric Sandeen <sandeen@sandeen.net>
To: Justin Piszcz <jpiszcz@lucidpixels.com>
Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org,
xfs@oss.sgi.com, Alan Piszcz <ap@solarrain.com>
Subject: Re: Which kernel options should be enabled to find the root cause of this bug?
Date: Tue, 24 Nov 2009 09:14:46 -0600 [thread overview]
Message-ID: <4B0BF866.7040004@sandeen.net> (raw)
In-Reply-To: <alpine.DEB.2.00.0911240805490.25676@p34.internal.lan>
Justin Piszcz wrote:
>
>
> On Sat, 17 Oct 2009, Justin Piszcz wrote:
>
>> Hello,
>>
>> I have a system I recently upgraded from 2.6.30.x and after
>> approximately 24-48 hours--sometimes longer, the system cannot write
>> any more files to disk (luckily though I can still write to /dev/shm)
>> -- to which I have
>> saved the sysrq-t and sysrq-w output:
>>
>> http://home.comcast.net/~jpiszcz/20091017/sysrq-w.txt
>> http://home.comcast.net/~jpiszcz/20091017/sysrq-t.txt
Unfortunately it looks like a lot of the sysrq-t, at least, was lost.
The sysrq-w trace has the "show blocked state" start a ways down the file,
for anyone playing along at home ;)
Other things you might try are a sysrq-m to get memory state...
>> Configuration:
>>
>> $ cat /proc/mdstat Personalities : [raid1] [raid6] [raid5] [raid4] md1
>> : active raid1 sdb2[1] sda2[0]
>> 136448 blocks [2/2] [UU]
>>
>> md2 : active raid1 sdb3[1] sda3[0]
>> 129596288 blocks [2/2] [UU]
>>
>> md3 : active raid5 sdj1[7] sdi1[6] sdh1[5] sdf1[3] sdg1[4] sde1[2]
>> sdd1[1] sdc1[0]
>> 5128001536 blocks level 5, 1024k chunk, algorithm 2 [8/8] [UUUUUUUU]
>>
>> md0 : active raid1 sdb1[1] sda1[0]
>> 16787776 blocks [2/2] [UU]
>>
>> $ mount
>> /dev/md2 on / type xfs (rw,noatime,nobarrier,logbufs=8,logbsize=262144)
>> tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
>> proc on /proc type proc (rw,noexec,nosuid,nodev)
>> sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
>> udev on /dev type tmpfs (rw,mode=0755)
>> tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
>> devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
>> /dev/md1 on /boot type ext3 (rw,noatime)
>> /dev/md3 on /r/1 type xfs
>> (rw,noatime,nobarrier,logbufs=8,logbsize=262144)
>> rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
>> nfsd on /proc/fs/nfsd type nfsd (rw)
Do you get the same behavior if you don't add the log options at mount time?
Kind of grasping at straws here for now ...
>> Distribution: Debian Testing
>> Arch: x86_64
>>
>> The problem occurs with 2.6.31 and I upgraded to 2.6.31.4 and the problem
>> persists.
>>
...
> In addition to using netconsole, which kernel options should be enabled
> to better diagnose this issue?
>
> Should I enable these to help track down this bug?
>
> [ ] XFS Debugging support (EXPERIMENTAL)
> [ ] Compile the kernel with frame pointers
The former probably won't hurt; the latter might gibe us better backtraces.
> Are there any other options that will help determine the root cause of this
> bug that are recommended?
Not that I can think of off hand ...
-Eric
> Justin.
next prev parent reply other threads:[~2009-11-24 15:14 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-10-17 22:34 2.6.31+2.6.31.4: XFS - All I/O locks up to D-state after 24-48 hours (sysrq-t+w available) Justin Piszcz
2009-10-17 22:34 ` Justin Piszcz
2009-10-18 20:17 ` Justin Piszcz
2009-10-18 20:17 ` Justin Piszcz
2009-10-19 3:04 ` Dave Chinner
2009-10-19 3:04 ` Dave Chinner
2009-10-19 10:18 ` Justin Piszcz
2009-10-19 10:18 ` Justin Piszcz
2009-10-20 0:33 ` Dave Chinner
2009-10-20 0:33 ` Dave Chinner
2009-10-20 8:33 ` Justin Piszcz
2009-10-20 8:33 ` Justin Piszcz
2009-10-21 10:19 ` Justin Piszcz
2009-10-21 10:19 ` Justin Piszcz
2009-10-21 14:17 ` mdadm --detail showing annoying device Stephane Bunel
2009-10-21 21:46 ` Neil Brown
2009-10-22 11:22 ` Stephane Bunel
2009-10-29 3:44 ` Neil Brown
2009-11-03 9:37 ` Stephane Bunel
2009-11-03 10:09 ` Beolach
2009-11-03 12:16 ` Stephane Bunel
2009-10-22 11:29 ` Mario 'BitKoenig' Holbe
2009-10-22 14:17 ` Stephane Bunel
2009-10-22 16:00 ` Stephane Bunel
2009-10-22 22:49 ` 2.6.31+2.6.31.4: XFS - All I/O locks up to D-state after 24-48 hours (sysrq-t+w available) Justin Piszcz
2009-10-22 22:49 ` Justin Piszcz
2009-10-22 23:00 ` Dave Chinner
2009-10-22 23:00 ` Dave Chinner
2009-10-26 11:24 ` Justin Piszcz
2009-10-26 11:24 ` Justin Piszcz
2009-11-02 21:46 ` Justin Piszcz
2009-11-02 21:46 ` Justin Piszcz
2009-11-20 20:39 ` 2.6.31+2.6.31.4: XFS - All I/O locks up to D-state after 24-48 hours (sysrq-t+w available) - root cause found = asterisk Justin Piszcz
2009-11-20 20:39 ` Justin Piszcz
2009-11-20 23:44 ` Bug#557262: " Faidon Liambotis
2009-11-20 23:44 ` Faidon Liambotis
2009-11-20 23:44 ` Faidon Liambotis
2009-11-20 23:51 ` Justin Piszcz
2009-11-20 23:51 ` Justin Piszcz
2009-11-21 14:29 ` Roger Heflin
2009-11-21 14:29 ` Roger Heflin
2009-11-24 13:08 ` Which kernel options should be enabled to find the root cause of this bug? Justin Piszcz
2009-11-24 13:08 ` Justin Piszcz
2009-11-24 15:14 ` Eric Sandeen [this message]
2009-11-24 15:14 ` Eric Sandeen
2009-11-24 16:20 ` Justin Piszcz
2009-11-24 16:20 ` Justin Piszcz
2009-11-24 16:23 ` Eric Sandeen
2009-11-24 16:23 ` Eric Sandeen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4B0BF866.7040004@sandeen.net \
--to=sandeen@sandeen.net \
--cc=ap@solarrain.com \
--cc=jpiszcz@lucidpixels.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-raid@vger.kernel.org \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.