* Re: ext3 data=journal hangs
From: Randy Dunlap @ 2007-01-12 5:34 UTC
To: linux-fsdevel

(resending for wider audience)

Date: Wed, 10 Jan 2007 16:03:51 -0800
To: linux-ext4@vger.kernel.org

On Tue, 9 Jan 2007 15:11:23 -0800 Randy Dunlap wrote:
> Hi,
>
> (2.6.20-rc4, x86_64 1-proc on SMP kernel, 1 GB RAM)
>
> I'm running fsx-linux (akpm ext3-tools version) on an ext3 fs
> with data=journal and fs blocksize=2048. I've been trying to
> get some kind of kernel messages from it but I can't get any
> debug IO done successfully.
>
> It has hung on me 3 times in a row today. I'm using this command:
> fsx-linux -l 100M -N 50000 -S 0 fsxtestfile
>
> This is run in a new partition on an IDE drive (/dev/hda7,
> using legacy IDE drivers).
>
> Any suggestions for debug output? I can see SysRq output on-screen
> (sometimes) but it doesn't make it to my serial console.
>
> Any patches to test? :)
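For readers who have not used the test: fsx (and its fsx-linux port) performs random reads, writes, truncates and mmap-based operations on a single file and checks every read against an in-memory copy of what the file should contain. In the command above, -l bounds the file length, -N sets the number of operations and -S sets the random seed. The sketch below is a hypothetical, much-reduced Python model of that idea only; the name mini_fsx and its parameters are illustrative and not part of fsx-linux. It uses plain pwrite/pread/ftruncate, omits the mmap paths the real tool also exercises, and assumes Python 3.9+ (for Random.randbytes) on a POSIX system.

```python
import os
import random
import tempfile


def mini_fsx(path, max_len=64 * 1024, num_ops=2000, seed=1):
    """Hypothetical, much-reduced fsx-style exerciser (not fsx-linux).

    Applies random writes, truncates and reads to the file at `path` and
    checks every read against an in-memory shadow copy. Returns the final
    file length; raises AssertionError on the first mismatch.
    """
    rng = random.Random(seed)
    shadow = bytearray()  # what the file *should* contain
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for op_no in range(num_ops):
            op = rng.choice(("write", "truncate", "read"))
            if op == "write":
                off = rng.randrange(max_len)
                size = rng.randrange(1, min(4096, max_len - off) + 1)
                data = rng.randbytes(size)
                os.pwrite(fd, data, off)
                if off > len(shadow):
                    # Writing past EOF leaves a hole, which reads back as zeros.
                    shadow.extend(bytes(off - len(shadow)))
                shadow[off:off + size] = data
            elif op == "truncate":
                new_len = rng.randrange(max_len + 1)
                os.ftruncate(fd, new_len)
                if new_len < len(shadow):
                    del shadow[new_len:]
                else:
                    # Extending via ftruncate also zero-fills.
                    shadow.extend(bytes(new_len - len(shadow)))
            elif shadow:  # read a random range of the current contents
                off = rng.randrange(len(shadow))
                size = rng.randrange(1, len(shadow) - off + 1)
                if os.pread(fd, size, off) != bytes(shadow[off:off + size]):
                    raise AssertionError(f"op {op_no}: read mismatch at offset {off}")
        # Final whole-file comparison against the shadow copy.
        length = os.fstat(fd).st_size
        if length != len(shadow) or os.pread(fd, length, 0) != bytes(shadow):
            raise AssertionError("final file contents differ from shadow copy")
    finally:
        os.close(fd)
    return length


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(mini_fsx(os.path.join(d, "fsxtestfile")))
```

A model like this only catches data corruption; a hang such as the one reported here shows up as the process never returning, not as a mismatch.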
More notes:
Fails (hangs) with fs blocksize of 1024, 2048, or 4096.
Only data=journal mode hangs; writeback and ordered run fine.

After several runs (hangs), I was able to get some sysrq output
to the serial console.

kernel config: http://oss.oracle.com/~rdunlap/configs/config-2620-rc4-hangs
message log: http://oss.oracle.com/~rdunlap/logs/fsx-capture.txt

Can anyone see what fsx-linux is waiting on there?

Thanks,
---
~Randy
* Re: ext3 data=journal hangs
From: Andrew Morton @ 2007-01-12 6:58 UTC
To: Randy Dunlap; +Cc: linux-fsdevel, linux-mm
On Thu, 11 Jan 2007 21:34:12 -0800
Randy Dunlap <randy.dunlap@oracle.com> wrote:
> (resending for wider audience)
>
> Date: Wed, 10 Jan 2007 16:03:51 -0800
> To: linux-ext4@vger.kernel.org
>
>
> On Tue, 9 Jan 2007 15:11:23 -0800 Randy Dunlap wrote:
>
> > Hi,
> >
> > (2.6.20-rc4, x86_64 1-proc on SMP kernel, 1 GB RAM)
> >
> > I'm running fsx-linux (akpm ext3-tools version) on an ext3 fs
> > with data=journal and fs blocksize=2048. I've been trying to
> > get some kind of kernel messages from it but I can't get any
> > debug IO done successfully.
> >
> > It has hung on me 3 times in a row today. I'm using this command:
> > fsx-linux -l 100M -N 50000 -S 0 fsxtestfile
> >
> > This is run in a new partition on an IDE drive (/dev/hda7,
> > using legacy IDE drivers).
> >
> > Any suggestions for debug output? I can see SysRq output on-screen
> > (sometimes) but it doesn't make it to my serial console.
> >
> > Any patches to test? :)
>
> More notes:
> Fails (hangs) with fs blocksize of 1024, 2048, or 4096.
> Only data=journal mode hangs; writeback and ordered run fine.
>
> After several runs (hangs), I was able to get some sysrq output
> to the serial console.
>
> kernel config: http://oss.oracle.com/~rdunlap/configs/config-2620-rc4-hangs
> message log: http://oss.oracle.com/~rdunlap/logs/fsx-capture.txt
>
> Can anyone see what fsx-linux is waiting on there?
>
Everybody got stuck in balance_dirty_pages(). The new thing in there is
that an nscd instance got stuck in balance_dirty_pages() on the pagefault's
new set_page_dirty_balance() path, so an mmap_sem is stuck, which causes
lots of other things to get stuck.

But I don't see why this should happen, really. It all seems OK here. Is
any IO happening at all?

You don't have any shells at all? If you do, try running /bin/sync,
see if the disk lights up. Run `watch -n1 cat /proc/meminfo' when testing
to see what dirty memory is doing. And `vmstat 1'. Try sysrq-S, see if
that gets things unstuck.

I guess it's consistent with the disk system losing its brains, too.
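A rough Python stand-in for the `watch -n1 cat /proc/meminfo' suggestion, narrowed to the Dirty and Writeback lines (both reported in kB in /proc/meminfo). The helper names parse_meminfo and watch_dirty are hypothetical. Watching the two values together shows whether dirty pages are being handed to writeback and retired, or simply piling up.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number}.

    Most values are in kB; a few (e.g. HugePages_*) are plain counts.
    Lines that do not look like 'Name: number ...' are skipped.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue
        try:
            info[name.strip()] = int(fields[0])
        except ValueError:
            continue
    return info


def watch_dirty(interval=1.0, samples=None, path="/proc/meminfo"):
    """Print Dirty and Writeback (in MB) every `interval` seconds.

    Runs forever when samples is None, otherwise prints `samples` lines.
    """
    taken = 0
    while samples is None or taken < samples:
        with open(path) as f:
            info = parse_meminfo(f.read())
        print(f"Dirty: {info.get('Dirty', 0) / 1024:8.1f} MB   "
              f"Writeback: {info.get('Writeback', 0) / 1024:8.1f} MB",
              flush=True)
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval)


if __name__ == "__main__":
    watch_dirty()
```

Printing to the terminal rather than redirecting into a file on the affected disk is probably wiser here, since a monitor that writes to the stalled filesystem can end up throttled itself.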
* Re: ext3 data=journal hangs
From: Randy Dunlap @ 2007-01-12 18:00 UTC
To: Andrew Morton; +Cc: linux-fsdevel, linux-mm
On Thu, 11 Jan 2007 22:58:48 -0800 Andrew Morton wrote:
> On Thu, 11 Jan 2007 21:34:12 -0800
> Randy Dunlap <randy.dunlap@oracle.com> wrote:
>
> > (resending for wider audience)
> >
> > Date: Wed, 10 Jan 2007 16:03:51 -0800
> > To: linux-ext4@vger.kernel.org
> >
> >
> > On Tue, 9 Jan 2007 15:11:23 -0800 Randy Dunlap wrote:
> >
> > > Hi,
> > >
> > > (2.6.20-rc4, x86_64 1-proc on SMP kernel, 1 GB RAM)
> > >
> > > I'm running fsx-linux (akpm ext3-tools version) on an ext3 fs
> > > with data=journal and fs blocksize=2048. I've been trying to
> > > get some kind of kernel messages from it but I can't get any
> > > debug IO done successfully.
> > >
> > > It has hung on me 3 times in a row today. I'm using this command:
> > > fsx-linux -l 100M -N 50000 -S 0 fsxtestfile
> > >
> > > This is run in a new partition on an IDE drive (/dev/hda7,
> > > using legacy IDE drivers).
> > >
> > > Any suggestions for debug output? I can see SysRq output on-screen
> > > (sometimes) but it doesn't make it to my serial console.
> > >
> > > Any patches to test? :)
> >
> > More notes:
> > Fails (hangs) with fs blocksize of 1024, 2048, or 4096.
> > Only data=journal mode hangs; writeback and ordered run fine.
> >
> > After several runs (hangs), I was able to get some sysrq output
> > to the serial console.
> >
> > kernel config: http://oss.oracle.com/~rdunlap/configs/config-2620-rc4-hangs
> > message log: http://oss.oracle.com/~rdunlap/logs/fsx-capture.txt
> >
> > Can anyone see what fsx-linux is waiting on there?
> >
>
> Everybody got stuck in balance_dirty_pages(). The new thing in there is
> that an nscd instance got stuck in balance_dirty_pages() on the pagefault's
> new set_page_dirty_balance() path, so an mmap_sem is stuck, which causes
> lots of other things to get stuck.
>
> But I don't see why this should happen, really. It all seems OK here. Is
> any IO happening at all?
The disk IO LED blinks lightly 30 times in 60 seconds.
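With IO only trickling, Morton's diagnosis can be illustrated by a toy model, under loudly stated simplifications: a task that dirties a page (here via the page-fault path) is held in balance_dirty_pages() until dirty memory falls back under the limit, so it leaves only if writeback makes progress. The function below is hypothetical and models none of the kernel's real accounting. As far as I recall the 2.6.20 code, the real loop also releases a caller once it has itself written back one chunk of pages, which likewise requires working IO; that detail is omitted. Units are arbitrary (MB here).

```python
def throttled_writer_model(dirty, limit, retired_per_wait, max_waits=1000):
    """Hypothetical toy model of a writer throttled in balance_dirty_pages().

    dirty:            dirty memory when the writer enters the throttle
    limit:            dirty threshold the writer must get back under
    retired_per_wait: dirty memory cleaned by writeback per wait interval
    Returns how many waits the writer spent, or None if it is still
    throttled after max_waits (i.e. it looks hung).
    """
    waits = 0
    while dirty > limit:
        if waits == max_waits:
            return None  # no forward progress: the writer never returns
        dirty = max(0, dirty - retired_per_wait)
        waits += 1
    return waits


if __name__ == "__main__":
    print(throttled_writer_model(500, 400, 25))  # healthy writeback: 4
    print(throttled_writer_model(500, 400, 0))   # no IO completing: None
```

In the thread's terms, if the stuck writer holds mmap_sem (as with the nscd page fault), every other task that needs that mmap_sem waits behind it, which is how one stalled writer spreads to the rest of the system.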
> You don't have any shells at all? If you do, try running /bin/sync,
> see if the disk lights up. Run `watch -n1 cat /proc/meminfo' when testing
> to see what dirty memory is doing. And `vmstat 1'. Try sysrq-S, see if
> that gets things unstuck.
/bin/sync or sysrq-S causes a little disk activity, not much.
Nothing comes unstuck AFAICT.
/proc/meminfo::Dirty began at around 400 MB, went down to around
25 MB, now it's back at 400 MB. I did 'vmstat 1 | tee vmstat.out'
and that IO is now hung also. :(
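A back-of-the-envelope check on the ~400 MB figure, assuming the 2.6.20-era default vm.dirty_ratio of 40 (later kernels lowered the default): 40% of 1 GB is about 410 MB, so Dirty parked around 400 MB would put the machine at or near the throttling limit, which fits the reading that everyone is waiting in balance_dirty_pages(). The kernel applies the ratio to somewhat less than physical RAM and can lower the effective ratio (for instance when much of memory is mapped), so treat the result as an upper-bound estimate. The helper names are hypothetical.

```python
import os


def approx_dirty_limit_mb(total_mb, dirty_ratio_percent):
    """Rough upper bound on the dirty-memory limit: ratio x total memory.

    Assumption: the real kernel applies the ratio to somewhat less than
    physical RAM and may lower the effective ratio, so the true limit is
    at or below this figure.
    """
    return total_mb * dirty_ratio_percent / 100


def read_dirty_ratio(path="/proc/sys/vm/dirty_ratio"):
    """Read the current vm.dirty_ratio (percent) on a running Linux box."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    # Figures from the thread: 1 GB RAM; 40 is the assumed 2.6.20 default.
    print(approx_dirty_limit_mb(1024, 40))  # 409.6
    if os.path.exists("/proc/sys/vm/dirty_ratio"):
        print("this machine's dirty_ratio:", read_dirty_ratio())
```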
> I guess it's consistent with the disk system losing its brains, too.
It seems too reproducible for that, but maybe the entire thing is
scrogged. I'll see if some earlier kernels work for me.
---
~Randy