The Linux Kernel Mailing List
* Problems with 2.4.2-pre1 & reiser & vfs
@ 2001-02-08 15:00 Andrius Adomaitis
  2001-02-08 22:37 ` Chris Mason
  0 siblings, 1 reply; 3+ messages in thread
From: Andrius Adomaitis @ 2001-02-08 15:00 UTC (permalink / raw)
  To: linux-kernel


Hello,

I have a dual PIII 800 machine running as a mail server on a DAC960 RAID with
the reiserfs that comes with the 2.4.1 kernel.

Under very high load I get the following messages in my kernel log:

kernel: vs-13060: reiserfs_update_sd: stat data of object [7906789 
7906806 0x0 SD](nlink == 1) not found (pos 23)
kernel: vs-13060: reiserfs_update_sd: stat data of object [7906789 
7906806 0x0 SD] (nlink == 1) not found (pos 23)
kernel: PAP-5660: reiserfs_do_truncate: wrong result -1 of search for 
[7906789 7906806 0xfffffffffffffff DIRECT]
kernel: vs-13060: reiserfs_update_sd: stat data of object [7906789 
7906806 0x0 SD] (nlink == 1) not found (pos 23)
kernel: PAP-5660: reiserfs_do_truncate: wrong result -1 of search for 
[7906789 7906806 0xfffffffffffffff DIRECT]
.....

and afterwards come these:

kernel: vs-3050: wait_buffer_until_released: nobody releases buffer 
(dev 30:09, size 4096, blocknr 1661732, count 16,
kernel: vs-3050: wait_buffer_until_released: nobody releases buffer 
(dev 30:09, size 4096, blocknr 1661732, count 16,
...
and so on.

The interesting thing is that the system is still operational, but the load 
jumps up to 260 or so, and any attempt to reboot the system fails. ps aux 
shows an immortal qmail process (kill -9 $PID doesn't kill it) that 
consumes 97% of one CPU's resources.  Also, `vmstat` shows tons of 
processes in uninterruptible sleep, but `free` reports that there is 
still enough memory (no swap used) and huge buffers... The machine gets 
sluggish but works for a while (0.5-2h, depending on the mail request rate).
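
The uninterruptible-sleep count that `vmstat` reports can be cross-checked
directly against /proc. What follows is only a minimal sketch, assuming the
/proc/<pid>/stat layout documented in proc(5), where the process state is the
first field after the parenthesised command name. The helper names
`state_from_stat` and `count_d_state` are made up for this example and are
not part of any tool mentioned in this thread.

```python
import glob


def state_from_stat(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')' characters, so split on the LAST ')' in the line.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_d_state(proc_root="/proc"):
    """Count processes currently in uninterruptible sleep (state 'D')."""
    count = 0
    for path in glob.glob(proc_root + "/[0-9]*/stat"):
        try:
            with open(path) as f:
                if state_from_stat(f.read()) == "D":
                    count += 1
        except (OSError, IndexError):
            # The process exited during the scan, or the line was malformed.
            continue
    return count


if __name__ == "__main__":
    print("processes in uninterruptible sleep:", count_d_state())
```

A count that keeps growing while `free` still shows plenty of memory would be
consistent with processes piling up behind a stuck I/O or filesystem
operation rather than with memory pressure.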

System is Debian potato, 
gcc version 2.95.2 20000220 (Debian GNU/Linux),
reiserfs utils 3.6.25.

Any patches or suggestions to fix that would be appreciated...

P.S. I also wondered: wouldn't it be good to have a sysctl entry in 
proc that reboots the machine, depending on the value in a control file, 
when a proper software reboot is impossible (as in the situation described 
above)? Or does such a thing perhaps already exist?

Thanks.
-- 
Andrius
charta@gaumina.lt
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/


* Re: Problems with 2.4.2-pre1 & reiser & vfs
  2001-02-08 15:00 Problems with 2.4.2-pre1 & reiser & vfs Andrius Adomaitis
@ 2001-02-08 22:37 ` Chris Mason
  2001-02-08 22:49   ` Alan Cox
  0 siblings, 1 reply; 3+ messages in thread
From: Chris Mason @ 2001-02-08 22:37 UTC (permalink / raw)
  To: Andrius Adomaitis, linux-kernel



On Thursday, February 08, 2001 04:00:26 PM +0100 Andrius Adomaitis <charta@gaumina.lt> wrote:

> 
> Hello,
> 
> I have a dual PIII 800 machine running as a mail server on a DAC960 RAID with
> the reiserfs that comes with the 2.4.1 kernel.
> 
> Under very high load I get the following messages in my kernel log:
> 
> kernel: vs-13060: reiserfs_update_sd: stat data of object [7906789 
> 7906806 0x0 SD](nlink == 1) not found (pos 23)
> kernel: vs-13060: reiserfs_update_sd: stat data of object [7906789 
> 7906806 0x0 SD] (nlink == 1) not found (pos 23)
> kernel: PAP-5660: reiserfs_do_truncate: wrong result -1 of search for 
> [7906789 7906806 0xfffffffffffffff DIRECT]
> kernel: vs-13060: reiserfs_update_sd: stat data of object [7906789 
> 7906806 0x0 SD] (nlink == 1) not found (pos 23)
> kernel: PAP-5660: reiserfs_do_truncate: wrong result -1 of search for 
> [7906789 7906806 0xfffffffffffffff DIRECT]
> .....

These aren't good at all, and show a general corruption problem.  I know the ac kernels have at least one small DAC960 bug fix; are there other fixes pending?

> 
> and afterwards come these:
> 
> kernel: vs-3050: wait_buffer_until_released: nobody releases buffer 
> (dev 30:09, size 4096, blocknr 1661732, count 16,
> kernel: vs-3050: wait_buffer_until_released: nobody releases buffer 
> (dev 30:09, size 4096, blocknr 1661732, count 16,
> ...
> and so on.
> 

There is more info in this message; it would help if you could send the entire line.

> The interesting thing is that the system is still operational, but the load 
> jumps up to 260 or so, and any attempt to reboot the system fails. ps aux 
> shows an immortal qmail process (kill -9 $PID doesn't kill it) that 
> consumes 97% of one CPU's resources.  Also, `vmstat` shows tons of 
> processes in uninterruptible sleep, but `free` reports that there is 
> still enough memory (no swap used) and huge buffers... The machine gets 
> sluggish but works for a while (0.5-2h, depending on the mail request rate).
> 

Once you get a vs-3050, any process that tries to change the FS ends up waiting on the journal, which is waiting on the process stuck in vs-3050.  There is no escape.

-chris
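
The dependency chain described above can be pictured as a wait-for graph.
The following is only an illustrative model, not reiserfs code: the task
names and the `WAITS_ON` table are invented for the example, and the real
kernel state is far more involved.

```python
# Hypothetical wait-for graph: each key waits on its value. A value of None
# means the thing being waited on is never released.
WAITS_ON = {
    "writer_1": "journal",
    "writer_2": "journal",
    "journal": "stuck_process",
    "stuck_process": "buffer",
    "buffer": None,  # corresponds to "nobody releases buffer"
}


def is_blocked_forever(task, waits_on):
    """Follow the wait chain from `task`.

    Returns True if the chain ends at a never-released resource or loops
    back on itself, and False if it reaches something that is not waiting.
    """
    seen = set()
    current = task
    while current in waits_on:
        if current in seen:
            return True  # cycle in the wait-for graph
        seen.add(current)
        next_item = waits_on[current]
        if next_item is None:
            return True  # waiting on something that is never released
        current = next_item
    return False


if __name__ == "__main__":
    for name in ("writer_1", "writer_2", "reader"):
        print(name, "blocked forever:", is_blocked_forever(name, WAITS_ON))
```

In this toy model every task that needs the journal inherits the stuck
state, while a task that never touches the journal (the hypothetical
"reader") is unaffected, which matches a machine that stays partly usable
while its load climbs.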




* Re: Problems with 2.4.2-pre1 & reiser & vfs
  2001-02-08 22:37 ` Chris Mason
@ 2001-02-08 22:49   ` Alan Cox
  0 siblings, 0 replies; 3+ messages in thread
From: Alan Cox @ 2001-02-08 22:49 UTC (permalink / raw)
  To: Chris Mason; +Cc: Andrius Adomaitis, linux-kernel

> > kernel: PAP-5660: reiserfs_do_truncate: wrong result -1 of search for 
> > [7906789 7906806 0xfffffffffffffff DIRECT]
> > .....
> 
> These aren't good at all, and show a general corruption problem.  I know the ac kernels have at least one small DAC960 bug fix; are there other fixes pending?
> 

The DAC960 changes relate to gcc 2.96 stuff and wouldn't account for real bugs.
DAC960 built with cvs gcc or 2.96 < 2.96-74 or so could, because of the ABI
thing, but wouldn't boot that far. If it's straight 2.4.1, suspect the elevator
corruption thing too.
