* vs-3050: wait_buffer_until_released
@ 2003-02-18 17:40 Sebastian Kanthak
2003-02-19 6:41 ` Oleg Drokin
2003-02-19 8:22 ` Oleg Drokin
0 siblings, 2 replies; 5+ messages in thread
From: Sebastian Kanthak @ 2003-02-18 17:40 UTC
To: reiserfs-list
[-- Attachment #1: Type: text/plain, Size: 1558 bytes --]
Hi,
I'm using a vanilla 2.4.20 kernel with reiserfs on lvm. I can reproducibly
crash the system by accessing the reiserfs-partition via samba. If I do this,
the following message appears in the kernel log and repeats every 5 or 10
seconds.
Feb 18 11:11:10 manticore kernel: vs-3050: wait_buffer_until_released: nobody
releases buffer (dev 3a:01, size 4096, blocknr 128036, count 2, list 0, state
0x10019, page c1108284, (UPTODATE, CLEAN, UNLOCKED)). Still waiting
(60000000) JDIRTY !JWAIT
I've dumped the mentioned block via debugreiserfs and attached it to this
email.
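
For reference, the fields of the message above can be pulled apart with a short script. This is only a sketch: the pattern is derived from the log line quoted in this mail, not from the kernel source, and the names `VS3050` and `parse_vs3050` are made up for illustration.

```python
import re

# Field layout taken from the log line quoted in this thread (an assumption,
# not the kernel's format string).
VS3050 = re.compile(
    r"dev (?P<dev>[0-9a-f]+:[0-9a-f]+), size (?P<size>\d+), "
    r"blocknr (?P<blocknr>\d+), count (?P<count>\d+), "
    r"list (?P<list>\d+), state (?P<state>0x[0-9a-f]+)"
)


def parse_vs3050(message: str) -> dict:
    """Return the fields of a vs-3050 message, or {} if the text does not match."""
    # The log wraps the message over several lines, so collapse whitespace first.
    flat = " ".join(message.split())
    m = VS3050.search(flat)
    if m is None:
        return {}
    return {
        "dev": m.group("dev"),
        "size": int(m.group("size")),
        "blocknr": int(m.group("blocknr")),
        "count": int(m.group("count")),
        "list": int(m.group("list")),
        "state": int(m.group("state"), 16),
    }


sample = """vs-3050: wait_buffer_until_released: nobody
releases buffer (dev 3a:01, size 4096, blocknr 128036, count 2, list 0, state
0x10019, page c1108284, (UPTODATE, CLEAN, UNLOCKED)). Still waiting"""
fields = parse_vs3050(sample)
print(fields["blocknr"], fields["count"], hex(fields["state"]))
```

The block number extracted this way (128036) is the one passed to debugreiserfs for the attached dump.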
The only way to stop the machine is to hit the reset button, as every process
that accesses the reiserfs-partition freezes and cannot be killed, including
shutdown.
A reiserfsck scan gave the following results:
###########
reiserfsck --check started at Tue Feb 18 18:09:32 2003
###########
Replaying journal..
0 transactions replayed
Checking internal tree..finished
Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
Checking Semantic tree:
finished
1 found corruptions can be fixed with --fix-fixable
###########
reiserfsck finished at Tue Feb 18 18:26:47 2003
###########
Should I fix the bitmap thing? Is it the reason or the consequence from the
above problem? We switched to reiserfs one day ago, so I'm afraid that the
problem will reappear, even if I fix it with reiserfsck.
I've searched the mailing list archives and found others with the same
problem, however, no solution yet. Do you know the reason for the problem?
Sebastian
[-- Attachment #2: block.dat --]
[-- Type: text/plain, Size: 6172 bytes --]
<--------debugreiserfs 3.6.4, 2002-------->
128036 is free in ondisk bitmap
===================================================================
LEAF NODE (128036) contains level=1, nr_items=15, free_space=16 rdkey (real items 15)
-------------------------------------------------------------------------------
|###|type|ilen|f/sp| loc|fmt|fsck| key |
| | | |e/cn| | |need| |
-------------------------------------------------------------------------------
| 0|21317 21320 0x61 DRCT (2), len 176, location 3920 entry count 65535, fsck need 0, format new|
-------------------------------------------------------------------------------
| 1|21317 21321 0x0 SD (0), len 44, location 3876 entry count 65535, fsck need 0, format new|
(NEW SD), mode -rw-r--r--, size 375, nlink 1, mtime 05/18/1999 17:46:24 blocks 8, uid 1002
-------------------------------------------------------------------------------
| 2|21317 21321 0x1 DRCT (2), len 376, location 3500 entry count 65535, fsck need 0, format new|
-------------------------------------------------------------------------------
| 3|21317 21322 0x0 SD (0), len 44, location 3456 entry count 65535, fsck need 0, format new|
(NEW SD), mode drwxr-xr-x, size 320, nlink 10, mtime 02/15/2003 20:49:28 blocks 1, uid 1002
-------------------------------------------------------------------------------
| 4|21317 21322 0x1 DIR (3), len 320, location 3136 entry count 12, fsck need 0, format old|
###: Name length Object key Hash Gen number
0: ". "( 1) [21317 21322] 0 1, loc 312, state 4 not set
1: ".. "( 2) [3 21317] 0 2, loc 304, state 4 not set
2: "His6 "( 4) [21322 22291] 19401984 0, loc 296, state 4 "r5"
3: "Verlauf "( 7) [21322 22332] 43074944 0, loc 288, state 4 "r5"
4: "Recent "( 6) [21322 21361] 472559104 0, loc 280, state 4 "r5"
5: "USER.DAT "( 8) [21322 22330] 788601344 0, loc 272, state 4 "r5"
6: "Cookies "( 7) [21322 21941] 817404288 0, loc 264, state 4 "r5"
7: "Desktop "( 7) [21322 21323] 856471936 0, loc 256, state 4 "r5"
8: "Favoriten "( 9) [21322 22331] 1054739968 0, loc 240, state 4 "r5"
9: "Anwendungsdaten "( 15) [21322 21613] 1200321920 0, loc 224, state 4 "r5"
10: "Netzwerkumgebung "( 16) [21322 21555] 1733406080 0, loc 208, state 4 "r5"
11: "PROB005.TMP "( 11) [21322 22329] 1887205248 0, loc 192, state 4 "r5"
-------------------------------------------------------------------------------
| 5|21317 22378 0x0 SD (0), len 44, location 3092 entry count 65535, fsck need 0, format new|
(NEW SD), mode -rwx------, size 1474560, nlink 1, mtime 05/19/1999 19:25:52 blocks 2880, uid 1002
-------------------------------------------------------------------------------
| 6|21317 22378 0x1 IND (1), len 1440, location 1652 entry count 0, fsck need 0, format new|
360 pointers
[ 141297(360)]
-------------------------------------------------------------------------------
| 7|21317 22379 0x0 SD (0), len 44, location 1608 entry count 65535, fsck need 0, format new|
(NEW SD), mode drwxr-xr-x, size 112, nlink 3, mtime 01/05/2000 18:58:18 blocks 1, uid 1002
-------------------------------------------------------------------------------
| 8|21317 22379 0x1 DIR (3), len 112, location 1496 entry count 4, fsck need 0, format old|
###: Name length Object key Hash Gen number
0: ". "( 1) [21317 22379] 0 1, loc 104, state 4 not set
1: ".. "( 2) [3 21317] 0 2, loc 96, state 4 not set
2: "maps "( 4) [22379 22380] 27933440 0, loc 88, state 4 "r5"
3: "lambda_rotate.gif "( 17) [22379 22385] 1478681856 0, loc 64, state 4 "r5"
-------------------------------------------------------------------------------
| 9|21317 22386 0x0 SD (0), len 44, location 1452 entry count 65535, fsck need 0, format new|
(NEW SD), mode -rwx------, size 888053, nlink 1, mtime 08/15/2000 10:11:16 blocks 1736, uid 1002
-------------------------------------------------------------------------------
| 10|21317 22386 0x1 IND (1), len 868, location 584 entry count 0, fsck need 0, format new|
217 pointers
[ 140093(4) 140121(213)]
-------------------------------------------------------------------------------
| 11|21317 22387 0x0 SD (0), len 44, location 540 entry count 65535, fsck need 0, format new|
(NEW SD), mode drwx--x--x, size 48, nlink 2, mtime 08/07/2002 19:56:20 blocks 1, uid 1002
-------------------------------------------------------------------------------
| 12|21317 22387 0x1 DIR (3), len 48, location 492 entry count 2, fsck need 0, format old|
###: Name length Object key Hash Gen number
0: ". "( 1) [21317 22387] 0 1, loc 40, state 4 not set
1: ".. "( 2) [3 21317] 0 2, loc 32, state 4 not set
-------------------------------------------------------------------------------
| 13|21317 22388 0x0 SD (0), len 44, location 448 entry count 65535, fsck need 0, format new|
(NEW SD), mode drwx------, size 160, nlink 6, mtime 12/11/2000 20:32:27 blocks 1, uid 1002
-------------------------------------------------------------------------------
| 14|21317 22388 0x1 DIR (3), len 48, location 400 entry count 2, fsck need 0, format old|
###: Name length Object key Hash Gen number
0: ". "( 1) [21317 22388] 0 1, loc 40, state 4 not set
1: ".. "( 2) [3 21317] 0 2, loc 32, state 4 not set
===================================================================
* Re: vs-3050: wait_buffer_until_released
2003-02-18 17:40 vs-3050: wait_buffer_until_released Sebastian Kanthak
@ 2003-02-19 6:41 ` Oleg Drokin
2003-02-19 8:22 ` Oleg Drokin
1 sibling, 0 replies; 5+ messages in thread
From: Oleg Drokin @ 2003-02-19 6:41 UTC
To: Sebastian Kanthak; +Cc: reiserfs-list
Hello!
On Tue, Feb 18, 2003 at 06:40:47PM +0100, Sebastian Kanthak wrote:
> I'm using a vanilla 2.4.20 kernel with reiserfs on lvm. I can reproducibly
> crash the system by accessing the reiserfs-partition via samba. If I do this,
Can you make the crash info available (oops or whatever you define as "crash")?
> the following message appears in the kernel log and repeats every 5 or 10
> seconds.
> Feb 18 11:11:10 manticore kernel: vs-3050: wait_buffer_until_released: nobody
> releases buffer (dev 3a:01, size 4096, blocknr 128036, count 2, list 0, state
> 0x10019, page c1108284, (UPTODATE, CLEAN, UNLOCKED)). Still waiting
> (60000000) JDIRTY !JWAIT
Sure, if one of the threads crashed holding a buffer, other threads waiting for the
buffer would never succeed.
> The only way to stop the machine is to hit the reset button, as every process
> that accesses the reiserfs-partition freezes and cannot be killed, including
> shutdown.
This is more or less how it works in Linux once the kernel has crashed.
> ###########
> reiserfsck --check started at Tue Feb 18 18:09:32 2003
> ###########
> Replaying journal..
> 0 transactions replayed
> Checking internal tree..finished
> Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
> Checking Semantic tree:
> finished
> 1 found corruptions can be fixed with --fix-fixable
> ###########
> reiserfsck finished at Tue Feb 18 18:26:47 2003
> ###########
> Should I fix the bitmap thing? Is it the reason or the consequence from the
You may or may not. If you don't, there will be some "lost" space on the filesystem.
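
A bitmap mismatch can go in either direction, and only one of them is plain "lost" space. The sketch below is a conceptual illustration, not reiserfsck's actual implementation; `compare_bitmaps` and the example block sets are hypothetical, apart from block 128036, which the attached dump reports as free in the on-disk bitmap although it is a leaf node.

```python
def compare_bitmaps(on_disk_used, in_tree):
    """Compare the on-disk bitmap with the blocks found by walking the tree.

    on_disk_used: set of block numbers the on-disk bitmap marks as used
    in_tree:      set of block numbers actually referenced by the tree
    """
    return {
        # Marked used but not referenced: space that is merely "lost".
        "lost": sorted(on_disk_used - in_tree),
        # Referenced but marked free: the allocator may hand these out again.
        "in_tree_but_free": sorted(in_tree - on_disk_used),
    }


# Hypothetical block sets; only 128036 comes from the report in this thread.
report = compare_bitmaps(on_disk_used={100, 101, 102},
                         in_tree={101, 102, 128036})
print(report)
```

The second category is the one that matters for the dump attached to the original report.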
> above problem? We switched to reiserfs one day ago, so I'm afraid that the
> problem will reappear, even if I fix it with reiserfsck.
Sure, that's why we are interested in crash information, so that we can understand
its source and fix the problem if it is reiserfs's fault, or, if the problem
lies elsewhere, ping whoever is the responsible party.
> I've searched the mailing list archives and found others with the same
> problem, however, no solution yet. Do you know the reason for the problem?
Yes. The problem is that reiserfs is waiting on a buffer that is never released.
It is presumably never released because you had a crash that killed the thread that
had acquired the buffer but had not released it yet.
Thank you.
Bye,
Oleg
* Re: vs-3050: wait_buffer_until_released
2003-02-18 17:40 vs-3050: wait_buffer_until_released Sebastian Kanthak
2003-02-19 6:41 ` Oleg Drokin
@ 2003-02-19 8:22 ` Oleg Drokin
2003-02-19 13:26 ` Sebastian Kanthak
1 sibling, 1 reply; 5+ messages in thread
From: Oleg Drokin @ 2003-02-19 8:22 UTC
To: Sebastian Kanthak; +Cc: reiserfs-list, vs
Hello!
On Tue, Feb 18, 2003 at 06:40:47PM +0100, Sebastian Kanthak wrote:
> I'm using a vanilla 2.4.20 kernel with reiserfs on lvm. I can reproducibly
> crash the system by accessing the reiserfs-partition via samba. If I do this,
> the following message appears in the kernel log and repeats every 5 or 10
> seconds.
> Feb 18 11:11:10 manticore kernel: vs-3050: wait_buffer_until_released: nobody
> releases buffer (dev 3a:01, size 4096, blocknr 128036, count 2, list 0, state
> 0x10019, page c1108284, (UPTODATE, CLEAN, UNLOCKED)). Still waiting
> (60000000) JDIRTY !JWAIT
> I've dumped the mentioned block via debugreiserfs and attached it to this
> email.
Aha. More info for you.
If the above message is what you mean by the "crash", then a different scenario
might be taking place.
There is an in-tree block that is marked as free in the bitmap (how it got into
that state is still unclear). Now you use that block during tree traversals.
Then you want to allocate one more block. The bitmap allocator chooses a block
that is marked free, and by accident it is the same block that is in fact already
in use. We start to wait for the buffer holding that block to be released, but it
is we who hold the buffer in this case, so the buffer is indeed never released.
You can fix this problem with reiserfsck --fix-fixable, just as it recommended.
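
The scenario above can be sketched as a toy model. This is not the kernel code: `Buffer`, `allocate_block`, and the `max_polls` cut-off are made up for illustration, and the real wait does not give up but keeps logging vs-3050.

```python
class Buffer:
    """Toy stand-in for a buffer head: a block number and a reference count."""

    def __init__(self, blocknr):
        self.blocknr = blocknr
        self.count = 0


def allocate_block(bitmap, held, max_polls=3):
    """Pick the first block the bitmap calls free, then wait for its buffer.

    bitmap: dict blocknr -> True if marked used, False if marked free
    held:   dict blocknr -> Buffer the caller itself still references
    Returns (blocknr, released); released is False when the wait gave up.
    """
    blocknr = next(b for b in sorted(bitmap) if not bitmap[b])
    buf = held.get(blocknr)
    polls = 0
    # Nobody else can drop the reference: the caller holds it and is busy
    # waiting here. The toy stops after max_polls so the example terminates.
    while buf is not None and buf.count > 0 and polls < max_polls:
        polls += 1
    released = buf is None or buf.count == 0
    if released:
        bitmap[blocknr] = True
    return blocknr, released


# Block 128036 is in the tree and referenced by the traversal, but the
# (inconsistent) bitmap says it is free.
traversed = Buffer(128036)
traversed.count = 1
bitmap = {128035: True, 128036: False, 128037: False}
result = allocate_block(bitmap, {128036: traversed})
print(result)
```

With a consistent bitmap (128036 marked used) the same call would pick 128037 and return at once, which is why repairing the bitmap removes the hang.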
Bye,
Oleg
* Re: vs-3050: wait_buffer_until_released
2003-02-19 8:22 ` Oleg Drokin
@ 2003-02-19 13:26 ` Sebastian Kanthak
2003-02-19 13:49 ` Oleg Drokin
0 siblings, 1 reply; 5+ messages in thread
From: Sebastian Kanthak @ 2003-02-19 13:26 UTC
To: Oleg Drokin; +Cc: reiserfs-list
Hi Oleg,
thanks for your fast response.
On Wednesday 19 February 2003 09:22, Oleg Drokin wrote:
> On Tue, Feb 18, 2003 at 06:40:47PM +0100, Sebastian Kanthak wrote:
> > I'm using a vanilla 2.4.20 kernel with reiserfs on lvm. I can
> > reproducibly crash the system by accessing the reiserfs-partition via
> > samba. If I do this, the following message appears in the kernel log and
> > repeats every 5 or 10 seconds.
> > Feb 18 11:11:10 manticore kernel: vs-3050: wait_buffer_until_released:
> > nobody releases buffer (dev 3a:01, size 4096, blocknr 128036, count 2,
> > list 0, state 0x10019, page c1108284, (UPTODATE, CLEAN, UNLOCKED)). Still
> > waiting (60000000) JDIRTY !JWAIT
> > I've dumped the mentioned block via debugreiserfs and attached it to this
> > email.
>
> Aha. More info for you.
> If the above message is what you mean by the "crash", then a different
> scenario might be taking place.
Yes, that was what I called a "crash". It's not a kernel panic, so crash was
misleading indeed.
I haven't fixed the bitmap inconsistency yet, because I thought you might
want some other info. In the meantime, the filesystem was not used very much
(only some simple cron-jobs or so), but I got new errors in my kernel log.
Output from dmesg is:
vs-4080: reiserfs_free_block: free_block (3a01:482474)[dev:blocknr]: bit
already cleared
vs-4080: reiserfs_free_block: free_block (3a01:482473)[dev:blocknr]: bit
already cleared
vs-4080: reiserfs_free_block: free_block (3a01:482472)[dev:blocknr]: bit
already cleared
vs-4080: reiserfs_free_block: free_block (3a01:482471)[dev:blocknr]: bit
already cleared
vs-4080: reiserfs_free_block: free_block (3a01:322183)[dev:blocknr]: bit
already cleared
vs-4080: reiserfs_free_block: free_block (3a01:322180)[dev:blocknr]: bit
already cleared
and a lot of similar lines. If I do a reiserfsck now, I get more errors than
the first time:
###########
reiserfsck --check started at Wed Feb 19 14:03:17 2003
###########
Replaying journal..
0 transactions replayed
Checking internal tree../ 1 (of 2)/ 61 (of 86)/ 81 (of
86)bad_indirect_item: block 315773: The item (46710 50807 0x1 IND (1), len 8,
location 1516 entry count 0, fsck need 0, format new) has the bad pointer (0)
to the block (315774), which is in tree already
bad_indirect_item: block 315773: The item (46710 50807 0x1 IND (1), len 8,
location 1516 entry count 0, fsck need 0, format new) has the bad pointer (1)
to the block (315775), which is in tree already
finished
Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
Checking Semantic tree:
finished
4 found corruptions can be fixed with --fix-fixable
###########
reiserfsck finished at Wed Feb 19 14:20:09 2003
###########
Now, I'm wondering where these new errors came from. Are they a result of the
first error, or do they indicate a real problem? I will fix the above
problems with reiserfsck, but I'm afraid it will happen again, as I've been using
reiserfs on this machine for only three days now. So I had better find the reasons for
the problems. Do you have an idea where my problems come from? I know now
that threads are waiting for an in-tree buffer marked as free, but what's the
cause for this inconsistency? What's going on here? Can I provide you with
additional (useful) data somehow?
As reiserfs certainly does not contain so many bugs I'm starting to suspect my
hardware (I've read your FAQ about reiserfs running hotter). Memory seems to
be ok, though. Do you know a good tool to stress test the hard-disk and
report errors? I'll certainly need a way to reproduce the error in order to
locate it...
Thanks for your help so far.
Sebastian
* Re: vs-3050: wait_buffer_until_released
2003-02-19 13:26 ` Sebastian Kanthak
@ 2003-02-19 13:49 ` Oleg Drokin
0 siblings, 0 replies; 5+ messages in thread
From: Oleg Drokin @ 2003-02-19 13:49 UTC
To: Sebastian Kanthak; +Cc: reiserfs-list
Hello!
On Wed, Feb 19, 2003 at 02:26:22PM +0100, Sebastian Kanthak wrote:
> I haven't fixed the bitmap inconsistency yet, because I thought you might
> want some other info. In the meantime, the filesystem was not used very much
> (only some simple cron-jobs or so), but I got new errors in my kernel log.
> Output from dmesg is:
> vs-4080: reiserfs_free_block: free_block (3a01:482474)[dev:blocknr]: bit
> already cleared
> vs-4080: reiserfs_free_block: free_block (3a01:482473)[dev:blocknr]: bit
> already cleared
> vs-4080: reiserfs_free_block: free_block (3a01:482472)[dev:blocknr]: bit
> already cleared
> vs-4080: reiserfs_free_block: free_block (3a01:482471)[dev:blocknr]: bit
> already cleared
> vs-4080: reiserfs_free_block: free_block (3a01:322183)[dev:blocknr]: bit
> already cleared
> vs-4080: reiserfs_free_block: free_block (3a01:322180)[dev:blocknr]: bit
Yes, these might happen too, if you have inconsistent bitmaps.
> and a lot of similar lines. If I do a reiserfsck now, I get more errors than
> the first time:
And these are kind of expected, too.
> Now, I'm wondering where these new errors came from. Are they a result of the
Used blocks that are marked free get allocated again.
So now you have the same space used twice.
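
As a rough illustration of "the same space used twice", the sketch below looks for blocks claimed by more than one owner. The function `find_shared_blocks` and the owner labels are hypothetical; the block numbers are the ones from the reiserfsck output quoted in this thread, and which tree nodes already occupy them is not known from the report.

```python
from collections import defaultdict


def find_shared_blocks(pointers):
    """Return {blocknr: [owners]} for every block claimed by more than one owner.

    pointers: dict mapping an owner label to the block numbers it references
    """
    owners = defaultdict(set)
    for owner, blocks in pointers.items():
        for blocknr in blocks:
            owners[blocknr].add(owner)
    return {b: sorted(o) for b, o in owners.items() if len(o) > 1}


pointers = {
    # The indirect item reported by reiserfsck (in block 315773).
    "indirect item 46710 50807": [315774, 315775],
    # Placeholder for whatever tree nodes already occupy those blocks.
    "existing tree nodes (placeholder)": [315773, 315774, 315775],
}
shared = find_shared_blocks(pointers)
print(sorted(shared))
```

Here both blocks that the indirect item points to are also part of the tree, which matches the "which is in tree already" complaints.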
> problems with reiserfsck, but I'm afraid it will happen again, as I've been using
The interesting question is how you got the bitmap out of sync with reality.
> the problems. Do you have an idea where my problems come from? I know now
Have you already ruled out bad hardware?
> cause for this inconsistency? What's going on here? Can I provide you with
> additional (useful) data somehow?
Hm. If you can reproduce the initial bitmap corruption somehow, that would be very interesting.
> As reiserfs certainly does not contain so many bugs I'm starting to suspect my
> hardware (I've read your FAQ about reiserfs running hotter). Memory seems to
> be ok, though. Do you know a good tool to stress test the hard-disk and
> report errors? I'll certainly need a way to reproduce the error in order to
> locate it...
Hm. There are some.
E.g. Robin Miller's Data Test Program
http://www.bit-net.com/~rmiller/dt.html
Bye,
Oleg