* reiser4 and megaraid problems with debian 2.6.5
@ 2004-04-14  6:51 Paul Wagland
  2004-04-14  9:05 ` Domenico Andreoli
  2004-04-14 15:13 ` reiser4 and megaraid problems with debian 2.6.5 Hans Reiser
  0 siblings, 2 replies; 20+ messages in thread
From: Paul Wagland @ 2004-04-14  6:51 UTC (permalink / raw)
  To: Linux mailing list SCSI, Linux mailing list kernel
  Cc: Hans Reiser, Atul Mukker

Hi all,

I would like to report on a problem that I am having. I am just testing 
out the new megaraid unified driver, and have been doing some baseline 
testing with bonnie++.

My problem is that, although reiserfs, ext2, jfs and xfs all work, 
reiser4 fails with the following error:
---
Can't write block.
Bonnie: drastic I/O error (write(2)): No such file or directory
---

I am using the debian prepared kernel with the debian reiser4 patch. I 
made a cursory examination of the patch, and it appears to correlate 
fairly closely with the patch from the namesys site.

Given that this works with reiserfs, ext2, jfs and xfs, it would appear
to be a reiser4 problem. However, ext3 also fails, though with a
different error: it claims that the disk is full, yet it is only trying
to write two 1GB files onto a 2.5GB filesystem, so it should have enough
room. Indeed, it did work two or three times out of about 10 runs (lots
of timing :-). This implies that it might be a megaraid problem. As you
can tell, I really have no idea ;-)

I will try playing around tonight with an official kernel and the 
official reiser4 patch to see if that makes any difference, but would 
just like to raise this potential problem sooner rather than later.

If I can help debug this situation (I am probably the only person 
trying this combination :-) please let me know how I should go about 
it.

Cheers,
Paul


* Re: reiser4 and megaraid problems with debian 2.6.5
  2004-04-14  6:51 reiser4 and megaraid problems with debian 2.6.5 Paul Wagland
@ 2004-04-14  9:05 ` Domenico Andreoli
  2004-04-14 12:36   ` Paul Wagland
  2004-04-14 15:13 ` reiser4 and megaraid problems with debian 2.6.5 Hans Reiser
  1 sibling, 1 reply; 20+ messages in thread
From: Domenico Andreoli @ 2004-04-14  9:05 UTC (permalink / raw)
  To: Paul Wagland
  Cc: Linux mailing list SCSI, Linux mailing list kernel, Hans Reiser,
	Atul Mukker, reiserfs-list

[ also bringing this to the reiserfs ml, a great place for this kind
  of post.  this is also the reason for the full quoting. sorry ]

On Wed, Apr 14, 2004 at 08:51:53AM +0200, Paul Wagland wrote:
> Hi all,
 
hi Paul,

> I would like to report on a problem that I am having. I am just testing 
> out the new megaraid unified driver, and have been doing some baseline 
> testing with bonnie++.
> 
> My problem is that, although reiserfs, ext2, jfs and xfs all work, 
> reiser4 fails with the following error:
> ---
> Can't write block.
> Bonnie: drastic I/O error (write(2)): No such file or directory
> ---
> 
> I am using the debian prepared kernel with the debian reiser4 patch. I 
> made a cursory examination of the patch, and it appears to correlate 
> fairly closely with the patch from the namesys site.
 
of course it is correlated to that of namesys! i have no skills at all
to invent reiser4 :))

you forgot to specify the version of the patch you are talking about;
debian currently provides two versions. anyway, i suppose you are talking
about version 20040326-2, aren't you?

> Given that this works with reiserfs, ext2, jfs and xfs, it would appear
> to be a reiser4 problem. However, ext3 also fails, though with a
> different error: it claims that the disk is full, yet it is only trying
> to write two 1GB files onto a 2.5GB filesystem, so it should have enough
> room. Indeed, it did work two or three times out of about 10 runs (lots
> of timing :-). This implies that it might be a megaraid problem. As you
> can tell, I really have no idea ;-)
> 
> I will try playing around tonight with an official kernel and the 
> official reiser4 patch to see if that makes any difference, but would 
> just like to raise this potential problem sooner rather than later.
 
the latest reiser4 snapshot provided a patch which applied cleanly to
2.6.5-rc2 but not to 2.6.5. i had to modify it as suggested on the
reiserfs ml. if you look at the debian package's changelog you can find
the reference to that thread.

> If I can help debug this situation (I am probably the only person 
> trying this combination :-) please let me know how I should go about 
> it.

i'm sorry but i can't help further.

cheers
domenico

-----[ Domenico Andreoli, aka cavok
 --[ http://filibusta.crema.unimi.it/~cavok/gpgkey.asc
   ---[ 3A0F 2F80 F79C 678A 8936  4FEE 0677 9033 A20E BC50


* Re: reiser4 and megaraid problems with debian 2.6.5
  2004-04-14  9:05 ` Domenico Andreoli
@ 2004-04-14 12:36   ` Paul Wagland
  2004-04-14 13:09     ` Nikita Danilov
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Wagland @ 2004-04-14 12:36 UTC (permalink / raw)
  To: Domenico Andreoli
  Cc: reiserfs-list, Linux mailing list SCSI, Atul Mukker, Hans Reiser,
	Linux mailing list kernel


On Apr 14, 2004, at 11:05, Domenico Andreoli wrote:

> [ bringing this also on reiserfs ml, a great place for this kind
>   of posts.  this is also the reason of the full quoting. sorry ]

Thanks ;-)

>> I am using the debian prepared kernel with the debian reiser4 patch. I
>> made a cursory examination of the patch, and it appears to correlate
>> fairly closely with the patch from the namesys site.
>
> you forgot to specify version of the patch you are talking about,
> currently debian provides two versions. anyway i suppose you are 
> talking
> about version 20040326-2, aren't you?

Yes, that is correct.

>> If I can help debug this situation (I am probably the only person
>> trying this combination :-) please let me know how I should go about
>> it.
>
> i'm sorry but i can't help further.

Thanks for the tip... the link that you referred to was most useful. I 
might now have an idea what the problem might be... Further on in the 
thread <http://marc.theaimsgroup.com/?l=reiserfs&m=108117079808733&w=2> 
it says that there is something in the patch that "can lead to a 
dirtied_when in the future, and missed writeback". Well, what happens 
if the directory that I am missing was in that writeback that got 
missed?

I will try updating the debian patch myself and give it another test 
tonight and will report back on my findings. But, before I do so, does 
it seem likely that this could cause the problem?

Cheers,
Paul


* Re: reiser4 and megaraid problems with debian 2.6.5
  2004-04-14 12:36   ` Paul Wagland
@ 2004-04-14 13:09     ` Nikita Danilov
  2004-04-14 13:25       ` Paul Wagland
  0 siblings, 1 reply; 20+ messages in thread
From: Nikita Danilov @ 2004-04-14 13:09 UTC (permalink / raw)
  To: Paul Wagland
  Cc: Domenico Andreoli, reiserfs-list, Linux mailing list SCSI,
	Atul Mukker, Hans Reiser, Linux mailing list kernel

Paul Wagland writes:
 > 
 > On Apr 14, 2004, at 11:05, Domenico Andreoli wrote:
 > 
 > > [ bringing this also on reiserfs ml, a great place for this kind
 > >   of posts.  this is also the reason of the full quoting. sorry ]
 > 
 > Thanks ;-)
 > 
 > >> I am using the debian prepared kernel with the debian reiser4 patch. I
 > >> made a cursory examination of the patch, and it appears to correlate
 > >> fairly closely with the patch from the namesys site.
 > >
 > > you forgot to specify version of the patch you are talking about,
 > > currently debian provides two versions. anyway i suppose you are 
 > > talking
 > > about version 20040326-2, aren't you?
 > 
 > Yes, that is correct.
 > 
 > >> If I can help debug this situation (I am probably the only person
 > >> trying this combination :-) please let me know how I should go about
 > >> it.

Is there anything in the logs?

[...]

 > 
 > Cheers,
 > Paul

Nikita.


* Re: reiser4 and megaraid problems with debian 2.6.5
  2004-04-14 13:09     ` Nikita Danilov
@ 2004-04-14 13:25       ` Paul Wagland
  2004-04-14 13:45         ` Vladimir Saveliev
  2004-04-14 23:59         ` Paul Wagland
  0 siblings, 2 replies; 20+ messages in thread
From: Paul Wagland @ 2004-04-14 13:25 UTC (permalink / raw)
  To: Nikita Danilov
  Cc: reiserfs-list, Linux mailing list SCSI, Atul Mukker,
	Domenico Andreoli, Hans Reiser, Linux mailing list kernel


On Apr 14, 2004, at 15:09, Nikita Danilov wrote:

>>> Paul Wagland writes:
>>>> If I can help debug this situation (I am probably the only person
>>>> trying this combination :-) please let me know how I should go about
>>>> it.
>
> Is there anything in the logs?

Sadly I forgot to check... though I will check again tonight since the 
problem is quite reproducible for me. Will report back later...

Cheers,
Paul


* Re: reiser4 and megaraid problems with debian 2.6.5
  2004-04-14 13:25       ` Paul Wagland
@ 2004-04-14 13:45         ` Vladimir Saveliev
  2004-04-14 14:03           ` Paul Wagland
  2004-04-14 23:59         ` Paul Wagland
  1 sibling, 1 reply; 20+ messages in thread
From: Vladimir Saveliev @ 2004-04-14 13:45 UTC (permalink / raw)
  To: Paul Wagland; +Cc: reiserfs-list

Hello

On Wed, 2004-04-14 at 17:25, Paul Wagland wrote:
> On Apr 14, 2004, at 15:09, Nikita Danilov wrote:
> 
> >>> Paul Wagland writes:
> >>>> If I can help debug this situation (I am probably the only person
> >>>> trying this combination :-) please let me know how I should go about
> >>>> it.
> >
> > Is there anything in the logs?
> 
> Sadly I forgot to check... though I will check again tonight since the 
> problem is quite reproducible for me. Will report 

Would you also mind trying on another device, please?

> Cheers,
> Paul



* Re: reiser4 and megaraid problems with debian 2.6.5
  2004-04-14 13:45         ` Vladimir Saveliev
@ 2004-04-14 14:03           ` Paul Wagland
  0 siblings, 0 replies; 20+ messages in thread
From: Paul Wagland @ 2004-04-14 14:03 UTC (permalink / raw)
  To: Vladimir Saveliev; +Cc: reiserfs-list

Hi Vladimir,

On Apr 14, 2004, at 15:45, Vladimir Saveliev wrote:

> On Wed, 2004-04-14 at 17:25, Paul Wagland wrote:
>> On Apr 14, 2004, at 15:09, Nikita Danilov wrote:
>>>
>>> Is there anything in the logs?
>>
>> Sadly I forgot to check... though I will check again tonight since the
>> problem is quite reproducible for me. Will report
>
> Would you also mind trying on another device, please?

Sadly, I can't, since I only have two machines that I can play with:
one that is quite averse to 2.6 (not sure why, but it is going away
soon so I haven't looked into it), and the one with the megaraid device,
which only has disks exported through that card. I can try it with the
stock 2.6.5 driver, and will do so if the "Jiffies|1" fix doesn't help.

Cheers,
Paul


* Re: reiser4 and megaraid problems with debian 2.6.5
  2004-04-14  6:51 reiser4 and megaraid problems with debian 2.6.5 Paul Wagland
  2004-04-14  9:05 ` Domenico Andreoli
@ 2004-04-14 15:13 ` Hans Reiser
  2004-04-14 15:37   ` Paul Wagland
  1 sibling, 1 reply; 20+ messages in thread
From: Hans Reiser @ 2004-04-14 15:13 UTC (permalink / raw)
  To: Paul Wagland
  Cc: Linux mailing list SCSI, Linux mailing list kernel, Atul Mukker

Paul Wagland wrote:

> Hi all,
>
> I would like to report on a problem that I am having. I am just 
> testing out the new megaraid unified driver, and have been doing some 
> baseline testing with bonnie++.
>
> My problem is that, although reiserfs, ext2, jfs and xfs all work, 
> reiser4 fails with the following error:
> ---
> Can't write block.
> Bonnie: drastic I/O error (write(2)): No such file or directory
> ---
>
> I am using the debian prepared kernel with the debian reiser4 patch. I 
> made a cursory examination of the patch, and it appears to correlate 
> fairly closely with the patch from the namesys site.

In what way does it not correlate?

>
> Given that this works with reiserfs, ext2, jfs and xfs, it would appear
> to be a reiser4 problem. However, ext3 also fails, though with a
> different error: it claims that the disk is full, yet it is only trying
> to write two 1GB files onto a 2.5GB filesystem, so it should have enough
> room. Indeed, it did work two or three times out of about 10 runs (lots
> of timing :-). This implies that it might be a megaraid problem. As you
> can tell, I really have no idea ;-)
>
> I will try playing around tonight with an official kernel and the 
> official reiser4 patch to see if that makes any difference, but would 
> just like to raise this potential problem sooner rather than later.
>
> If I can help debug this situation (I am probably the only person 
> trying this combination :-) please let me know how I should go about it.
>
> Cheers,
> Paul

I don't have the hardware to test it. Can you reproduce the error without
your hardware?

-- 
Hans



* Re: reiser4 and megaraid problems with debian 2.6.5
  2004-04-14 15:13 ` reiser4 and megaraid problems with debian 2.6.5 Hans Reiser
@ 2004-04-14 15:37   ` Paul Wagland
  0 siblings, 0 replies; 20+ messages in thread
From: Paul Wagland @ 2004-04-14 15:37 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Linux mailing list SCSI, Linux mailing list kernel

Hi,

On Apr 14, 2004, at 17:13, Hans Reiser wrote:

> Paul Wagland wrote:
>
>> I am using the debian prepared kernel with the debian reiser4 patch. 
>> I made a cursory examination of the patch, and it appears to 
>> correlate fairly closely with the patch from the namesys site.
>
> In what way does it not correlate?

As was mentioned by Domenico Andreoli the changes are just those 
required to get reiser4 to work under 2.6.5. Other differences are line 
offsets due to the fact that the debian kernel also has patches 
applied.

>> If I can help debug this situation (I am probably the only person 
>> trying this combination :-) please let me know how I should go about 
>> it.
>
> I don't have the hardware to test it, can you get the error without 
> your hardware?

Unfortunately, not easily, since this is the only box that I can 
currently test this out on. However, there are a couple of tests that I can
still perform (as mentioned elsewhere in this thread) and I will report 
back on the results of those later tonight.

Cheers,
Paul


* Re: reiser4 and megaraid problems with debian 2.6.5
  2004-04-14 13:25       ` Paul Wagland
  2004-04-14 13:45         ` Vladimir Saveliev
@ 2004-04-14 23:59         ` Paul Wagland
  2004-04-16 20:39           ` mjt
  2004-04-18 22:36           ` reiser4 and megaraid problems with debian 2.6.5 (*solved*) Paul Wagland
  1 sibling, 2 replies; 20+ messages in thread
From: Paul Wagland @ 2004-04-14 23:59 UTC (permalink / raw)
  To: Nikita Danilov
  Cc: reiserfs-list, Linux SCSI mailing list, Atul Mukker,
	Domenico Andreoli, Hans Reiser, Linux kernel mailing list

On Wed, 2004-04-14 at 15:25, Paul Wagland wrote:
> On Apr 14, 2004, at 15:09, Nikita Danilov wrote:
> 
> >>> Paul Wagland writes:
> >>>> If I can help debug this situation (I am probably the only person
> >>>> trying this combination :-) please let me know how I should go about
> >>>> it.
> >
> > Is there anything in the logs?
> 
> Sadly I forgot to check... though I will check again tonight since the 
> problem is quite reproducible for me. Will report back later...

OK. There is nothing in the logs. I have recompiled the kernel with
extra REISER4 debugging and checking and still nothing.

This error is 100% reproducible for me.

I have had a thought: what if it is "only" the wrong error code that is
being returned? What if the real problem is that we are running out of
free blocks? To test this theory (a little at least) I ran:

# bonnie++ -q -x4 -d /mnt/sdq -u 0:0 -f -r500
name,file_size,putc,putc_cpu,put_block,put_block_cpu,rewrite,rewrite_cpu,getc,getc_cpu,get_block,get_block_cpu,seeks,seeks_cpu,num_files,seq_create,seq_create_cpu,seq_stat,seq_stat_cpu,seq_del,seq_del_cpu,ran_create,ran_create_cpu,ran_stat,ran_stat_cpu,ran_del,ran_del_cpu
tidbit.kungfoocoder.org,1G,,,55236,11,36165,10,,,73514,8,2138.3,2,16,+++++,+++,+++++,+++,25015,99,28712,100,+++++,+++,26846,100
tidbit.kungfoocoder.org,1G,,,55236,11,30073,8,,,84287,10,2046.9,2,16,+++++,+++,+++++,+++,24862,99,28340,99,+++++,+++,26490,99
tidbit.kungfoocoder.org,1G,,,55391,11,30140,9,,,84506,10,2050.2,2,16,+++++,+++,+++++,+++,24642,100,28725,100,+++++,+++,26653,100
tidbit.kungfoocoder.org,1G,,,55364,11,30165,8,,,83055,11,2051.9,2,16,+++++,+++,+++++,+++,24682,100,28264,100,+++++,+++,26804,99


Note that even with debugging turned on we are about 5% faster at
reading and 20% slower at writing compared to reiserfs. Pretty good I
dare say.

However, when I run:

~# bonnie++ -x4 -d /mnt/sdq -u 0:0 -f -q -r800
name,file_size,putc,putc_cpu,put_block,put_block_cpu,rewrite,rewrite_cpu,getc,getc_cpu,get_block,get_block_cpu,seeks,seeks_cpu,num_files,seq_create,seq_create_cpu,seq_stat,seq_stat_cpu,seq_del,seq_del_cpu,ran_create,ran_create_cpu,ran_stat,ran_stat_cpu,ran_del,ran_del_cpu
Can't write block.
Bonnie: drastic I/O error (re write(2)): No such file or directory

Using reiserfs I can happily run:
# bonnie++ -x4 -d /mnt/sdq -u 0:0 -f -q -r1008

and the partition is 2.5GB in size.
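
The runs above differ only in the -r value. As a rough sketch, and assuming that bonnie++ sizes its data set at about twice the RAM figure given with -r when no -s is specified, the space nominally left over on the partition can be estimated as below. The function name headroom_mb is invented for illustration, 2.5GB is taken as 2560MB, and any rounding bonnie++ applies (the "1G" reported for -r500 suggests there is some) is not modelled.

```shell
#!/bin/sh
# headroom_mb PARTITION_MB RAM_MB
# Assumption: with no -s given, bonnie++ writes roughly 2 x RAM_MB of data.
# Prints the space (in MB) that would nominally remain on the partition.
headroom_mb() {
    partition_mb=$1
    ram_mb=$2
    data_mb=$((ram_mb * 2))
    echo $((partition_mb - data_mb))
}

# The three -r values used in the runs above, on a 2560MB partition.
for ram in 500 800 1008; do
    echo "-r$ram leaves about $(headroom_mb 2560 "$ram") MB nominally free"
done
```

Under these assumptions the -r1008 run that reiserfs completes leaves less nominal headroom than the -r800 run that fails on reiser4, so the raw size of the data set alone does not seem to explain the failure.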

Some more background information: my hardware is not overclocked, and
has been 100% reliable. About two weeks ago I ran it through about 24
hours of memtest86+ without any problems. The machine has 1GB of RAM.
The logical partition that I am testing is 2.5GB.

Here are the REISER4 settings from my configuration:
tidbit:~# grep REISER4 /boot/config-2.6.5pw-newmega-k7-1
CONFIG_REISER4_FS=m
# CONFIG_REISER4_FS_SYSCALL is not set
CONFIG_REISER4_LARGE_KEY=y
CONFIG_REISER4_CHECK=y
CONFIG_REISER4_FS_SYSCALL_DEBUG=y
# CONFIG_REISER4_DEBUG_MODIFY is not set
# CONFIG_REISER4_DEBUG_MEMCPY is not set
# CONFIG_REISER4_DEBUG_NODE is not set
# CONFIG_REISER4_ZERO_NEW_NODE is not set
# CONFIG_REISER4_TRACE is not set
# CONFIG_REISER4_EVENT_LOG is not set
# CONFIG_REISER4_STATS is not set
# CONFIG_REISER4_PROF is not set
# CONFIG_REISER4_LOCKPROF is not set
# CONFIG_REISER4_DEBUG_OUTPUT is not set
# CONFIG_REISER4_NOOPT is not set
CONFIG_REISER4_USE_EFLUSH=y
# CONFIG_REISER4_COPY_ON_CAPTURE is not set
# CONFIG_REISER4_BADBLOCKS is not set


I have removed the |1 from the jiffies|1 assignment. It still works,
which means that the kernel must have been fixed :-) But it didn't help
:-\

Hope this helps provide some illumination to the gurus out there...

Cheers,
Paul


* Re: reiser4 and megaraid problems with debian 2.6.5
  2004-04-14 23:59         ` Paul Wagland
@ 2004-04-16 20:39           ` mjt
  2004-04-17  7:38             ` Paul Wagland
  2004-04-18 22:36           ` reiser4 and megaraid problems with debian 2.6.5 (*solved*) Paul Wagland
  1 sibling, 1 reply; 20+ messages in thread
From: mjt @ 2004-04-16 20:39 UTC (permalink / raw)
  To: reiserfs-list

Paul Wagland wrote:

>Bonnie: drastic I/O error (re write(2)): No such file or directory

I applied http://mjt.nysv.org/reiser/bonnie.patch which may or may
not help. Nikita gave it on IRC, but without either of us having
time at the moment to do much about it (so it'll wait until Monday,
at least).

Guess the point of this email is that I won't forget the results I
got by then :)

Nothing was logged, although I had the following on in the kernel:
mjt@shrike:~$ zgrep REISER4 /proc/config.gz 
CONFIG_REISER4_FS=y
# CONFIG_REISER4_FS_SYSCALL is not set
CONFIG_REISER4_LARGE_KEY=y
CONFIG_REISER4_CHECK=y
CONFIG_REISER4_DEBUG=y
CONFIG_REISER4_FS_SYSCALL_DEBUG=y
# CONFIG_REISER4_DEBUG_MODIFY is not set
# CONFIG_REISER4_DEBUG_MEMCPY is not set
# CONFIG_REISER4_DEBUG_NODE is not set
# CONFIG_REISER4_ZERO_NEW_NODE is not set
# CONFIG_REISER4_TRACE is not set
# CONFIG_REISER4_EVENT_LOG is not set
# CONFIG_REISER4_STATS is not set
# CONFIG_REISER4_PROF is not set
# CONFIG_REISER4_LOCKPROF is not set
CONFIG_REISER4_DEBUG_OUTPUT=y
# CONFIG_REISER4_NOOPT is not set
CONFIG_REISER4_USE_EFLUSH=y
# CONFIG_REISER4_COPY_ON_CAPTURE is not set
# CONFIG_REISER4_BADBLOCKS is not set

I ran 
while [ 0 ]; do set $( df -m | tail -1); echo $4; sleep 1; done
on one terminal and
time /usr/sbin/bonnie++ -d bonnie/ -f -s $[ $( set $( df -m | tail -1); echo $4 ) - 128 ]
on another.

The point is that df -m | tail -1 returns my home directory and 128 should
be how much space bonnie++ should leave for me while testing.

This is the output:
mjt@shrike:~/tmp$ time /usr/sbin/bonnie++ -d bonnie/ -f -s $[ $( set $( df -m | tail -1); echo $4 ) - 128 ]
Writing intelligently...
Message from syslogd@shrike at Fri Apr 16 19:33:14 2004 ...
shrike kernel: Disabling IRQ #10
done
Rewriting...Can't write block.
Bonnie: drastic I/O error (re write(2)): No such file or directory

real    39m19.082s
user    0m2.990s
sys     28m46.480s

And this is what the while [ 0 ] loop gave me:
82
63
43
126
123
110
90
127
53185
53185

This is naturally partial; there were many entries over almost forty minutes,
which I did not log.

As you can see, the free space drops well below 128 before the error, at
which point the space is freed again. Is this because bonnie++ does not
take -s too literally, or what?

Also, the patch seemed to have no effect and I don't think I dare have it
in for nothing, not without talking to Nik or someone about it.

So, if this is a question of bonnie++ being imprecise, it's ok, but
Paul's problem is on a much larger scale. Hopefully we will talk about
this during the weekend on IRC, but this message should suffice as a
partial status report.
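
For anyone repeating this, a slightly more robust version of the free-space loop might look like the sketch below. It names the directory explicitly instead of relying on the home directory being the last line of df output, and it timestamps each sample so the output can be kept. The function names free_mb and watch_free are made up for this example; only POSIX df -Pk and awk are assumed.

```shell
#!/bin/sh
# free_mb DIR: print the free space, in whole MB, of the filesystem holding DIR.
# "df -Pk" keeps each filesystem on one line, so the data is line 2, column 4.
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# watch_free DIR COUNT: print COUNT timestamped samples, one per second.
watch_free() {
    dir=$1
    n=$2
    while [ "$n" -gt 0 ]; do
        echo "$(date '+%H:%M:%S') $(free_mb "$dir")"
        n=$((n - 1))
        if [ "$n" -gt 0 ]; then
            sleep 1
        fi
    done
}
```

Running something like `watch_free "$HOME" 2400 | tee free.log` in one terminal while bonnie++ runs in another would keep the whole forty minutes of samples rather than a partial list.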

-- 
mjt



* Re: reiser4 and megaraid problems with debian 2.6.5
  2004-04-16 20:39           ` mjt
@ 2004-04-17  7:38             ` Paul Wagland
  2004-04-19 21:40               ` reiser4 and bonnie problems Paul Wagland
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Wagland @ 2004-04-17  7:38 UTC (permalink / raw)
  To: Markus Törnqvist; +Cc: reiserfs-list


On Apr 16, 2004, at 22:39, Markus Törnqvist wrote:

> Paul Wagland wrote:
>
>> Bonnie: drastic I/O error (re write(2)): No such file or directory
>
> I applied http://mjt.nysv.org/reiser/bonnie.patch which may or may
> not help. Nikita gave it on IRC, but without either of us having
> time at the moment to do much about it (so it'll wait until Monday,
> at least).
>
> Guess the point of this email is that I won't forget the results I
> got by then :)

As such, I will also add my results to this thread... please note that 
I am not with my main machine at the moment, so can't post the numbers 
:-\

Anyway, I also have applied the above patch, and rerun my bonnie test, 
but this time on a 4GB partition.

I am still getting failures, but nothing is being printed in dmesg or 
/var/log/kern.log

What happens is that the character output works, and creates four
files, 3x1GB and 1x512MB. In another window I have a df loop running,
and I can see that there is about 495MB free. So far so good :-) Now
the "intelligent writing" starts, and after about 10 seconds it fails
with the "drastic IO error...no such file or directory" error. Looking
at the df window I can see that when it fails there is 3.5GB free on
that disk, i.e. about 480MB had been used. Since it is a one second
loop in both cases, I imagine that bonnie was able to write
out 4GB of data.

What I think is happening in this case is that, although the old files
have been deleted and the online disk maps have been updated to show
that there is enough free disk, the logs used to create those original
files have not been freed. Does that make sense? Maybe it might help
some of the more knowledgeable in determining what is happening?

Let me know if I can run any further tests to further determine what is 
happening.

Cheers,
Paul


* Re: reiser4 and megaraid problems with debian 2.6.5 (*solved*)
  2004-04-14 23:59         ` Paul Wagland
  2004-04-16 20:39           ` mjt
@ 2004-04-18 22:36           ` Paul Wagland
  1 sibling, 0 replies; 20+ messages in thread
From: Paul Wagland @ 2004-04-18 22:36 UTC (permalink / raw)
  To: Nikita Danilov
  Cc: reiserfs-list, Linux SCSI mailing list, Atul Mukker,
	Domenico Andreoli, Hans Reiser, Linux kernel mailing list

Hi all,

well partly solved anyway... I am just posting this so that if anyone
finds this thread later they can also find this conclusion... There is
still more work to be done before this problem can be properly closed,
but at least now I am certain that it has nothing to do with the
hardware :-)

It appears (my own unsupported theory) that the problem is that reiser4
is taking some time to free up the blocks that are currently in use
by the wandering log. Since I was running a test that causes a lot of
wandering log to be created, and I was doing it on a filesystem with
very little free space, I was running into the problem.

Rerunning the test with either a) more space, or b) a smaller data set
solved the problem. On the reiserfs-list we are now trying to find out
exactly why this is happening, and how to solve the problem properly.

Cheers,
Paul


* reiser4 and bonnie problems
  2004-04-17  7:38             ` Paul Wagland
@ 2004-04-19 21:40               ` Paul Wagland
  2004-04-20  7:51                 ` Nikita Danilov
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Wagland @ 2004-04-19 21:40 UTC (permalink / raw)
  To: Paul Wagland; +Cc: reiserfs-list, Markus Törnqvist


On Apr 17, 2004, at 9:38, Paul Wagland wrote:

>
> On Apr 16, 2004, at 22:39, Markus Törnqvist wrote:
>
>> Paul Wagland wrote:
>>
>>> Bonnie: drastic I/O error (re write(2)): No such file or directory
>>
>> I applied http://mjt.nysv.org/reiser/bonnie.patch which may or may
>> not help. Nikita gave it on IRC, but without either of us having
>> time at the moment to do much about it (so it'll wait until Monday,
>> at least).

OK. I have updated the above patch so that instead of only calling
report_err() on ENOENT, it calls report_err() whenever an error happens.
I then re-ran my test:

tidbit:~# df /mnt/sdr
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdr1              3984228       296   3983932   1% /mnt/sdr
tidbit:~# bonnie++ -r 1736 -d /mnt/sdr -u 0:0
Using uid:0, gid:0.
Writing with putc()...done
Writing intelligently...Can't write block.
Bonnie: drastic I/O error (write(2)): No space left on device


Note, that bonnie tries to create 3.5 GB worth of files on a 4GB 
filesystem... this should not be causing a problem. Also, the "Writing 
intelligently" test tries to write out the same amount of information 
as the "Writing with putc()" test. After each test all the files from 
that test are deleted, so this really should not be causing a problem. 
Further, while running the test, I was also running a df loop, as can 
be seen below:

tidbit:~# while true; do sleep 1; echo `df /mnt/sdr | tail -1` `du -s /mnt/sdr`; done
/dev/sdr1 3984228 3496804 487424 88% /mnt/sdr 3499521 /mnt/sdr
/dev/sdr1 3984228 3523684 460544 89% /mnt/sdr 3523393 /mnt/sdr
/dev/sdr1 3984228 3547620 436608 90% /mnt/sdr 3547329 /mnt/sdr
/dev/sdr1 3984228 410136 3574092 11% /mnt/sdr 1 /mnt/sdr
   < more or less here is where the intelligent write starts >
/dev/sdr1 3984228 84768 3899460 3% /mnt/sdr 84417 /mnt/sdr
/dev/sdr1 3984228 164624 3819604 5% /mnt/sdr 164273 /mnt/sdr
/dev/sdr1 3984228 246664 3737564 7% /mnt/sdr 246309 /mnt/sdr
/dev/sdr1 3984228 321952 3662276 9% /mnt/sdr 321601 /mnt/sdr
/dev/sdr1 3984228 404832 3579396 11% /mnt/sdr 404481 /mnt/sdr
/dev/sdr1 3984228 477264 3506964 12% /mnt/sdr 476913 /mnt/sdr
/dev/sdr1 3984228 527404 3456824 14% /mnt/sdr 1 /mnt/sdr
   < there is quite a long pause here, around about 5 seconds >
   < and in the meantime, bonnie has failed with a "no space left on device" >
/dev/sdr1 3984228 296 3983932 1% /mnt/sdr 1 /mnt/sdr
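
One way to read the samples above is to compare what df counts as used with what du can still find on disk. The sketch below is only an illustration: it assumes each sample is a single line in the format printed by the loop above (five df fields and the mount point, then the du total and path), and the function name lag_kb is invented for this example.

```shell
#!/bin/sh
# lag_kb [FILE...]: for each sample line (df fields followed by du fields,
# as printed by the loop above), print df's "Used" minus du's total, in KB.
# Lines that do not have exactly 8 fields with numeric columns 3 and 7
# (such as the "< ... >" comments) are skipped.
lag_kb() {
    awk 'NF == 8 && $3 ~ /^[0-9]+$/ && $7 ~ /^[0-9]+$/ { print $3 - $7 }' "$@"
}

# Example: the last sample taken before the long pause.
echo '/dev/sdr1 3984228 527404 3456824 14% /mnt/sdr 1 /mnt/sdr' | lag_kb
```

For that sample the difference is 527403 KB, i.e. roughly 515MB still counted as used although du finds no files, which would be consistent with blocks being released some time after the files are deleted.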


With the modifications that I made above, the following lines are spat 
out into the /var/log/kern.log:

Apr 19 23:20:44 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:20:57 tidbit last message repeated 3 times
Apr 19 23:21:02 tidbit kernel: code: -503 at fs/reiser4/seal.c:170
Apr 19 23:21:02 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:21:17 tidbit last message repeated 3 times
Apr 19 23:21:22 tidbit kernel: code: -503 at fs/reiser4/seal.c:170
Apr 19 23:21:22 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:21:37 tidbit last message repeated 4 times
Apr 19 23:21:42 tidbit kernel: code: -503 at fs/reiser4/seal.c:181
Apr 19 23:21:42 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:21:57 tidbit last message repeated 3 times
Apr 19 23:22:02 tidbit kernel: code: -503 at fs/reiser4/plugin/file/file.c:791
Apr 19 23:22:02 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:22:32 tidbit last message repeated 7 times
Apr 19 23:22:51 tidbit last message repeated 4 times
Apr 19 23:22:56 tidbit kernel: code: -503 at fs/reiser4/plugin/file/file.c:791
Apr 19 23:22:56 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:23:12 tidbit last message repeated 4 times
Apr 19 23:23:16 tidbit kernel: code: -503 at fs/reiser4/plugin/file/file.c:791
Apr 19 23:23:16 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:23:30 tidbit last message repeated 5 times
Apr 19 23:23:30 tidbit kernel: code: -503 at fs/reiser4/plugin/file/file.c:791
Apr 19 23:23:30 tidbit kernel: code: -503 at fs/reiser4/plugin/file/file.c:791
Apr 19 23:23:30 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:23:32 tidbit kernel: code: -503 at fs/reiser4/plugin/file/file.c:791
Apr 19 23:23:32 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:23:34 tidbit kernel: code: -503 at fs/reiser4/seal.c:170
Apr 19 23:23:34 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:23:36 tidbit kernel: code: -503 at fs/reiser4/seal.c:170
Apr 19 23:23:36 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:23:37 tidbit kernel: code: -28 at fs/reiser4/block_alloc.c:319


I really hope that this helps someone out there who knows more about 
the code internals than I. As always, if you would like me to run some 
more tests, please let me know!

Cheers,
Paul



* Re: reiser4 and bonnie problems
  2004-04-19 21:40               ` reiser4 and bonnie problems Paul Wagland
@ 2004-04-20  7:51                 ` Nikita Danilov
  2004-04-20  8:54                   ` Paul Wagland
  0 siblings, 1 reply; 20+ messages in thread
From: Nikita Danilov @ 2004-04-20  7:51 UTC (permalink / raw)
  To: Paul Wagland; +Cc: reiserfs-list, Markus Törnqvist

Paul Wagland writes:
 > 
 > On Apr 17, 2004, at 9:38, Paul Wagland wrote:
 > 
 > >
 > > On Apr 16, 2004, at 22:39, Markus Törnqvist wrote:
 > >
 > >> Paul Wagland wrote:
 > >>
 > >>> Bonnie: drastic I/O error (re write(2)): No such file or directory
 > >>
 > >> I applied http://mjt.nysv.org/reiser/bonnie.patch which may or may
 > >> not help. Nikita gave it on IRC, but without either of us having
 > >> time at the moment to do much about it (so it'll wait until Monday,
 > >> at least).
 > 
 > OK. I have updated the above patch so that instead of only calling 
 > report_err() on ENOENT, it calls report_err whenever an error happens. 
 > I then re-ran my test:
 > 
 > tidbit:~# df /mnt/sdr
 > Filesystem           1K-blocks      Used Available Use% Mounted on
 > /dev/sdr1              3984228       296   3983932   1% /mnt/sdr
 > tidbit:~# bonnie++ -r 1736 -d /mnt/sdr -u 0:0
 > Using uid:0, gid:0.
 > Writing with putc()...done
 > Writing intelligently...Can't write block.
 > Bonnie: drastic I/O error (write(2)): No space left on device

Err.. this is completely different from what we had previously: "No
space left on device" is understandable, if not reasonable. What is
completely mysterious is the "No such file or directory" error that you
reported before.

 > 
 > 
 > Note, that bonnie tries to create 3.5 GB worth of files on a 4GB 
 > filesystem... this should not be causing a problem. Also, the "Writing 
 > intelligently" test tries to write out the same amount of information 
 > as the "Writing with putc()" test. After each test all the files from 
 > that test are deleted, so this really should not be causing a problem. 
 > Further, while running the test, I was also running a df loop, as can 
 > be seen below:
 > 
 > tidbit:~# while true; do sleep 1; echo `df /mnt/sdr | tail -1` `du -s 
 > /mnt/sdr`; done
 > /dev/sdr1 3984228 3496804 487424 88% /mnt/sdr 3499521 /mnt/sdr
 > /dev/sdr1 3984228 3523684 460544 89% /mnt/sdr 3523393 /mnt/sdr
 > /dev/sdr1 3984228 3547620 436608 90% /mnt/sdr 3547329 /mnt/sdr
 > /dev/sdr1 3984228 410136 3574092 11% /mnt/sdr 1 /mnt/sdr
 >    < more or less here is where the intelligent write starts >
 > /dev/sdr1 3984228 84768 3899460 3% /mnt/sdr 84417 /mnt/sdr
 > /dev/sdr1 3984228 164624 3819604 5% /mnt/sdr 164273 /mnt/sdr
 > /dev/sdr1 3984228 246664 3737564 7% /mnt/sdr 246309 /mnt/sdr
 > /dev/sdr1 3984228 321952 3662276 9% /mnt/sdr 321601 /mnt/sdr
 > /dev/sdr1 3984228 404832 3579396 11% /mnt/sdr 404481 /mnt/sdr
 > /dev/sdr1 3984228 477264 3506964 12% /mnt/sdr 476913 /mnt/sdr
 > /dev/sdr1 3984228 527404 3456824 14% /mnt/sdr 1 /mnt/sdr
 >    < there is quite a long pause here, around about 5 seconds >
 >    < and in the meantime, bonnie has failed with a "no space left on 
 > device" >
 > /dev/sdr1 3984228 296 3983932 1% /mnt/sdr 1 /mnt/sdr
 > 
 > 
 > With the modifications that I made above, the following lines are spat 
 > out into the /var/log/kern.log:
 > 
 > Apr 19 23:20:44 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
 > Apr 19 23:20:57 tidbit last message repeated 3 times
 > Apr 19 23:21:02 tidbit kernel: code: -503 at fs/reiser4/seal.c:170
 > Apr 19 23:21:02 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
 > Apr 19 23:21:17 tidbit last message repeated 3 times
 > Apr 19 23:21:22 tidbit kernel: code: -503 at fs/reiser4/seal.c:170
 > Apr 19 23:21:22 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
 > Apr 19 23:21:37 tidbit last message repeated 4 times
 > Apr 19 23:21:42 tidbit kernel: code: -503 at fs/reiser4/seal.c:181
 > Apr 19 23:21:42 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
 > Apr 19 23:21:57 tidbit last message repeated 3 times
 > Apr 19 23:22:02 tidbit kernel: code: -503 at 
 > fs/reiser4/plugin/file/file.c:791
 > Apr 19 23:22:02 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
 > Apr 19 23:22:32 tidbit last message repeated 7 times
 > Apr 19 23:22:51 tidbit last message repeated 4 times
 > Apr 19 23:22:56 tidbit kernel: code: -503 at 
 > fs/reiser4/plugin/file/file.c:791
 > Apr 19 23:22:56 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
 > Apr 19 23:23:12 tidbit last message repeated 4 times
 > Apr 19 23:23:16 tidbit kernel: code: -503 at 
 > fs/reiser4/plugin/file/file.c:791
 > Apr 19 23:23:16 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
 > Apr 19 23:23:30 tidbit last message repeated 5 times
 > Apr 19 23:23:30 tidbit kernel: code: -503 at 
 > fs/reiser4/plugin/file/file.c:791
 > Apr 19 23:23:30 tidbit kernel: code: -503 at 
 > fs/reiser4/plugin/file/file.c:791
 > Apr 19 23:23:30 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
 > Apr 19 23:23:32 tidbit kernel: code: -503 at 
 > fs/reiser4/plugin/file/file.c:791
 > Apr 19 23:23:32 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
 > Apr 19 23:23:34 tidbit kernel: code: -503 at fs/reiser4/seal.c:170
 > Apr 19 23:23:34 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
 > Apr 19 23:23:36 tidbit kernel: code: -503 at fs/reiser4/seal.c:170
 > Apr 19 23:23:36 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
 > Apr 19 23:23:37 tidbit kernel: code: -28 at fs/reiser4/block_alloc.c:319

That looks ok, -28 == -ENOSPC.
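
For readers decoding the log above, the following is a minimal sketch that maps the negative return codes onto standard errno names using Python's standard library. The `describe` helper is a hypothetical name introduced here for illustration. The -503 value has no standard errno name, so it is presumably a reiser4-internal code; that is an assumption, not something confirmed in this thread.

```python
import errno
import os


def describe(code):
    """Return 'NAME: message' for a negative kernel-style return code."""
    name = errno.errorcode.get(-code)
    if name is None:
        # Assumption: values outside the errno table are filesystem-internal.
        return "not a standard errno (assumed filesystem-internal)"
    return f"{name}: {os.strerror(-code)}"


for code in (-2, -28, -503):
    print(code, describe(code))
```

On Linux the first two lines should name ENOENT and ENOSPC, which matches the two error messages bonnie++ printed in this thread.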

 > 
 > 
 > I really hope that this helps someone out there who knows more about 
 > the code internals than I. As always, if you would like me to run some 
 > more tests, please let me know!
 > 
 > Cheers,
 > Paul

Nikita.


* Re: reiser4 and bonnie problems
  2004-04-20  7:51                 ` Nikita Danilov
@ 2004-04-20  8:54                   ` Paul Wagland
  2004-04-20 13:37                     ` Nikita Danilov
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Wagland @ 2004-04-20  8:54 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: reiserfs-list, Markus Törnqvist



On Apr 20, 2004, at 9:51, Nikita Danilov wrote:

> Paul Wagland writes:
>> Bonnie: drastic I/O error (write(2)): No space left on device
>
> Err.. this is completely different from what we had previously: "No
> space left on device" is understandable where not reasonable. What is
> completely mysterious is "No such file or directory" error that you
> reported before.

Yes, last night I could not get that "No such file or directory" error 
to appear. I will try to reproduce that particular error again later 
tonight. However, this other problem happens at exactly the same point, 
and is an error in reiser4.

Just to summarise everything that is written below, this is what bonnie 
is doing:
1. write 3.5 GB onto a 4GB partition. This works.
2. delete 3.5 GB from a 4GB partition. This works.
3. I check using df that the disk space is free. That works.
4. write 3.5GB onto a 4GB partition. This fails.

At the time of failure, according to df there is still 3.5 GB free. So, 
I say that reiser4 has a problem.

>> Note, that bonnie tries to create 3.5 GB worth of files on a 4GB
>> filesystem... this should not be causing a problem. Also, the "Writing
>> intelligently" test tries to write out the same amount of information
>> as the "Writing with putc()" test. After each test all the files from
>> that test are deleted, so this really should not be causing a problem.
>> Further, while running the test, I was also running a df loop, as can
>> be seen below:
>>
>> tidbit:~# while true; do sleep 1; echo `df /mnt/sdr | tail -1` `du -s
>> /mnt/sdr`; done
>> /dev/sdr1 3984228 3496804 487424 88% /mnt/sdr 3499521 /mnt/sdr
>> /dev/sdr1 3984228 3523684 460544 89% /mnt/sdr 3523393 /mnt/sdr
>> /dev/sdr1 3984228 3547620 436608 90% /mnt/sdr 3547329 /mnt/sdr
>> /dev/sdr1 3984228 410136 3574092 11% /mnt/sdr 1 /mnt/sdr
>>    < more or less here is where the intelligent write starts >
>> /dev/sdr1 3984228 84768 3899460 3% /mnt/sdr 84417 /mnt/sdr
>> /dev/sdr1 3984228 164624 3819604 5% /mnt/sdr 164273 /mnt/sdr
>> /dev/sdr1 3984228 246664 3737564 7% /mnt/sdr 246309 /mnt/sdr
>> /dev/sdr1 3984228 321952 3662276 9% /mnt/sdr 321601 /mnt/sdr
>> /dev/sdr1 3984228 404832 3579396 11% /mnt/sdr 404481 /mnt/sdr
>> /dev/sdr1 3984228 477264 3506964 12% /mnt/sdr 476913 /mnt/sdr
>> /dev/sdr1 3984228 527404 3456824 14% /mnt/sdr 1 /mnt/sdr
>>    < there is quite a long pause here, around about 5 seconds >
>>    < and in the meantime, bonnie has failed with a "no space left on
>> device" >
>> /dev/sdr1 3984228 296 3983932 1% /mnt/sdr 1 /mnt/sdr
>>
>>
>> With the modifications that I made above, the following lines are spat
>> out into the /var/log/kern.log:
>>
>> Apr 19 23:20:44 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
>> Apr 19 23:20:57 tidbit last message repeated 3 times
>> Apr 19 23:21:02 tidbit kernel: code: -503 at fs/reiser4/seal.c:170
>> Apr 19 23:21:02 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
>> Apr 19 23:21:17 tidbit last message repeated 3 times
>> Apr 19 23:21:22 tidbit kernel: code: -503 at fs/reiser4/seal.c:170
>> Apr 19 23:21:22 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
>> Apr 19 23:21:37 tidbit last message repeated 4 times
>> Apr 19 23:21:42 tidbit kernel: code: -503 at fs/reiser4/seal.c:181
>> Apr 19 23:21:42 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
>> Apr 19 23:21:57 tidbit last message repeated 3 times
>> Apr 19 23:22:02 tidbit kernel: code: -503 at
>> fs/reiser4/plugin/file/file.c:791
>> Apr 19 23:22:02 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
>> Apr 19 23:22:32 tidbit last message repeated 7 times
>> Apr 19 23:22:51 tidbit last message repeated 4 times
>> Apr 19 23:22:56 tidbit kernel: code: -503 at
>> fs/reiser4/plugin/file/file.c:791
>> Apr 19 23:22:56 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
>> Apr 19 23:23:12 tidbit last message repeated 4 times
>> Apr 19 23:23:16 tidbit kernel: code: -503 at
>> fs/reiser4/plugin/file/file.c:791
>> Apr 19 23:23:16 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
>> Apr 19 23:23:30 tidbit last message repeated 5 times
>> Apr 19 23:23:30 tidbit kernel: code: -503 at
>> fs/reiser4/plugin/file/file.c:791
>> Apr 19 23:23:30 tidbit kernel: code: -503 at
>> fs/reiser4/plugin/file/file.c:791
>> Apr 19 23:23:30 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
>> Apr 19 23:23:32 tidbit kernel: code: -503 at
>> fs/reiser4/plugin/file/file.c:791
>> Apr 19 23:23:32 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
>> Apr 19 23:23:34 tidbit kernel: code: -503 at fs/reiser4/seal.c:170
>> Apr 19 23:23:34 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
>> Apr 19 23:23:36 tidbit kernel: code: -503 at fs/reiser4/seal.c:170
>> Apr 19 23:23:36 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
>> Apr 19 23:23:37 tidbit kernel: code: -28 at 
>> fs/reiser4/block_alloc.c:319
>
> That looks ok, -28 == -ENOSPC.

Sure, the error code is right. But it should not be saying that it has 
no disk space when it claims to df that there is still 3.5GB free!

Cheers,
Paul



* Re: reiser4 and bonnie problems
  2004-04-20  8:54                   ` Paul Wagland
@ 2004-04-20 13:37                     ` Nikita Danilov
  2004-04-20 14:51                       ` Alex Zarochentsev
  0 siblings, 1 reply; 20+ messages in thread
From: Nikita Danilov @ 2004-04-20 13:37 UTC (permalink / raw)
  To: Paul Wagland; +Cc: reiserfs-list, Markus Törnqvist

Paul Wagland writes:
 > 
 > On Apr 20, 2004, at 9:51, Nikita Danilov wrote:
 > 
 > > Paul Wagland writes:
 > >> Bonnie: drastic I/O error (write(2)): No space left on device
 > >
 > > Err.. this is completely different from what we had previously: "No
 > > space left on device" is understandable where not reasonable. What is
 > > completely mysterious is "No such file or directory" error that you
 > > reported before.
 > 
 > Yes, last night I could not get that "No such file or directory" error 
 > to appear. I will try to reproduce that particular error again later 
 > tonight. However, this other problem happens at exactly the same time, 
 > and is an error in the reiser4.
 > 
 > Just to summarise everything that is written below, this is what bonnie 
 > is doing:
 > 1. write 3.5 GB onto a 4GB partition. This works
 > 2. delete 3.5 GB from a 4GB partition. This works.

Disk blocks freed during a transaction are not actually released until
the transaction commits.

 > 3. I check using df that the disk space is free. That works.

But that "delayed freeing" confused users (they did cp, then rm, but df
still showed the space as used), so statfs(2) (the system call used by
df) was modified to take these delayed blocks into account and pretend
that they are free.

 > 4. write 3.5GB onto a 4GB partition. This fails.

Try to repeat this with sync before step 4.

 > 
 > At the time of failure, according to df there is stiff 3.5 GB
 > free. So, I say that reiser4 has a problem.

df lies.
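
To make it concrete where df gets its numbers, here is a small sketch that reads the same statfs-style counters through Python's `os.statvfs` (Unix only). The `df_like` helper name and the use of the current directory are illustrative assumptions; on the test machine the path would be the reiser4 mount point. The values are whatever the filesystem chooses to report, which is why delayed-free blocks can show up as available.

```python
import os


def df_like(path):
    """Return (total, used, available) in 1K blocks, roughly as df prints them."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize // 1024
    used = (st.f_blocks - st.f_bfree) * st.f_frsize // 1024
    available = st.f_bavail * st.f_frsize // 1024
    return total, used, available


# "." is only a placeholder; substitute the mount point under test.
print(df_like("."))
```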

 > 

[...]

 > 
 > Cheers,
 > Paul

Nikita.


* Re: reiser4 and bonnie problems
  2004-04-20 13:37                     ` Nikita Danilov
@ 2004-04-20 14:51                       ` Alex Zarochentsev
  2004-04-20 18:18                         ` Paul Wagland
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Zarochentsev @ 2004-04-20 14:51 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: Paul Wagland, reiserfs-list

On Tue, Apr 20, 2004 at 05:37:39PM +0400, Nikita Danilov wrote:
> Paul Wagland writes:
>  > 
>  > On Apr 20, 2004, at 9:51, Nikita Danilov wrote:
>  > 
>  > > Paul Wagland writes:
>  > >> Bonnie: drastic I/O error (write(2)): No space left on device
>  > >
>  > > Err.. this is completely different from what we had previously: "No
>  > > space left on device" is understandable where not reasonable. What is
>  > > completely mysterious is "No such file or directory" error that you
>  > > reported before.
>  > 
>  > Yes, last night I could not get that "No such file or directory" error 
>  > to appear. I will try to reproduce that particular error again later 
>  > tonight. However, this other problem happens at exactly the same time, 
>  > and is an error in the reiser4.
>  > 
>  > Just to summarise everything that is written below, this is what bonnie 
>  > is doing:
>  > 1. write 3.5 GB onto a 4GB partition. This works
>  > 2. delete 3.5 GB from a 4GB partition. This works.
> 
> Disk blocks freed during transaction are not actually freed until
> transaction commits.
> 
>  > 3. I check using df that the disk space is free. That works.
> 
> But that "delayed freeing" confused users (they did cp, rm, but df has
> still showed that space is used), so that statfs(2) (system call used by
> df) was modified to take these delayed blocks into account and pretend
> that they are free.
> 
>  > 4. write 3.5GB onto a 4GB partition. This fails.
> 
> Try to repeat this with sync before step 4.
> 
>  > 
>  > At the time of failure, according to df there is stiff 3.5 GB
>  > free. So, I say that reiser4 has a problem.
> 
> df lies.

reiser4_statfs() was changed to report deleted blocks as free space immediately
after rm(1).

This was done because reiser4_write() should trigger a filesystem commit and
recover the free space.  If the commit does not happen, it is a reiser4 bug.

>  > Paul
> 
> Nikita.

-- 
Alex.


* Re: reiser4 and bonnie problems
  2004-04-20 14:51                       ` Alex Zarochentsev
@ 2004-04-20 18:18                         ` Paul Wagland
  2004-04-21  7:03                           ` Alex Zarochentsev
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Wagland @ 2004-04-20 18:18 UTC (permalink / raw)
  To: Alex Zarochentsev; +Cc: reiserfs-list, Nikita Danilov



On Apr 20, 2004, at 16:51, Alex Zarochentsev wrote:

> On Tue, Apr 20, 2004 at 05:37:39PM +0400, Nikita Danilov wrote:
>> Paul Wagland writes:
>>> Just to summarise everything that is written below, this is what 
>>> bonnie
>>> is doing:
>>> 1. write 3.5 GB onto a 4GB partition. This works
>>> 2. delete 3.5 GB from a 4GB partition. This works.
>>
>> Disk blocks freed during transaction are not actually freed until
>> transaction commits.

Under what conditions do transactions get committed? Alex mentioned 
below that every write is an implicit commit. Is that the only 
situation? Other than sync obviously :-)

>>> 3. I check using df that the disk space is free. That works.
>>
>> But that "delayed freeing" confused users (they did cp, rm, but df has
>> still showed that space is used), so that statfs(2) (system call used 
>> by
>> df) was modified to take these delayed blocks into account and pretend
>> that they are free.

OK, that I can deal with, rm'ing a file should free the space ;-). 
However, if the transaction is not committed at this point, what 
happens if I lose power? Is the filesystem rolled back to its state 
before the deletions?

>>> 4. write 3.5GB onto a 4GB partition. This fails.
>>
>> Try to repeat this with sync before step 4.

OK, here are the results of the test. I decided to run it without 
bonnie, just to make sure that bonnie was not the determining factor.

-----------------

tidbit:~# mount | grep /mnt/sdr
/dev/sdr1 on /mnt/sdr type reiser4 (rw)
tidbit:~# df /mnt/sdr
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdr1              3984228       292   3983936   1% /mnt/sdr
tidbit:~# dd if=/dev/zero of=/mnt/sdr/ddtest bs=512K count=7K ; rm 
/mnt/sdr/ddtest; df /mnt/sdr; dd if=/dev/zero of=/mnt/sdr/ddtest 
bs=512K count=7K ; rm /mnt/sdr/ddtest; df /mnt/sdr
7168+0 records in
7168+0 records out
3758096384 bytes transferred in 70.899981 seconds (53005605 bytes/sec)
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdr1              3984228       292   3983936   1% /mnt/sdr
dd: writing `/mnt/sdr/ddtest': No space left on device
613+0 records in
612+0 records out
321384448 bytes transferred in 3.378787 seconds (95118291 bytes/sec)
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdr1              3984228       296   3983932   1% /mnt/sdr
tidbit:~# dd if=/dev/zero of=/mnt/sdr/ddtest bs=512K count=7K ; rm 
/mnt/sdr/ddtest; df /mnt/sdr; sync; dd if=/dev/zero of=/mnt/sdr/ddtest 
bs=512K count=7K ; rm /mnt/sdr/ddtest; df /mnt/sdr
7168+0 records in
7168+0 records out
3758096384 bytes transferred in 73.244216 seconds (51309122 bytes/sec)
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdr1              3984228       296   3983932   1% /mnt/sdr
7168+0 records in
7168+0 records out
3758096384 bytes transferred in 70.666456 seconds (53180768 bytes/sec)
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdr1              3984228       292   3983936   1% /mnt/sdr

----------------
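
As a sanity check on the numbers in the transcript above, the following sketch redoes the arithmetic. All figures are copied from the dd and df output; the variable names are introduced here for illustration. It suggests that the second, un-synced dd stopped almost exactly where the space would run out if the blocks of the first file had not yet been released, which is consistent with the delayed-freeing explanation but does not prove it.

```python
KIB = 1024

block_size = 512 * KIB             # dd bs=512K
count = 7 * KIB                    # dd count=7K, reported as 7168 records
first_write = block_size * count   # bytes written by the first dd

available_kib = 3983936            # "Available" column before the test
second_write = 321384448           # bytes written before ENOSPC

# Space left if the first file's blocks were still held after rm.
left_if_not_released = available_kib - first_write // KIB
# How far short of that limit the failing dd stopped.
shortfall = left_if_not_released - second_write // KIB

print(first_write, left_if_not_released, shortfall)
```

The first value matches the 3758096384 bytes that dd reported, and the remaining gap is only a few tens of KiB.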


>>> At the time of failure, according to df there is stiff 3.5 GB
>>> free. So, I say that reiser4 has a problem.
>>
>> df lies.

No. df tells the user what reiser4 tells df. If df is lying it is 
because reiser4 has lied to it. If df tells me that there is 3.9GB 
available on the filesystem, then I expect that filesystem to allow me 
to write 3.9GB to it.

> reiser4_statfs() was changed to report deleted blocks as free space 
> immediately
> after rm(1).

As mentioned above, this makes perfect sense, and leads to more 
'intuitive' behaviour from the filesystem. I fully expect that the 
filesystem should not change "established semantics", and in this sense 
the above change keeps these semantics, which is a good thing :-)

> It was done because reiser4_write() should trigger fs commit and 
> recover free
> space.   If commit does not happen, it is a reiser4 bug.

In that case I humbly submit that I have found a reiser4 bug :-)

Cheers,
Paul



* Re: reiser4 and bonnie problems
  2004-04-20 18:18                         ` Paul Wagland
@ 2004-04-21  7:03                           ` Alex Zarochentsev
  0 siblings, 0 replies; 20+ messages in thread
From: Alex Zarochentsev @ 2004-04-21  7:03 UTC (permalink / raw)
  To: Paul Wagland; +Cc: reiserfs-list

On Tue, Apr 20, 2004 at 08:18:27PM +0200, Paul Wagland wrote:
> 
> On Apr 20, 2004, at 16:51, Alex Zarochentsev wrote:
> 
> >On Tue, Apr 20, 2004 at 05:37:39PM +0400, Nikita Danilov wrote:
> >>Paul Wagland writes:
> >>>Just to summarise everything that is written below, this is what 
> >>>bonnie
> >>>is doing:
> >>>1. write 3.5 GB onto a 4GB partition. This works
> >>>2. delete 3.5 GB from a 4GB partition. This works.
> >>
> >>Disk blocks freed during transaction are not actually freed until
> >>transaction commits.
> 
> Under what conditions do transactions get committed? Alex mentioned 
> below that every write is an implicit commit. 

No. A write should cause a commit _only_ if there is no free space.

df (for reiser4) does not show the true free block counter.  It shows (1)
the number of reiser4 free blocks plus (2) the blocks which can be freed by
atom commits.  Those blocks are only _potentially_ free.  If blocks (1) are
not enough, reiser4 has to commit atoms to free blocks (2).

The explanation above is simplified a bit.  In fact, an atom commit may free
more blocks: some blocks are reserved for the wandered log and are freed after
commit, and some blocks can be freed by the squalloc (node squeeze and
allocate) operation which precedes atom commit.
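
The two-pool accounting described above can be illustrated with a toy model. This is a sketch under stated assumptions, not reiser4 code: the `ToyAllocator` class, its method names and the `commit_on_enospc` flag are all hypothetical, and the model ignores wandered-log reservations and squalloc. It only shows why a write can fail with ENOSPC while the statfs-style figure still looks large, and why forcing a commit first makes the same write succeed.

```python
import errno
import os


class ToyAllocator:
    """Toy model of free vs. commit-pending blocks; not reiser4 code."""

    def __init__(self, total_blocks):
        self.free = total_blocks  # (1) blocks that can be allocated right now
        self.pending = 0          # (2) blocks released only when an atom commits

    def statfs_free(self):
        # What df would see: (1) plus (2).
        return self.free + self.pending

    def delete(self, blocks):
        # rm moves blocks to the pending pool, not straight to the free pool.
        self.pending += blocks

    def commit(self):
        self.free += self.pending
        self.pending = 0

    def write(self, blocks, commit_on_enospc=True):
        if blocks > self.free and commit_on_enospc:
            self.commit()
        if blocks > self.free:
            raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
        self.free -= blocks


fs = ToyAllocator(4000)
fs.write(3500)
fs.delete(3500)
print(fs.statfs_free())  # the whole device is reported as free
try:
    fs.write(3500, commit_on_enospc=False)  # models the missing commit
except OSError as exc:
    print("write failed with ENOSPC:", exc.errno == errno.ENOSPC)
fs.write(3500, commit_on_enospc=True)       # commit first, then allocate
```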

> Is that the only 
> situation? Other than sync obviously :-)

1. The atom (transaction) is too old or too large.

2. The VM asks for memory and reiser4 fails to free memory by means other
than atom commit.

3. fsync.

4. reiser4 considers the situation to be close to OOM.  reiser4_writepage() may
   force atoms to commit.

> 
> >>>3. I check using df that the disk space is free. That works.
> >>
> >>But that "delayed freeing" confused users (they did cp, rm, but df has
> >>still showed that space is used), so that statfs(2) (system call used 
> >>by
> >>df) was modified to take these delayed blocks into account and pretend
> >>that they are free.
> 
> OK, that I can deal with, rm'ing a file should free the space ;-). 
> However, if the transaction is not committed at this point, what 
> happens if I lose power at this point? Is the filesystem rolled back to 
> before the deletions?
> 
> >>>4. write 3.5GB onto a 4GB partition. This fails.
> >>
> >>Try to repeat this with sync before step 4.
> 
> OK, here is the results of the test. I have decided to run it without 
> bonnie, just to make sure that it was not the determining factor.
> 
> -----------------
> 
> tidbit:~# mount | grep /mnt/sdr
> /dev/sdr1 on /mnt/sdr type reiser4 (rw)
> tidbit:~# df /mnt/sdr
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/sdr1              3984228       292   3983936   1% /mnt/sdr
> tidbit:~# dd if=/dev/zero of=/mnt/sdr/ddtest bs=512K count=7K ; rm 
> /mnt/sdr/ddtest; df /mnt/sdr; dd if=/dev/zero of=/mnt/sdr/ddtest 
> bs=512K count=7K ; rm /mnt/sdr/ddtest; df /mnt/sdr
> 7168+0 records in
> 7168+0 records out
> 3758096384 bytes transferred in 70.899981 seconds (53005605 bytes/sec)
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/sdr1              3984228       292   3983936   1% /mnt/sdr
> dd: writing `/mnt/sdr/ddtest': No space left on device
> 613+0 records in
> 612+0 records out
> 321384448 bytes transferred in 3.378787 seconds (95118291 bytes/sec)
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/sdr1              3984228       296   3983932   1% /mnt/sdr
> tidbit:~# dd if=/dev/zero of=/mnt/sdr/ddtest bs=512K count=7K ; rm 
> /mnt/sdr/ddtest; df /mnt/sdr; sync; dd if=/dev/zero of=/mnt/sdr/ddtest 
> bs=512K count=7K ; rm /mnt/sdr/ddtest; df /mnt/sdr
> 7168+0 records in
> 7168+0 records out
> 3758096384 bytes transferred in 73.244216 seconds (51309122 bytes/sec)
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/sdr1              3984228       296   3983932   1% /mnt/sdr
> 7168+0 records in
> 7168+0 records out
> 3758096384 bytes transferred in 70.666456 seconds (53180768 bytes/sec)
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/sdr1              3984228       292   3983936   1% /mnt/sdr
> 
> ----------------
> 
> 
> >>>At the time of failure, according to df there is stiff 3.5 GB
> >>>free. So, I say that reiser4 has a problem.
> >>
> >>df lies.
> 
> No. df tells the user what reiser4 tells df. If df is lying it is 
> because reiser4 has lied to it. If df tells me that there is 3.9GB 
> available on the filesystem, then I expect that filesystem to allow me 
> to write 3.9GB to it.

Yes. reiser4_statfs() was changed in the expectation that reiser4_filewrite(),
for example, would free that space if necessary.

I committed a fix for reiser4_write().  However, it is not tested yet.

> >reiser4_statfs() was changed to report deleted blocks as free space 
> >immediately
> >after rm(1).
> 
> As mentioned above, this makes perfect sense, and leads to more 
> 'intuitive' behaviour from the filesystem. I fully expect that the 
> filesystem should change "established semantics", and in this sense the 
> above change keeps these semantics, which is a good thing :-)
> 
> >It was done because reiser4_write() should trigger fs commit and 
> >recover free
> >space.   If commit does not happen, it is a reiser4 bug.
> 
> In that case I humbly submit that I have found a reiser4 bug :-)
> 
> Cheers,
> Paul

Thanks.

-- 
Alex.


Thread overview: 20+ messages
2004-04-14  6:51 reiser4 and megaraid problems with debian 2.6.5 Paul Wagland
2004-04-14  9:05 ` Domenico Andreoli
2004-04-14 12:36   ` Paul Wagland
2004-04-14 13:09     ` Nikita Danilov
2004-04-14 13:25       ` Paul Wagland
2004-04-14 13:45         ` Vladimir Saveliev
2004-04-14 14:03           ` Paul Wagland
2004-04-14 23:59         ` Paul Wagland
2004-04-16 20:39           ` mjt
2004-04-17  7:38             ` Paul Wagland
2004-04-19 21:40               ` reiser4 and bonnie problems Paul Wagland
2004-04-20  7:51                 ` Nikita Danilov
2004-04-20  8:54                   ` Paul Wagland
2004-04-20 13:37                     ` Nikita Danilov
2004-04-20 14:51                       ` Alex Zarochentsev
2004-04-20 18:18                         ` Paul Wagland
2004-04-21  7:03                           ` Alex Zarochentsev
2004-04-18 22:36           ` reiser4 and megaraid problems with debian 2.6.5 (*solved*) Paul Wagland
2004-04-14 15:13 ` reiser4 and megaraid problems with debian 2.6.5 Hans Reiser
2004-04-14 15:37   ` Paul Wagland
