* reiser4 and megaraid problems with debian 2.6.5
@ 2004-04-14 6:51 Paul Wagland
  2004-04-14 9:05 ` Domenico Andreoli
  2004-04-14 15:13 ` reiser4 and megaraid problems with debian 2.6.5 Hans Reiser
  0 siblings, 2 replies; 20+ messages in thread
From: Paul Wagland @ 2004-04-14 6:51 UTC (permalink / raw)
To: Linux mailing list SCSI, Linux mailing list kernel
Cc: Hans Reiser, Atul Mukker

[-- Attachment #1: Type: text/plain, Size: 1406 bytes --]

Hi all,

I would like to report on a problem that I am having. I am just testing out the new megaraid unified driver, and have been doing some baseline testing with bonnie++.

My problem is that, although reiserfs, ext2, jfs and xfs all work, reiser4 fails with the following error:
---
Can't write block.
Bonnie: drastic I/O error (write(2)): No such file or directory
---

I am using the debian prepared kernel with the debian reiser4 patch. I made a cursory examination of the patch, and it appears to correlate fairly closely with the patch from the namesys site.

Given that this works with reiserfs, ext2, jfs and xfs, it would appear to be a reiser4 problem. However, ext3 also fails, though with a different error: it claims that the disk is full, but it is trying to write two 1GB files onto a 2.5GB filesystem, so it should have enough room, and indeed it did even work two or three times out of about 10 runs (lots of timing :-). This implies that it might be a megaraid problem. As you can tell, I really have no idea ;-)

I will try playing around tonight with an official kernel and the official reiser4 patch to see if that makes any difference, but would just like to raise this potential problem sooner rather than later.

If I can help debug this situation (I am probably the only person trying this combination :-) please let me know how I should go about it.

Cheers,
Paul

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 186 bytes --]

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: reiser4 and megaraid problems with debian 2.6.5 2004-04-14 6:51 reiser4 and megaraid problems with debian 2.6.5 Paul Wagland @ 2004-04-14 9:05 ` Domenico Andreoli 2004-04-14 12:36 ` Paul Wagland 2004-04-14 15:13 ` reiser4 and megaraid problems with debian 2.6.5 Hans Reiser 1 sibling, 1 reply; 20+ messages in thread From: Domenico Andreoli @ 2004-04-14 9:05 UTC (permalink / raw) To: Paul Wagland Cc: Linux mailing list SCSI, Linux mailing list kernel, Hans Reiser, Atul Mukker, reiserfs-list [ bringing this also on reiserfs ml, a great place for this kind of posts. this is also the reason of the full quoting. sorry ] On Wed, Apr 14, 2004 at 08:51:53AM +0200, Paul Wagland wrote: > Hi all, hi Paul, > I would like to report on a problem that I am having. I am just testing > out the new megaraid unified driver, and have been doing some baseline > testing with bonnie++. > > My problem is that, although reiserfs, ext2, jfs and xfs all work, > reiser4 fails with the following error: > --- > Can't write block. > Bonnie: drastic I/O error (write(2)): No such file or directory > --- > > I am using the debian prepared kernel with the debian reiser4 patch. I > made a cursory examination of the patch, and it appears to correlate > fairly closely with the patch from the namesys site. of course it is correlated to that of namesys! i have no skills at all to invent reiser4 :)) you forgot to specify version of the patch you are talking about, currently debian provides two versions. anyway i suppose you are talking about version 20040326-2, aren't you? > Given that this works with reiserfs, ext2, jfs and xfs it would appear > to be a reiser4 problem, however ext3 also fails, though with a > different error, it claims that the disk is full, but it is trying to > write a 2 1GB files onto a 2.5GB filesystem, so it should have enough > room, and indeed it did even work two or three times out of about 10 > runs (lots of timing :-). This implies that it might be a megaraid > problem. 
As you can tell, I really have no idea ;-) > > I will try playing around tonight with an official kernel and the > official reiser4 patch to see if that makes any difference, but would > just like to raise this potential problem sooner rather than later. latest reiser4 snapshot provided a patch which applied cleanly on 2.6.5-rc2 but not to 2.6.5. i had to modify it as suggested on the reiserfs ml. if you look at the debian package's changelog you can find the reference to that thread. > If I can help debug this situation (I am probably the only person > trying this combination :-) please let me know how I should go about > it. i'm sorry but i can't help further. cheers domenico -----[ Domenico Andreoli, aka cavok --[ http://filibusta.crema.unimi.it/~cavok/gpgkey.asc ---[ 3A0F 2F80 F79C 678A 8936 4FEE 0677 9033 A20E BC50 ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: reiser4 and megaraid problems with debian 2.6.5 2004-04-14 9:05 ` Domenico Andreoli @ 2004-04-14 12:36 ` Paul Wagland 2004-04-14 13:09 ` Nikita Danilov 0 siblings, 1 reply; 20+ messages in thread From: Paul Wagland @ 2004-04-14 12:36 UTC (permalink / raw) To: Domenico Andreoli Cc: reiserfs-list, Linux mailing list SCSI, Atul Mukker, Hans Reiser, Linux mailing list kernel [-- Attachment #1: Type: text/plain, Size: 1440 bytes --] On Apr 14, 2004, at 11:05, Domenico Andreoli wrote: > [ bringing this also on reiserfs ml, a great place for this kind > of posts. this is also the reason of the full quoting. sorry ] Thanks ;-) >> I am using the debian prepared kernel with the debian reiser4 patch. I >> made a cursory examination of the patch, and it appears to correlate >> fairly closely with the patch from the namesys site. > > you forgot to specify version of the patch you are talking about, > currently debian provides two versions. anyway i suppose you are > talking > about version 20040326-2, aren't you? Yes, that is correct. >> If I can help debug this situation (I am probably the only person >> trying this combination :-) please let me know how I should go about >> it. > > i'm sorry but i can't help further. Thanks for the tip... the link that you referred to was most useful. I might now have an idea what the problem might be... Further on in the thread <http://marc.theaimsgroup.com/?l=reiserfs&m=108117079808733&w=2> it says that there is something in the patch that "can lead to a dirtied_when in the future, and missed writeback". Well, what happens if the directory that I am missing was in that writeback that got missed? I will try updating the debian patch myself and give it another test tonight and will report back on my findings. But, before I do so, does it seem likely that this could cause the problem? 
Cheers, Paul
* Re: reiser4 and megaraid problems with debian 2.6.5 2004-04-14 12:36 ` Paul Wagland @ 2004-04-14 13:09 ` Nikita Danilov 2004-04-14 13:25 ` Paul Wagland 0 siblings, 1 reply; 20+ messages in thread From: Nikita Danilov @ 2004-04-14 13:09 UTC (permalink / raw) To: Paul Wagland Cc: Domenico Andreoli, reiserfs-list, Linux mailing list SCSI, Atul Mukker, Hans Reiser, Linux mailing list kernel Paul Wagland writes: > > On Apr 14, 2004, at 11:05, Domenico Andreoli wrote: > > > [ bringing this also on reiserfs ml, a great place for this kind > > of posts. this is also the reason of the full quoting. sorry ] > > Thanks ;-) > > >> I am using the debian prepared kernel with the debian reiser4 patch. I > >> made a cursory examination of the patch, and it appears to correlate > >> fairly closely with the patch from the namesys site. > > > > you forgot to specify version of the patch you are talking about, > > currently debian provides two versions. anyway i suppose you are > > talking > > about version 20040326-2, aren't you? > > Yes, that is correct. > > >> If I can help debug this situation (I am probably the only person > >> trying this combination :-) please let me know how I should go about > >> it. Is there anything in the logs? [...] > > Cheers, > Paul Nikita. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: reiser4 and megaraid problems with debian 2.6.5 2004-04-14 13:09 ` Nikita Danilov @ 2004-04-14 13:25 ` Paul Wagland 2004-04-14 13:45 ` Vladimir Saveliev 2004-04-14 23:59 ` Paul Wagland 0 siblings, 2 replies; 20+ messages in thread From: Paul Wagland @ 2004-04-14 13:25 UTC (permalink / raw) To: Nikita Danilov Cc: reiserfs-list, Linux mailing list SCSI, Atul Mukker, Domenico Andreoli, Hans Reiser, Linux mailing list kernel [-- Attachment #1: Type: text/plain, Size: 416 bytes --] On Apr 14, 2004, at 15:09, Nikita Danilov wrote: >>> Paul Wagland writes: >>>> If I can help debug this situation (I am probably the only person >>>> trying this combination :-) please let me know how I should go about >>>> it. > > Is there anything in the logs? Sadly I forgot to check... though I will check again tonight since the problem is quite reproducible for me. Will report back later... Cheers, Paul [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 186 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: reiser4 and megaraid problems with debian 2.6.5 2004-04-14 13:25 ` Paul Wagland @ 2004-04-14 13:45 ` Vladimir Saveliev 2004-04-14 14:03 ` Paul Wagland 2004-04-14 23:59 ` Paul Wagland 1 sibling, 1 reply; 20+ messages in thread From: Vladimir Saveliev @ 2004-04-14 13:45 UTC (permalink / raw) To: Paul Wagland; +Cc: reiserfs-list Hello On Wed, 2004-04-14 at 17:25, Paul Wagland wrote: > On Apr 14, 2004, at 15:09, Nikita Danilov wrote: > > >>> Paul Wagland writes: > >>>> If I can help debug this situation (I am probably the only person > >>>> trying this combination :-) please let me know how I should go about > >>>> it. > > > > Is there anything in the logs? > > Sadly I forgot to check... though I will check again tonight since the > problem is quite reproducible for me. Will report Would you also mind to try on another device, please? > Cheers, > Paul ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: reiser4 and megaraid problems with debian 2.6.5 2004-04-14 13:45 ` Vladimir Saveliev @ 2004-04-14 14:03 ` Paul Wagland 0 siblings, 0 replies; 20+ messages in thread From: Paul Wagland @ 2004-04-14 14:03 UTC (permalink / raw) To: Vladimir Saveliev; +Cc: reiserfs-list [-- Attachment #1: Type: text/plain, Size: 781 bytes --] Hi Vladimir, On Apr 14, 2004, at 15:45, Vladimir Saveliev wrote: > On Wed, 2004-04-14 at 17:25, Paul Wagland wrote: >> On Apr 14, 2004, at 15:09, Nikita Danilov wrote: >>> >>> Is there anything in the logs? >> >> Sadly I forgot to check... though I will check again tonight since the >> problem is quite reproducible for me. Will report > > Would you also mind to try on another device, please? Sadly, I can't, since I only have two machines that I can play with, one which is quite averse to 2.6 (not sure why, but it is going away soon so I haven't looked into it) and the one with the megaraid device, and this machine only has disks exported through that card. I can try it with the stock 2.6.5 driver, and will do so if the "Jiffies|1" fix doesn't help. Cheers, Paul [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 186 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: reiser4 and megaraid problems with debian 2.6.5 2004-04-14 13:25 ` Paul Wagland 2004-04-14 13:45 ` Vladimir Saveliev @ 2004-04-14 23:59 ` Paul Wagland 2004-04-16 20:39 ` mjt 2004-04-18 22:36 ` reiser4 and megaraid problems with debian 2.6.5 (*solved*) Paul Wagland 1 sibling, 2 replies; 20+ messages in thread From: Paul Wagland @ 2004-04-14 23:59 UTC (permalink / raw) To: Nikita Danilov Cc: reiserfs-list, Linux SCSI mailing list, Atul Mukker, Domenico Andreoli, Hans Reiser, Linux kernel mailing list On Wed, 2004-04-14 at 15:25, Paul Wagland wrote: > On Apr 14, 2004, at 15:09, Nikita Danilov wrote: > > >>> Paul Wagland writes: > >>>> If I can help debug this situation (I am probably the only person > >>>> trying this combination :-) please let me know how I should go about > >>>> it. > > > > Is there anything in the logs? > > Sadly I forgot to check... though I will check again tonight since the > problem is quite reproducible for me. Will report back later... OK. There is nothing in the logs. I have recompiled the kernel with extra REISER4 debugging and checking and still nothing. This error is 100% reproducible for me. I have had a thought, what if it is "only" the wrong error code that is being returned? What if the real problem is that we are running out of free blocks. 
To test this theory (a little at least) I ran:

# bonnie++ -q -x4 -d /mnt/sdq -u 0:0 -f -r500
name,file_size,putc,putc_cpu,put_block,put_block_cpu,rewrite,rewrite_cpu,getc,getc_cpu,get_block,get_block_cpu,seeks,seeks_cpu,num_files,seq_create,seq_create_cpu,seq_stat,seq_stat_cpu,seq_del,seq_del_cpu,ran_create,ran_create_cpu,ran_stat,ran_stat_cpu,ran_del,ran_del_cpu
tidbit.kungfoocoder.org,1G,,,55236,11,36165,10,,,73514,8,2138.3,2,16,+++++,+++,+++++,+++,25015,99,28712,100,+++++,+++,26846,100
tidbit.kungfoocoder.org,1G,,,55236,11,30073,8,,,84287,10,2046.9,2,16,+++++,+++,+++++,+++,24862,99,28340,99,+++++,+++,26490,99
tidbit.kungfoocoder.org,1G,,,55391,11,30140,9,,,84506,10,2050.2,2,16,+++++,+++,+++++,+++,24642,100,28725,100,+++++,+++,26653,100
tidbit.kungfoocoder.org,1G,,,55364,11,30165,8,,,83055,11,2051.9,2,16,+++++,+++,+++++,+++,24682,100,28264,100,+++++,+++,26804,99

Note that even with debugging turned on we are about 5% faster at reading and 20% slower at writing compared to reiserfs. Pretty good I dare say.

However, when I run:

~# bonnie++ -x4 -d /mnt/sdq -u 0:0 -f -q -r800
name,file_size,putc,putc_cpu,put_block,put_block_cpu,rewrite,rewrite_cpu,getc,getc_cpu,get_block,get_block_cpu,seeks,seeks_cpu,num_files,seq_create,seq_create_cpu,seq_stat,seq_stat_cpu,seq_del,seq_del_cpu,ran_create,ran_create_cpu,ran_stat,ran_stat_cpu,ran_del,ran_del_cpu
Can't write block.
Bonnie: drastic I/O error (re write(2)): No such file or directory

Using reiserfs I can happily run:

# bonnie++ -x4 -d /mnt/sdq -u 0:0 -f -q -r1008

and the partition is 2.5GB in size.

Some more background information: my hardware is not overclocked and has been 100% reliable; about two weeks ago I sat it through about 24 hours of memtest86+ without any problems. The machine has 1GB of RAM. The logical partition that I am testing is 2.5GB.

Here are the REISER4 settings from my configuration:

tidbit:~# grep REISER4 /boot/config-2.6.5pw-newmega-k7-1
CONFIG_REISER4_FS=m
# CONFIG_REISER4_FS_SYSCALL is not set
CONFIG_REISER4_LARGE_KEY=y
CONFIG_REISER4_CHECK=y
CONFIG_REISER4_FS_SYSCALL_DEBUG=y
# CONFIG_REISER4_DEBUG_MODIFY is not set
# CONFIG_REISER4_DEBUG_MEMCPY is not set
# CONFIG_REISER4_DEBUG_NODE is not set
# CONFIG_REISER4_ZERO_NEW_NODE is not set
# CONFIG_REISER4_TRACE is not set
# CONFIG_REISER4_EVENT_LOG is not set
# CONFIG_REISER4_STATS is not set
# CONFIG_REISER4_PROF is not set
# CONFIG_REISER4_LOCKPROF is not set
# CONFIG_REISER4_DEBUG_OUTPUT is not set
# CONFIG_REISER4_NOOPT is not set
CONFIG_REISER4_USE_EFLUSH=y
# CONFIG_REISER4_COPY_ON_CAPTURE is not set
# CONFIG_REISER4_BADBLOCKS is not set

I have removed the |1 from the jiffies|1 assignment. It still works, which means that the kernel must have been fixed :-) But it didn't help :-\

Hope this helps provide some illumination to the gurus out there...

Cheers,
Paul
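The three runs above differ only in their -r value, so it may help to see how much data each one implies. The following is a rough sketch, not part of the original message. It assumes that bonnie++, when no -s is given, sizes its test data at twice the -r figure (this matches the 1G reported for -r500, but it is an assumption here), and it takes the 2.5GB partition as 2560 MB while ignoring filesystem overhead. The function names are made up for illustration.

```python
# Sketch: data size implied by each bonnie++ -r value used in the runs above.
# Assumption: with no -s given, bonnie++ writes 2 x the -r value (in MB).
# Assumption: the 2.5GB partition is 2560 MB; metadata overhead is ignored.

PARTITION_MB = 2560


def implied_size_mb(ram_mb: int) -> int:
    """Return the amount of test data (MB) implied by a given -r value."""
    return 2 * ram_mb


def headroom_mb(ram_mb: int, partition_mb: int = PARTITION_MB) -> int:
    """Return the space (MB) left on the partition after writing the test data."""
    return partition_mb - implied_size_mb(ram_mb)


if __name__ == "__main__":
    runs = (("reiser4", 500), ("reiser4", 800), ("reiserfs", 1008))
    for fs, ram in runs:
        print(f"{fs:8s} -r{ram:<5d} data={implied_size_mb(ram)} MB "
              f"headroom={headroom_mb(ram)} MB")
```

Under these assumptions the failing reiser4 run (-r800) asks for less data than the reiserfs run (-r1008) that completes on the same partition.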
* Re: reiser4 and megaraid problems with debian 2.6.5
  2004-04-14 23:59 ` Paul Wagland
@ 2004-04-16 20:39 ` mjt
  2004-04-17 7:38 ` Paul Wagland
  2004-04-18 22:36 ` reiser4 and megaraid problems with debian 2.6.5 (*solved*) Paul Wagland
  1 sibling, 1 reply; 20+ messages in thread
From: mjt @ 2004-04-16 20:39 UTC (permalink / raw)
To: reiserfs-list

Paul Wagland wrote:
>Bonnie: drastic I/O error (re write(2)): No such file or directory

I applied http://mjt.nysv.org/reiser/bonnie.patch which may or may not help. Nikita gave it on IRC, but without either of us having time at the moment to do much about it (so it'll wait until Monday, at least).

Guess the point of this email is that I won't forget the results I got by then :)

Nothing was logged, although I had the following on in the kernel:

mjt@shrike:~$ zgrep REISER4 /proc/config.gz
CONFIG_REISER4_FS=y
# CONFIG_REISER4_FS_SYSCALL is not set
CONFIG_REISER4_LARGE_KEY=y
CONFIG_REISER4_CHECK=y
CONFIG_REISER4_DEBUG=y
CONFIG_REISER4_FS_SYSCALL_DEBUG=y
# CONFIG_REISER4_DEBUG_MODIFY is not set
# CONFIG_REISER4_DEBUG_MEMCPY is not set
# CONFIG_REISER4_DEBUG_NODE is not set
# CONFIG_REISER4_ZERO_NEW_NODE is not set
# CONFIG_REISER4_TRACE is not set
# CONFIG_REISER4_EVENT_LOG is not set
# CONFIG_REISER4_STATS is not set
# CONFIG_REISER4_PROF is not set
# CONFIG_REISER4_LOCKPROF is not set
CONFIG_REISER4_DEBUG_OUTPUT=y
# CONFIG_REISER4_NOOPT is not set
CONFIG_REISER4_USE_EFLUSH=y
# CONFIG_REISER4_COPY_ON_CAPTURE is not set
# CONFIG_REISER4_BADBLOCKS is not set

I ran

while [ 0 ]; do set $( df -m | tail -1); echo $4; sleep 1; done

on one terminal and

time /usr/sbin/bonnie++ -d bonnie/ -f -s $[ $( set $( df -m | tail -1); echo $4 ) - 128 ]

on another. The point is that df -m | tail -1 returns my home directory and 128 should be how much space bonnie++ should leave for me while testing.

This is the output:

mjt@shrike:~/tmp$ time /usr/sbin/bonnie++ -d bonnie/ -f -s $[ $( set $( df -m | tail -1); echo $4 ) - 128 ]
Writing intelligently...
Message from syslogd@shrike at Fri Apr 16 19:33:14 2004 ...
shrike kernel: Disabling IRQ #10
done
Rewriting...Can't write block.
Bonnie: drastic I/O error (re write(2)): No such file or directory

real 39m19.082s
user 0m2.990s
sys 28m46.480s

And this is what the while [ 0 ] loop gave me:

82
63
43
126
123
110
90
127
53185
53185

This is naturally partial; there were entries for almost forty minutes, which I did not log.

As you can see, the free space drops well below 128 before the error, after which the space is freed again. Is this because bonnie++ does not take -s too literally or what?

Also, the patch seemed to have no effect and I don't think I dare have it in for nothing, not without talking to Nik or someone about it.

So, if this is a question of bonnie++ being imprecise, it's ok, but Paul's problem is on a much larger scale.

Hopefully we will talk about this during the weekend on IRC, but this message should suffice as a partial status report.

-- mjt
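The free-space figures quoted above can be summarised mechanically. The following is a minimal sketch, not something that was run in the thread: `free_mb` polls a mount point with Python's `shutil.disk_usage`, much like the df loop, and `below_margin` reports the lowest sample and how many samples fell under the 128 MB margin. The function names and the default path are hypothetical.

```python
# Sketch: summarise free-space samples like those printed by the df loop.
# SAMPLES_MB holds the values quoted in the message (MB available);
# MARGIN_MB is the 128 MB that was subtracted on the bonnie++ command line.

import shutil
from typing import List, Tuple

SAMPLES_MB = [82, 63, 43, 126, 123, 110, 90, 127, 53185, 53185]
MARGIN_MB = 128


def free_mb(path: str = ".") -> int:
    """Return the space available to the caller on the filesystem holding path, in MB."""
    return shutil.disk_usage(path).free // (1024 * 1024)


def below_margin(samples: List[int], margin: int) -> Tuple[int, int]:
    """Return (lowest sample, number of samples strictly under the margin)."""
    lowest = min(samples)
    count = sum(1 for s in samples if s < margin)
    return lowest, count


if __name__ == "__main__":
    lowest, count = below_margin(SAMPLES_MB, MARGIN_MB)
    print(f"lowest sample: {lowest} MB, samples under {MARGIN_MB} MB: {count}")
    print(f"currently free here: {free_mb()} MB")
```

Logging every sample to a file, rather than watching the terminal, would keep the full forty minutes of readings for later comparison.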
* Re: reiser4 and megaraid problems with debian 2.6.5
  2004-04-16 20:39 ` mjt
@ 2004-04-17 7:38 ` Paul Wagland
  2004-04-19 21:40 ` reiser4 and bonnie problems Paul Wagland
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Wagland @ 2004-04-17 7:38 UTC (permalink / raw)
To: Markus Törnqvist; +Cc: reiserfs-list

On Apr 16, 2004, at 22:39, Markus Törnqvist wrote:

> Paul Wagland wrote:
>
>> Bonnie: drastic I/O error (re write(2)): No such file or directory
>
> I applied http://mjt.nysv.org/reiser/bonnie.patch which may or may
> not help. Nikita gave it on IRC, but without either of us having
> time at the moment to do much about it (so it'll wait until Monday,
> at least).
>
> Guess the point of this email is that I won't forget the results I
> got by then :)

As such, I will also add my results to this thread... please note that I am not with my main machine at the moment, so can't post the numbers :-\

Anyway, I have also applied the above patch and rerun my bonnie test, but this time on a 4GB partition. I am still getting failures, but nothing is being printed in dmesg or /var/log/kern.log.

What happens is that the character output works, and creates four files, 3x1GB and 1x512MB. In another window I have a df loop running, and I can see that there is about 495MB free. So far so good :-)

Now the "intelligent writing" starts, and after about 10 seconds it fails with the "drastic IO error...no such file or directory" error. Looking at the df window I can see that when it fails there is 3.5GB free on that disk, i.e. about 480MB had been used. Since it is a one second loop in both cases then I am imagining that bonnie was able to write out 4GB of data.

What I think is happening in this case is that, although the old files have been deleted and the online disk maps have been updated to show that there is enough free disk, the logs used to create those original files have not been freed.

Does that make sense? Maybe it might help some of the more knowledgeable in determining what is happening? Let me know if I can run any further tests to further determine what is happening.

Cheers,
Paul
* reiser4 and bonnie problems
  2004-04-17 7:38 ` Paul Wagland
@ 2004-04-19 21:40 ` Paul Wagland
  2004-04-20 7:51 ` Nikita Danilov
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Wagland @ 2004-04-19 21:40 UTC (permalink / raw)
To: Paul Wagland; +Cc: reiserfs-list, Markus Törnqvist

On Apr 17, 2004, at 9:38, Paul Wagland wrote:

> On Apr 16, 2004, at 22:39, Markus Törnqvist wrote:
>
>> Paul Wagland wrote:
>>
>>> Bonnie: drastic I/O error (re write(2)): No such file or directory
>>
>> I applied http://mjt.nysv.org/reiser/bonnie.patch which may or may
>> not help. Nikita gave it on IRC, but without either of us having
>> time at the moment to do much about it (so it'll wait until Monday,
>> at least).

OK. I have updated the above patch so that instead of only calling report_err() on ENOENT, it calls report_err() whenever an error happens. I then re-ran my test:

tidbit:~# df /mnt/sdr
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdr1 3984228 296 3983932 1% /mnt/sdr
tidbit:~# bonnie++ -r 1736 -d /mnt/sdr -u 0:0
Using uid:0, gid:0.
Writing with putc()...done
Writing intelligently...Can't write block.
Bonnie: drastic I/O error (write(2)): No space left on device

Note that bonnie tries to create 3.5 GB worth of files on a 4GB filesystem... this should not be causing a problem. Also, the "Writing intelligently" test tries to write out the same amount of information as the "Writing with putc()" test. After each test all the files from that test are deleted, so this really should not be causing a problem.
Further, while running the test, I was also running a df loop, as can be seen below:

tidbit:~# while true; do sleep 1; echo `df /mnt/sdr | tail -1` `du -s /mnt/sdr`; done
/dev/sdr1 3984228 3496804 487424 88% /mnt/sdr 3499521 /mnt/sdr
/dev/sdr1 3984228 3523684 460544 89% /mnt/sdr 3523393 /mnt/sdr
/dev/sdr1 3984228 3547620 436608 90% /mnt/sdr 3547329 /mnt/sdr
/dev/sdr1 3984228 410136 3574092 11% /mnt/sdr 1 /mnt/sdr
< more or less here is where the intelligent write starts >
/dev/sdr1 3984228 84768 3899460 3% /mnt/sdr 84417 /mnt/sdr
/dev/sdr1 3984228 164624 3819604 5% /mnt/sdr 164273 /mnt/sdr
/dev/sdr1 3984228 246664 3737564 7% /mnt/sdr 246309 /mnt/sdr
/dev/sdr1 3984228 321952 3662276 9% /mnt/sdr 321601 /mnt/sdr
/dev/sdr1 3984228 404832 3579396 11% /mnt/sdr 404481 /mnt/sdr
/dev/sdr1 3984228 477264 3506964 12% /mnt/sdr 476913 /mnt/sdr
/dev/sdr1 3984228 527404 3456824 14% /mnt/sdr 1 /mnt/sdr
< there is quite a long pause here, around about 5 seconds >
< and in the meantime, bonnie has failed with a "no space left on device" >
/dev/sdr1 3984228 296 3983932 1% /mnt/sdr 1 /mnt/sdr

With the modifications that I made above, the following lines are spat out into the /var/log/kern.log:

Apr 19 23:20:44 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:20:57 tidbit last message repeated 3 times
Apr 19 23:21:02 tidbit kernel: code: -503 at fs/reiser4/seal.c:170
Apr 19 23:21:02 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:21:17 tidbit last message repeated 3 times
Apr 19 23:21:22 tidbit kernel: code: -503 at fs/reiser4/seal.c:170
Apr 19 23:21:22 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:21:37 tidbit last message repeated 4 times
Apr 19 23:21:42 tidbit kernel: code: -503 at fs/reiser4/seal.c:181
Apr 19 23:21:42 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:21:57 tidbit last message repeated 3 times
Apr 19 23:22:02 tidbit kernel: code: -503 at fs/reiser4/plugin/file/file.c:791
Apr 19 23:22:02 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:22:32 tidbit last message repeated 7 times
Apr 19 23:22:51 tidbit last message repeated 4 times
Apr 19 23:22:56 tidbit kernel: code: -503 at fs/reiser4/plugin/file/file.c:791
Apr 19 23:22:56 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:23:12 tidbit last message repeated 4 times
Apr 19 23:23:16 tidbit kernel: code: -503 at fs/reiser4/plugin/file/file.c:791
Apr 19 23:23:16 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:23:30 tidbit last message repeated 5 times
Apr 19 23:23:30 tidbit kernel: code: -503 at fs/reiser4/plugin/file/file.c:791
Apr 19 23:23:30 tidbit kernel: code: -503 at fs/reiser4/plugin/file/file.c:791
Apr 19 23:23:30 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:23:32 tidbit kernel: code: -503 at fs/reiser4/plugin/file/file.c:791
Apr 19 23:23:32 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:23:34 tidbit kernel: code: -503 at fs/reiser4/seal.c:170
Apr 19 23:23:34 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:23:36 tidbit kernel: code: -503 at fs/reiser4/seal.c:170
Apr 19 23:23:36 tidbit kernel: code: -2 at fs/reiser4/search.c:1204
Apr 19 23:23:37 tidbit kernel: code: -28 at fs/reiser4/block_alloc.c:319

I really hope that this helps someone out there who knows more about the code internals than I. As always, if you would like me to run some more tests, please let me know!

Cheers,
Paul
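A log like the one above is easier to read once the codes are tallied per source location. The following is a small Python sketch, not from the thread, which parses lines of the form `code: N at file:line`. It assumes that a "last message repeated N times" line refers to the immediately preceding logged message, as syslogd normally intends, and the `tally` helper and `SAMPLE` excerpt are illustrative only.

```python
# Sketch: count "code: N at file:line" entries in a kern.log excerpt.
# Assumption: "last message repeated N times" adds N occurrences of the
# immediately preceding "code:" message.

import re
from collections import Counter
from typing import Iterable, Optional, Tuple

CODE_RE = re.compile(r"kernel: code: (-?\d+) at (\S+)$")
REPEAT_RE = re.compile(r"last message repeated (\d+) times$")

# A few lines copied from the log in the message above.
SAMPLE = [
    "Apr 19 23:20:44 tidbit kernel: code: -2 at fs/reiser4/search.c:1204",
    "Apr 19 23:20:57 tidbit last message repeated 3 times",
    "Apr 19 23:21:02 tidbit kernel: code: -503 at fs/reiser4/seal.c:170",
    "Apr 19 23:21:02 tidbit kernel: code: -2 at fs/reiser4/search.c:1204",
    "Apr 19 23:23:37 tidbit kernel: code: -28 at fs/reiser4/block_alloc.c:319",
]


def tally(lines: Iterable[str]) -> Counter:
    """Return a Counter keyed by (code, location) for the given log lines."""
    counts: Counter = Counter()
    last_key: Optional[Tuple[int, str]] = None
    for raw in lines:
        line = raw.strip()
        code_match = CODE_RE.search(line)
        if code_match:
            last_key = (int(code_match.group(1)), code_match.group(2))
            counts[last_key] += 1
            continue
        repeat_match = REPEAT_RE.search(line)
        if repeat_match and last_key is not None:
            counts[last_key] += int(repeat_match.group(1))
    return counts


if __name__ == "__main__":
    for (code, where), n in tally(SAMPLE).most_common():
        print(f"{n:4d}  code {code:5d}  {where}")
```

Feeding it the full excerpt instead of `SAMPLE` would show which location reports most often before the final -28.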
* Re: reiser4 and bonnie problems 2004-04-19 21:40 ` reiser4 and bonnie problems Paul Wagland @ 2004-04-20 7:51 ` Nikita Danilov 2004-04-20 8:54 ` Paul Wagland 0 siblings, 1 reply; 20+ messages in thread From: Nikita Danilov @ 2004-04-20 7:51 UTC (permalink / raw) To: Paul Wagland; +Cc: reiserfs-list, Markus TXrnqvist Paul Wagland writes: > > On Apr 17, 2004, at 9:38, Paul Wagland wrote: > > > > > On Apr 16, 2004, at 22:39, Markus TXrnqvist wrote: > > > >> Paul Wagland wrote: > >> > >>> Bonnie: drastic I/O error (re write(2)): No such file or directory > >> > >> I applied http://mjt.nysv.org/reiser/bonnie.patch which may or may > >> not help. Nikita gave it on IRC, but without either of us having > >> time at the moment to do much about it (so it'll wait until Monday, > >> at least). > > OK. I have updated the above patch so that instead of only calling > report_err() on ENOENT, it calls report_err whenever an error happens. > I then re-ran my test: > > tidbit:~# df /mnt/sdr > Filesystem 1K-blocks Used Available Use% Mounted on > /dev/sdr1 3984228 296 3983932 1% /mnt/sdr > tidbit:~# bonnie++ -r 1736 -d /mnt/sdr -u 0:0 > Using uid:0, gid:0. > Writing with putc()...done > Writing intelligently...Can't write block. > Bonnie: drastic I/O error (write(2)): No space left on device Err.. this is completely different from what we had previously: "No space left on device" is understandable where not reasonable. What is completely mysterious is "No such file or directory" error that you reported before. > > > Note, that bonnie tries to create 3.5 GB worth of files on a 4GB > filesystem... this should not be causing a problem. Also, the "Writing > intelligently" test tries to write out the same amount of information > as the "Writing with putc()" test. After each test all the files from > that test are deleted, so this really should not be causing a problem. 
> Further, while running the test, I was also running a df loop, as can > be seen below: > > tidbit:~# while true; do sleep 1; echo `df /mnt/sdr | tail -1` `du -s > /mnt/sdr`; done > /dev/sdr1 3984228 3496804 487424 88% /mnt/sdr 3499521 /mnt/sdr > /dev/sdr1 3984228 3523684 460544 89% /mnt/sdr 3523393 /mnt/sdr > /dev/sdr1 3984228 3547620 436608 90% /mnt/sdr 3547329 /mnt/sdr > /dev/sdr1 3984228 410136 3574092 11% /mnt/sdr 1 /mnt/sdr > < more or less here is where the intelligent write starts > > /dev/sdr1 3984228 84768 3899460 3% /mnt/sdr 84417 /mnt/sdr > /dev/sdr1 3984228 164624 3819604 5% /mnt/sdr 164273 /mnt/sdr > /dev/sdr1 3984228 246664 3737564 7% /mnt/sdr 246309 /mnt/sdr > /dev/sdr1 3984228 321952 3662276 9% /mnt/sdr 321601 /mnt/sdr > /dev/sdr1 3984228 404832 3579396 11% /mnt/sdr 404481 /mnt/sdr > /dev/sdr1 3984228 477264 3506964 12% /mnt/sdr 476913 /mnt/sdr > /dev/sdr1 3984228 527404 3456824 14% /mnt/sdr 1 /mnt/sdr > < there is quite a long pause here, around about 5 seconds > > < and in the meantime, bonnie has failed with a "no space left on > device" > > /dev/sdr1 3984228 296 3983932 1% /mnt/sdr 1 /mnt/sdr > > > With the modifications that I made above, the following lines are spat > out into the /var/log/kern.log: > > Apr 19 23:20:44 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 > Apr 19 23:20:57 tidbit last message repeated 3 times > Apr 19 23:21:02 tidbit kernel: code: -503 at fs/reiser4/seal.c:170 > Apr 19 23:21:02 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 > Apr 19 23:21:17 tidbit last message repeated 3 times > Apr 19 23:21:22 tidbit kernel: code: -503 at fs/reiser4/seal.c:170 > Apr 19 23:21:22 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 > Apr 19 23:21:37 tidbit last message repeated 4 times > Apr 19 23:21:42 tidbit kernel: code: -503 at fs/reiser4/seal.c:181 > Apr 19 23:21:42 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 > Apr 19 23:21:57 tidbit last message repeated 3 times > Apr 19 23:22:02 tidbit kernel: code: -503 at 
> fs/reiser4/plugin/file/file.c:791 > Apr 19 23:22:02 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 > Apr 19 23:22:32 tidbit last message repeated 7 times > Apr 19 23:22:51 tidbit last message repeated 4 times > Apr 19 23:22:56 tidbit kernel: code: -503 at > fs/reiser4/plugin/file/file.c:791 > Apr 19 23:22:56 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 > Apr 19 23:23:12 tidbit last message repeated 4 times > Apr 19 23:23:16 tidbit kernel: code: -503 at > fs/reiser4/plugin/file/file.c:791 > Apr 19 23:23:16 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 > Apr 19 23:23:30 tidbit last message repeated 5 times > Apr 19 23:23:30 tidbit kernel: code: -503 at > fs/reiser4/plugin/file/file.c:791 > Apr 19 23:23:30 tidbit kernel: code: -503 at > fs/reiser4/plugin/file/file.c:791 > Apr 19 23:23:30 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 > Apr 19 23:23:32 tidbit kernel: code: -503 at > fs/reiser4/plugin/file/file.c:791 > Apr 19 23:23:32 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 > Apr 19 23:23:34 tidbit kernel: code: -503 at fs/reiser4/seal.c:170 > Apr 19 23:23:34 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 > Apr 19 23:23:36 tidbit kernel: code: -503 at fs/reiser4/seal.c:170 > Apr 19 23:23:36 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 > Apr 19 23:23:37 tidbit kernel: code: -28 at fs/reiser4/block_alloc.c:319 That looks ok, -28 == -ENOSPC. > > > I really hope that this helps someone out there who knows more about > the code internals than I. As always, if you would like me to run some > more tests, please let me know! > > Cheers, > Paul Nikita. ^ permalink raw reply [flat|nested] 20+ messages in thread
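The reading of -28 as -ENOSPC can be checked against the standard errno table. The following is a minimal Python sketch, not part of the original exchange. It assumes the logged values are negated Linux errno numbers; the `describe` helper is hypothetical. It only notes that -503 is absent from the standard table, which suggests, as an assumption rather than a confirmed fact, that it is a filesystem-internal code.

```python
# Sketch: map the codes seen in the log to errno names where one exists.
# Assumption: the logged values are negated errno numbers (kernel convention).

import errno
import os


def describe(code: int) -> str:
    """Return a readable description for a negated errno value."""
    number = abs(code)
    name = errno.errorcode.get(number)
    if name is None:
        return f"{code}: not a standard errno on this platform"
    return f"{code}: {name} ({os.strerror(number)})"


if __name__ == "__main__":
    for logged in (-2, -28, -503):
        print(describe(logged))
```

On Linux the first two should come out as ENOENT and ENOSPC, the same two errors that bonnie++ printed in the earlier and later runs.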
* Re: reiser4 and bonnie problems
  2004-04-20 7:51 ` Nikita Danilov
@ 2004-04-20 8:54 ` Paul Wagland
  2004-04-20 13:37 ` Nikita Danilov
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Wagland @ 2004-04-20 8:54 UTC (permalink / raw)
To: Nikita Danilov; +Cc: reiserfs-list, Markus Törnqvist

On Apr 20, 2004, at 9:51, Nikita Danilov wrote:

> Paul Wagland writes:
>> Bonnie: drastic I/O error (write(2)): No space left on device
>
> Err.. this is completely different from what we had previously: "No
> space left on device" is understandable where not reasonable. What is
> completely mysterious is "No such file or directory" error that you
> reported before.

Yes, last night I could not get that "No such file or directory" error to appear. I will try to reproduce that particular error again later tonight. However, this other problem happens at exactly the same time, and is an error in reiser4.

Just to summarise everything that is written below, this is what bonnie is doing:

1. write 3.5 GB onto a 4GB partition. This works.
2. delete 3.5 GB from a 4GB partition. This works.
3. I check using df that the disk space is free. That works.
4. write 3.5GB onto a 4GB partition. This fails. At the time of failure, according to df there is still 3.5 GB free.

So, I say that reiser4 has a problem.

>> Note, that bonnie tries to create 3.5 GB worth of files on a 4GB
>> filesystem... this should not be causing a problem. Also, the "Writing
>> intelligently" test tries to write out the same amount of information
>> as the "Writing with putc()" test. After each test all the files from
>> that test are deleted, so this really should not be causing a problem.
>> Further, while running the test, I was also running a df loop, as can >> be seen below: >> >> tidbit:~# while true; do sleep 1; echo `df /mnt/sdr | tail -1` `du -s >> /mnt/sdr`; done >> /dev/sdr1 3984228 3496804 487424 88% /mnt/sdr 3499521 /mnt/sdr >> /dev/sdr1 3984228 3523684 460544 89% /mnt/sdr 3523393 /mnt/sdr >> /dev/sdr1 3984228 3547620 436608 90% /mnt/sdr 3547329 /mnt/sdr >> /dev/sdr1 3984228 410136 3574092 11% /mnt/sdr 1 /mnt/sdr >> < more or less here is where the intelligent write starts > >> /dev/sdr1 3984228 84768 3899460 3% /mnt/sdr 84417 /mnt/sdr >> /dev/sdr1 3984228 164624 3819604 5% /mnt/sdr 164273 /mnt/sdr >> /dev/sdr1 3984228 246664 3737564 7% /mnt/sdr 246309 /mnt/sdr >> /dev/sdr1 3984228 321952 3662276 9% /mnt/sdr 321601 /mnt/sdr >> /dev/sdr1 3984228 404832 3579396 11% /mnt/sdr 404481 /mnt/sdr >> /dev/sdr1 3984228 477264 3506964 12% /mnt/sdr 476913 /mnt/sdr >> /dev/sdr1 3984228 527404 3456824 14% /mnt/sdr 1 /mnt/sdr >> < there is quite a long pause here, around about 5 seconds > >> < and in the meantime, bonnie has failed with a "no space left on >> device" > >> /dev/sdr1 3984228 296 3983932 1% /mnt/sdr 1 /mnt/sdr >> >> >> With the modifications that I made above, the following lines are spat >> out into the /var/log/kern.log: >> >> Apr 19 23:20:44 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 >> Apr 19 23:20:57 tidbit last message repeated 3 times >> Apr 19 23:21:02 tidbit kernel: code: -503 at fs/reiser4/seal.c:170 >> Apr 19 23:21:02 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 >> Apr 19 23:21:17 tidbit last message repeated 3 times >> Apr 19 23:21:22 tidbit kernel: code: -503 at fs/reiser4/seal.c:170 >> Apr 19 23:21:22 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 >> Apr 19 23:21:37 tidbit last message repeated 4 times >> Apr 19 23:21:42 tidbit kernel: code: -503 at fs/reiser4/seal.c:181 >> Apr 19 23:21:42 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 >> Apr 19 23:21:57 tidbit last message repeated 3 times >> Apr 19 
23:22:02 tidbit kernel: code: -503 at >> fs/reiser4/plugin/file/file.c:791 >> Apr 19 23:22:02 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 >> Apr 19 23:22:32 tidbit last message repeated 7 times >> Apr 19 23:22:51 tidbit last message repeated 4 times >> Apr 19 23:22:56 tidbit kernel: code: -503 at >> fs/reiser4/plugin/file/file.c:791 >> Apr 19 23:22:56 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 >> Apr 19 23:23:12 tidbit last message repeated 4 times >> Apr 19 23:23:16 tidbit kernel: code: -503 at >> fs/reiser4/plugin/file/file.c:791 >> Apr 19 23:23:16 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 >> Apr 19 23:23:30 tidbit last message repeated 5 times >> Apr 19 23:23:30 tidbit kernel: code: -503 at >> fs/reiser4/plugin/file/file.c:791 >> Apr 19 23:23:30 tidbit kernel: code: -503 at >> fs/reiser4/plugin/file/file.c:791 >> Apr 19 23:23:30 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 >> Apr 19 23:23:32 tidbit kernel: code: -503 at >> fs/reiser4/plugin/file/file.c:791 >> Apr 19 23:23:32 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 >> Apr 19 23:23:34 tidbit kernel: code: -503 at fs/reiser4/seal.c:170 >> Apr 19 23:23:34 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 >> Apr 19 23:23:36 tidbit kernel: code: -503 at fs/reiser4/seal.c:170 >> Apr 19 23:23:36 tidbit kernel: code: -2 at fs/reiser4/search.c:1204 >> Apr 19 23:23:37 tidbit kernel: code: -28 at >> fs/reiser4/block_alloc.c:319 > > That looks ok, -28 == -ENOSPC. Sure, the error code is right. But it should not be saying that it has no disk space when it claims to df that there is still 3.5GB free! Cheers, Paul [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 186 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: reiser4 and bonnie problems 2004-04-20 8:54 ` Paul Wagland @ 2004-04-20 13:37 ` Nikita Danilov 2004-04-20 14:51 ` Alex Zarochentsev 0 siblings, 1 reply; 20+ messages in thread From: Nikita Danilov @ 2004-04-20 13:37 UTC (permalink / raw) To: Paul Wagland; +Cc: reiserfs-list, Markus Törnqvist Paul Wagland writes: > > On Apr 20, 2004, at 9:51, Nikita Danilov wrote: > > > Paul Wagland writes: > >> Bonnie: drastic I/O error (write(2)): No space left on device > > > > Err.. this is completely different from what we had previously: "No > > space left on device" is understandable, if not reasonable. What is > > completely mysterious is the "No such file or directory" error that you > > reported before. > > Yes, last night I could not get that "No such file or directory" error > to appear. I will try to reproduce that particular error again later > tonight. However, this other problem happens at exactly the same time, > and is an error in reiser4. > > Just to summarise everything that is written below, this is what bonnie > is doing: > 1. write 3.5 GB onto a 4GB partition. This works. > 2. delete 3.5 GB from a 4GB partition. This works. Disk blocks freed during transaction are not actually freed until transaction commits. > 3. I check using df that the disk space is free. That works. But that "delayed freeing" confused users (they did cp, rm, but df still showed that space is used), so statfs(2) (system call used by df) was modified to take these delayed blocks into account and pretend that they are free. > 4. write 3.5GB onto a 4GB partition. This fails. Try to repeat this with sync before step 4. > > At the time of failure, according to df there is still 3.5 GB > free. So, I say that reiser4 has a problem. df lies. > [...] > > Cheers, > Paul Nikita. ^ permalink raw reply [flat|nested] 20+ messages in thread
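To see the numbers df works from, here is a minimal sketch that reads them directly through Python's `os.statvfs`, a wrapper around the statvfs(3) interface. The function name `free_kib` and the path "." are placeholders; on the test box the path would be /mnt/sdr. On the reiser4 described above these figures would, per Nikita's explanation, already include the delayed-free blocks.

```python
import os

def free_kib(path):
    """Return (total, available) in 1K blocks, as statvfs reports them."""
    st = os.statvfs(path)
    # f_frsize is the fundamental block size the block counts are expressed in.
    total = st.f_blocks * st.f_frsize // 1024
    avail = st.f_bavail * st.f_frsize // 1024
    return total, avail

# "." is a placeholder; substitute the mount point under test.
total, avail = free_kib(".")
print(f"total={total} KiB, available={avail} KiB")
```

Comparing this output before and after a sync would show whether the reported figure changes at commit time or stays the same, which is what the explanation above predicts.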
* Re: reiser4 and bonnie problems 2004-04-20 13:37 ` Nikita Danilov @ 2004-04-20 14:51 ` Alex Zarochentsev 2004-04-20 18:18 ` Paul Wagland 0 siblings, 1 reply; 20+ messages in thread From: Alex Zarochentsev @ 2004-04-20 14:51 UTC (permalink / raw) To: Nikita Danilov; +Cc: Paul Wagland, reiserfs-list On Tue, Apr 20, 2004 at 05:37:39PM +0400, Nikita Danilov wrote: > Paul Wagland writes: > > > > On Apr 20, 2004, at 9:51, Nikita Danilov wrote: > > > > > Paul Wagland writes: > > >> Bonnie: drastic I/O error (write(2)): No space left on device > > > > > > Err.. this is completely different from what we had previously: "No > > > space left on device" is understandable, if not reasonable. What is > > > completely mysterious is the "No such file or directory" error that you > > > reported before. > > > > Yes, last night I could not get that "No such file or directory" error > > to appear. I will try to reproduce that particular error again later > > tonight. However, this other problem happens at exactly the same time, > > and is an error in reiser4. > > > > Just to summarise everything that is written below, this is what bonnie > > is doing: > > 1. write 3.5 GB onto a 4GB partition. This works. > > 2. delete 3.5 GB from a 4GB partition. This works. > > Disk blocks freed during transaction are not actually freed until > transaction commits. > > > 3. I check using df that the disk space is free. That works. > > But that "delayed freeing" confused users (they did cp, rm, but df > still showed that space is used), so statfs(2) (system call used by > df) was modified to take these delayed blocks into account and pretend > that they are free. > > > 4. write 3.5GB onto a 4GB partition. This fails. > > Try to repeat this with sync before step 4. > > > > > At the time of failure, according to df there is still 3.5 GB > > free. So, I say that reiser4 has a problem. > > df lies. reiser4_statfs() was changed to report deleted blocks as free space immediately after rm(1). 
It was done because reiser4_write() should trigger fs commit and recover free space. If commit does not happen, it is a reiser4 bug. > > Paul > > Nikita. -- Alex. ^ permalink raw reply [flat|nested] 20+ messages in thread
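The behaviour described in this message can be illustrated with a toy model. This is only a sketch under stated assumptions, not reiser4 code: `ToyAllocator`, its fields, and the block counts are invented for illustration, and writes are all-or-nothing, whereas the real dd run wrote part of the file before failing.

```python
import errno

class ToyAllocator:
    """Toy model: blocks freed by rm stay pending until a commit."""

    def __init__(self, total, commit_on_enospc):
        self.free = total        # blocks that can be allocated right now
        self.pending = 0         # deleted blocks awaiting a commit
        self.commit_on_enospc = commit_on_enospc

    def statfs_free(self):
        # What df would show: really free plus potentially free blocks.
        return self.free + self.pending

    def commit(self):
        # A commit turns pending frees into really free blocks.
        self.free += self.pending
        self.pending = 0

    def write(self, blocks):
        # Intended behaviour: commit first if the free pool is too small.
        if blocks > self.free and self.commit_on_enospc:
            self.commit()
        if blocks > self.free:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.free -= blocks

    def delete(self, blocks):
        # Deleted blocks are not reusable until the next commit.
        self.pending += blocks

def run(commit_on_enospc):
    fs = ToyAllocator(total=4000, commit_on_enospc=commit_on_enospc)
    fs.write(3500)
    fs.delete(3500)
    reported = fs.statfs_free()
    try:
        fs.write(3500)
        return reported, "ok"
    except OSError as e:
        return reported, errno.errorcode[e.errno]

print(run(False))  # write path never commits: the failure Paul sees
print(run(True))   # write path commits when short of space
```

In both runs the df-style figure is the full 4000 blocks after the delete; only the variant whose write path commits when space runs short can actually use them.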
* Re: reiser4 and bonnie problems 2004-04-20 14:51 ` Alex Zarochentsev @ 2004-04-20 18:18 ` Paul Wagland 2004-04-21 7:03 ` Alex Zarochentsev 0 siblings, 1 reply; 20+ messages in thread From: Paul Wagland @ 2004-04-20 18:18 UTC (permalink / raw) To: Alex Zarochentsev; +Cc: reiserfs-list, Nikita Danilov [-- Attachment #1: Type: text/plain, Size: 4015 bytes --] On Apr 20, 2004, at 16:51, Alex Zarochentsev wrote: > On Tue, Apr 20, 2004 at 05:37:39PM +0400, Nikita Danilov wrote: >> Paul Wagland writes: >>> Just to summarise everything that is written below, this is what >>> bonnie is doing: >>> 1. write 3.5 GB onto a 4GB partition. This works. >>> 2. delete 3.5 GB from a 4GB partition. This works. >> >> Disk blocks freed during transaction are not actually freed until >> transaction commits. Under what conditions do transactions get committed? Alex mentioned below that every write is an implicit commit. Is that the only situation? Other than sync obviously :-) >>> 3. I check using df that the disk space is free. That works. >> >> But that "delayed freeing" confused users (they did cp, rm, but df >> still showed that space is used), so statfs(2) (system call used >> by df) was modified to take these delayed blocks into account and pretend >> that they are free. OK, that I can deal with, rm'ing a file should free the space ;-). However, if the transaction is not committed at this point, what happens if I lose power at this point? Is the filesystem rolled back to before the deletions? >>> 4. write 3.5GB onto a 4GB partition. This fails. >> >> Try to repeat this with sync before step 4. OK, here are the results of the test. I have decided to run it without bonnie, just to make sure that it was not the determining factor. 
----------------- tidbit:~# mount | grep /mnt/sdr /dev/sdr1 on /mnt/sdr type reiser4 (rw) tidbit:~# df /mnt/sdr Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdr1 3984228 292 3983936 1% /mnt/sdr tidbit:~# dd if=/dev/zero of=/mnt/sdr/ddtest bs=512K count=7K ; rm /mnt/sdr/ddtest; df /mnt/sdr; dd if=/dev/zero of=/mnt/sdr/ddtest bs=512K count=7K ; rm /mnt/sdr/ddtest; df /mnt/sdr 7168+0 records in 7168+0 records out 3758096384 bytes transferred in 70.899981 seconds (53005605 bytes/sec) Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdr1 3984228 292 3983936 1% /mnt/sdr dd: writing `/mnt/sdr/ddtest': No space left on device 613+0 records in 612+0 records out 321384448 bytes transferred in 3.378787 seconds (95118291 bytes/sec) Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdr1 3984228 296 3983932 1% /mnt/sdr tidbit:~# dd if=/dev/zero of=/mnt/sdr/ddtest bs=512K count=7K ; rm /mnt/sdr/ddtest; df /mnt/sdr; sync; dd if=/dev/zero of=/mnt/sdr/ddtest bs=512K count=7K ; rm /mnt/sdr/ddtest; df /mnt/sdr 7168+0 records in 7168+0 records out 3758096384 bytes transferred in 73.244216 seconds (51309122 bytes/sec) Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdr1 3984228 296 3983932 1% /mnt/sdr 7168+0 records in 7168+0 records out 3758096384 bytes transferred in 70.666456 seconds (53180768 bytes/sec) Filesystem 1K-blocks Used Available Use% Mounted on /dev/sdr1 3984228 292 3983936 1% /mnt/sdr ---------------- >>> At the time of failure, according to df there is still 3.5 GB >>> free. So, I say that reiser4 has a problem. >> >> df lies. No. df tells the user what reiser4 tells df. If df is lying it is because reiser4 has lied to it. If df tells me that there is 3.9GB available on the filesystem, then I expect that filesystem to allow me to write 3.9GB to it. > reiser4_statfs() was changed to report deleted blocks as free space > immediately after rm(1). 
As mentioned above, this makes perfect sense, and leads to more 'intuitive' behaviour from the filesystem. I fully expect that the filesystem should not change "established semantics", and in this sense the above change keeps these semantics, which is a good thing :-) > It was done because reiser4_write() should trigger fs commit and > recover free > space. If commit does not happen, it is a reiser4 bug. In that case I humbly submit that I have found a reiser4 bug :-) Cheers, Paul [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 186 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
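The dd test in this message (write, rm, optionally sync, write again) could be scripted roughly as follows. This is a sketch, not what was run on tidbit: the names `write_zeros` and `cycle` are invented, the demo writes 4 MiB into a temporary directory rather than 3.5 GB onto /mnt/sdr, and on an ordinary filesystem both writes are expected to succeed.

```python
import os
import tempfile

def write_zeros(path, size, chunk=512 * 1024):
    """Write `size` zero bytes to `path`; return the bytes actually written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    written = 0
    buf = bytes(chunk)
    try:
        while written < size:
            # The slice shortens the final chunk; os.write may also write less.
            written += os.write(fd, buf[: size - written])
    except OSError as e:
        # For example ENOSPC: report how far the write got, as dd does.
        print(f"write stopped after {written} bytes: {e}")
    finally:
        os.close(fd)
    return written

def cycle(directory, size, sync_between):
    """Write and delete the same file twice, optionally syncing in between."""
    path = os.path.join(directory, "ddtest")
    results = []
    for _ in range(2):
        results.append(write_zeros(path, size))
        os.remove(path)
        if sync_between:
            os.sync()  # flush filesystem buffers, like the sync in the test
    return results

with tempfile.TemporaryDirectory() as d:
    print(cycle(d, 4 * 1024 * 1024, sync_between=False))
    print(cycle(d, 4 * 1024 * 1024, sync_between=True))
```

Pointing `cycle` at the filesystem under test with a size close to its capacity would reproduce the two variants above; a short second result without the sync, and a full one with it, would match the behaviour reported in this message.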
* Re: reiser4 and bonnie problems 2004-04-20 18:18 ` Paul Wagland @ 2004-04-21 7:03 ` Alex Zarochentsev 0 siblings, 0 replies; 20+ messages in thread From: Alex Zarochentsev @ 2004-04-21 7:03 UTC (permalink / raw) To: Paul Wagland; +Cc: reiserfs-list On Tue, Apr 20, 2004 at 08:18:27PM +0200, Paul Wagland wrote: > > On Apr 20, 2004, at 16:51, Alex Zarochentsev wrote: > > >On Tue, Apr 20, 2004 at 05:37:39PM +0400, Nikita Danilov wrote: > >>Paul Wagland writes: > >>>Just to summarise everything that is written below, this is what > >>>bonnie is doing: > >>>1. write 3.5 GB onto a 4GB partition. This works. > >>>2. delete 3.5 GB from a 4GB partition. This works. > >> > >>Disk blocks freed during transaction are not actually freed until > >>transaction commits. > > Under what conditions do transactions get committed? Alex mentioned > below that every write is an implicit commit. No. write should cause a commit _only_ if there is no free space. df (for reiser4) does not show the correct free block counter value. It shows (1) the amount of reiser4 free blocks plus (2) blocks which can be freed by atom commits. Those blocks are _potentially_ free. If blocks (1) are not enough, reiser4 has to commit atoms to free blocks (2). The explanation above is simplified a bit. Indeed, atom commit may free more blocks: some blocks are reserved for the wandered log and are freed after commit, and some blocks can be freed by the squalloc (node squeeze and allocate) operation which precedes atom commit. > Is that the only > situation? Other than sync obviously :-) 1. atom (transaction) is too old or too large. 2. VM asks for memory and reiser4 fails to free memory in ways other than atom commit. 3. fsync. 4. reiser4 considers the situation as close to OOM. reiser4_writepage() may force atoms to commit. > > >>>3. I check using df that the disk space is free. That works. 
> >> > >>But that "delayed freeing" confused users (they did cp, rm, but df > >>still showed that space is used), so statfs(2) (system call used > >>by df) was modified to take these delayed blocks into account and pretend > >>that they are free. > > OK, that I can deal with, rm'ing a file should free the space ;-). > However, if the transaction is not committed at this point, what > happens if I lose power at this point? Is the filesystem rolled back to > before the deletions? > > >>>4. write 3.5GB onto a 4GB partition. This fails. > >> > >>Try to repeat this with sync before step 4. > > OK, here are the results of the test. I have decided to run it without > bonnie, just to make sure that it was not the determining factor. > > ----------------- > > tidbit:~# mount | grep /mnt/sdr > /dev/sdr1 on /mnt/sdr type reiser4 (rw) > tidbit:~# df /mnt/sdr > Filesystem 1K-blocks Used Available Use% Mounted on > /dev/sdr1 3984228 292 3983936 1% /mnt/sdr > tidbit:~# dd if=/dev/zero of=/mnt/sdr/ddtest bs=512K count=7K ; rm > /mnt/sdr/ddtest; df /mnt/sdr; dd if=/dev/zero of=/mnt/sdr/ddtest > bs=512K count=7K ; rm /mnt/sdr/ddtest; df /mnt/sdr > 7168+0 records in > 7168+0 records out > 3758096384 bytes transferred in 70.899981 seconds (53005605 bytes/sec) > Filesystem 1K-blocks Used Available Use% Mounted on > /dev/sdr1 3984228 292 3983936 1% /mnt/sdr > dd: writing `/mnt/sdr/ddtest': No space left on device > 613+0 records in > 612+0 records out > 321384448 bytes transferred in 3.378787 seconds (95118291 bytes/sec) > Filesystem 1K-blocks Used Available Use% Mounted on > /dev/sdr1 3984228 296 3983932 1% /mnt/sdr > tidbit:~# dd if=/dev/zero of=/mnt/sdr/ddtest bs=512K count=7K ; rm > /mnt/sdr/ddtest; df /mnt/sdr; sync; dd if=/dev/zero of=/mnt/sdr/ddtest > bs=512K count=7K ; rm /mnt/sdr/ddtest; df /mnt/sdr > 7168+0 records in > 7168+0 records out > 3758096384 bytes transferred in 73.244216 seconds (51309122 bytes/sec) > Filesystem 1K-blocks Used Available Use% Mounted on > 
/dev/sdr1 3984228 296 3983932 1% /mnt/sdr > 7168+0 records in > 7168+0 records out > 3758096384 bytes transferred in 70.666456 seconds (53180768 bytes/sec) > Filesystem 1K-blocks Used Available Use% Mounted on > /dev/sdr1 3984228 292 3983936 1% /mnt/sdr > > ---------------- > > > >>>At the time of failure, according to df there is still 3.5 GB > >>>free. So, I say that reiser4 has a problem. > >> > >>df lies. > > No. df tells the user what reiser4 tells df. If df is lying it is > because reiser4 has lied to it. If df tells me that there is 3.9GB > available on the filesystem, then I expect that filesystem to allow me > to write 3.9GB to it. Yes. reiser4_statfs() was changed expecting that reiser4_filewrite(), for example, would free that space if necessary. I committed a fix for reiser4_write(). However it is not tested yet. > >reiser4_statfs() was changed to report deleted blocks as free space > >immediately after rm(1). > > As mentioned above, this makes perfect sense, and leads to more > 'intuitive' behaviour from the filesystem. I fully expect that the > filesystem should not change "established semantics", and in this sense the > above change keeps these semantics, which is a good thing :-) > > >It was done because reiser4_write() should trigger fs commit and > >recover free > >space. If commit does not happen, it is a reiser4 bug. > > In that case I humbly submit that I have found a reiser4 bug :-) > > Cheers, > Paul Thanks. -- Alex. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: reiser4 and megaraid problems with debian 2.6.5 (*solved*) 2004-04-14 23:59 ` Paul Wagland 2004-04-16 20:39 ` mjt @ 2004-04-18 22:36 ` Paul Wagland 1 sibling, 0 replies; 20+ messages in thread From: Paul Wagland @ 2004-04-18 22:36 UTC (permalink / raw) To: Nikita Danilov Cc: reiserfs-list, Linux SCSI mailing list, Atul Mukker, Domenico Andreoli, Hans Reiser, Linux kernel mailing list [-- Attachment #1: Type: text/plain, Size: 892 bytes --] Hi all, well partly solved anyway... I am just posting this so that if anyone finds this thread later they can also find this conclusion... There is still more work to be done before this problem can be properly closed, but at least now I am certain that it has nothing to do with the hardware :-) It appears (my own unsupported theory) that the problem is that reiser4 is taking some time to free up the blocks that are currently in use by the wandering log. Since I was running a test that causes a lot of wandering log to be created, and I was doing it on a filesystem with very little free space, I was running into the problem. Rerunning the test with either a) more space, or b) a smaller data set solved the problem. On the reiserfs-list we are now trying to find out exactly why this is happening, and how to solve the problem properly. Cheers, Paul [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: reiser4 and megaraid problems with debian 2.6.5 2004-04-14 6:51 reiser4 and megaraid problems with debian 2.6.5 Paul Wagland 2004-04-14 9:05 ` Domenico Andreoli @ 2004-04-14 15:13 ` Hans Reiser 2004-04-14 15:37 ` Paul Wagland 1 sibling, 1 reply; 20+ messages in thread From: Hans Reiser @ 2004-04-14 15:13 UTC (permalink / raw) To: Paul Wagland Cc: Linux mailing list SCSI, Linux mailing list kernel, Atul Mukker Paul Wagland wrote: > Hi all, > > I would like to report on a problem that I am having. I am just > testing out the new megaraid unified driver, and have been doing some > baseline testing with bonnie++. > > My problem is that, although reiserfs, ext2, jfs and xfs all work, > reiser4 fails with the following error: > --- > Can't write block. > Bonnie: drastic I/O error (write(2)): No such file or directory > --- > > I am using the debian prepared kernel with the debian reiser4 patch. I > made a cursory examination of the patch, and it appears to correlate > fairly closely with the patch from the namesys site. In what way does it not correlate? > > Given that this works with reiserfs, ext2, jfs and xfs it would appear > to be a reiser4 problem, however ext3 also fails, though with a > different error, it claims that the disk is full, but it is trying to > write two 1GB files onto a 2.5GB filesystem, so it should have enough > room, and indeed it did even work two or three times out of about 10 > runs (lots of timing :-). This implies that it might be a megaraid > problem. As you can tell, I really have no idea ;-) > > I will try playing around tonight with an official kernel and the > official reiser4 patch to see if that makes any difference, but would > just like to raise this potential problem sooner rather than later. > > If I can help debug this situation (I am probably the only person > trying this combination :-) please let me know how I should go about it. 
> > Cheers, > Paul I don't have the hardware to test it, can you get the error without your hardware? -- Hans ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: reiser4 and megaraid problems with debian 2.6.5 2004-04-14 15:13 ` reiser4 and megaraid problems with debian 2.6.5 Hans Reiser @ 2004-04-14 15:37 ` Paul Wagland 0 siblings, 0 replies; 20+ messages in thread From: Paul Wagland @ 2004-04-14 15:37 UTC (permalink / raw) To: Hans Reiser; +Cc: Linux mailing list SCSI, Linux mailing list kernel [-- Attachment #1: Type: text/plain, Size: 1045 bytes --] Hi, On Apr 14, 2004, at 17:13, Hans Reiser wrote: > Paul Wagland wrote: > >> I am using the debian prepared kernel with the debian reiser4 patch. >> I made a cursory examination of the patch, and it appears to >> correlate fairly closely with the patch from the namesys site. > > In what way does it not correlate? As was mentioned by Domenico Andreoli the changes are just those required to get reiser4 to work under 2.6.5. Other differences are line offsets due to the fact that the debian kernel also has patches applied. >> If I can help debug this situation (I am probably the only person >> trying this combination :-) please let me know how I should go about >> it. > > I don't have the hardware to test it, can you get the error without > your hardware? Unfortunately, not easily, since this is the only box that I can currently test this out on. However, there are a couple of tests that I can still perform (as mentioned elsewhere in this thread) and I will report back on the results of those later tonight. Cheers, Paul [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 186 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread