* Re: link/unlink problem gone?
From: Zygo Blaxell @ 2003-02-06 22:32 UTC
To: reiserfs-list

In article <20030131105745.A7426@namesys.com>,
Oleg Drokin <green@namesys.com> wrote:
>Hello!
>
> Sigh, these were false hopes indeed.
> I can reproduce it with 2.4.21-pre4, only it is now harder for some reason.

I've seen times-to-failure ranging from 20 minutes to 20+ hours (!).
Interestingly enough, both extremes occurred back-to-back: I tried for
20 hours to reproduce the problem, failed, tried again with the same
kernel setup, and 20 minutes later the machine was spewing out
"Permission denied" too quickly to display.

> Chris: My current idea is it happens during low memory conditions, so I am
> actively running around prune_icache and its dcache equivalent. Probably
> you can easily reproduce that if you'd have no swap and not very much RAM.
>
> (Ok, I just checked, limited the RAM to 90M and turned off SWAP entirely.
> and reproduced the problem fairly quickly)

I have observed the problem on machines ranging in size from 96 to
512MB RAM. I haven't observed a correlation between swapping activity
and failures, but I haven't been looking for this either. The machines
that have problems are swapping at some time or another (they have
several hundred MB of swap used).

--
Zygo Blaxell (Laptop) <zblaxell@feedme.hungrycats.org>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD
* Re: link/unlink problem gone?
From: Oleg Drokin @ 2003-02-07 6:19 UTC
To: Zygo Blaxell
Cc: reiserfs-list

Hello!

On Thu, Feb 06, 2003 at 05:32:10PM -0500, Zygo Blaxell wrote:
> > Sigh, these were false hopes indeed.
> > I can reproduce it with 2.4.21-pre4, only it is now harder for some reason.
> I've seen times-to-failure ranging from 20 minutes to 20+ hours (!).

Same here.

> > Chris: My current idea is it happens during low memory conditions, so I am
> > actively running around prune_icache and its dcache equivalent. Probably
> > you can easily reproduce that if you'd have no swap and not very much RAM.
> >
> > (Ok, I just checked, limited the RAM to 90M and turned off SWAP entirely.
> > and reproduced the problem fairly quickly)
> I have observed the problem on machines ranging in size from 96 to
> 512MB RAM. I haven't observed a correlation between swapping activity
> and failures but I haven't been looking for this either. The machines

I noticed that with newer 2.4.21-pre kernels first I see processes die because
of OOM and only after that I see direntries pointing to nowhere.
I reproduced this much more than once, so I believe there is some correlation
between these.

> that have problems are swapping at some time or another (they
> have several hundred MB of swap used).

And they are swapping all the time, so it seems it may take a while before
the relevant code runs and the problem happens. So far I have concluded that
with swap turned off the problem is easier to reproduce than with swap on
(especially if the swap is large).

Bye,
    Oleg
* Re: link/unlink problem gone?
From: Zygo Blaxell @ 2003-02-09 5:16 UTC
To: reiserfs-list

In article <20030207091933.A6256@namesys.com>,
Oleg Drokin <green@namesys.com> wrote:
>I noticed that with newer 2.4.21-pre kernels first I see processes die because
>of OOM and only after that I see direntries pointing to nowhere.
>I reproduced this much more than once, so I believe there is some correlation
>between these.

This may explain why the rsyncs seem to be necessary. When I was trying
to build a bug demonstrator, I first tried a bunch of processes doing
link()/unlink() on a handful of files, and then tried concurrent
'cp -al's and 'rm -rf's, but neither of those worked (I let it run all
weekend). I then added the rsyncs just to add some churn to the
filesystem, and at that point the errors started appearing. I was so
happy that I could reproduce the problem at all that the next thing I
did was post to the list. ;-)

So maybe rsync itself is irrelevant, except as a process that forces the
system to swap. If you replace the rsync in the script with a process
that eats memory but doesn't touch the filesystem (or, alternatively,
reduce the amount of RAM in the system so severely that cp will be
swapping), does the script still demonstrate errors?

Meanwhile, in the field I've rearranged various software so that it
tries to avoid linking or unlinking the same files concurrently, in
order to work around this bug. I can't entirely prevent concurrent links
or unlinks, but I'm pretty sure they're now happening much less often:
tasks that were scheduled at the same time each day now occur 12 hours
apart, jobs that worked on different parts of the directory concurrently
now run sequentially, etc. Unfortunately, this bug's manifestations are
still appearing at about the same rate they were before...
:-P

--
Zygo Blaxell (Laptop) <zblaxell@feedme.hungrycats.org>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD
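Zygo's question (replace the rsync with a process that only eats memory) can be tried with a small stand-in like the one below. This is a minimal sketch, not the test script discussed in this thread: the file name memhog.py, the function names, the 4 KiB page size, the 16 MiB chunk size and the 256 MiB default are all illustrative assumptions. It allocates memory in chunks, writes one byte per page so every page is really backed by RAM or swap, and then keeps re-dirtying those pages so the working set keeps competing with the page, inode and dentry caches, without doing any file I/O of its own after startup.

```python
import sys
import time

# Assumption: 4 KiB pages, as on typical i386/x86-64 Linux boxes.
PAGE_SIZE = 4096


def allocate_and_touch(total_mb, chunk_mb=16, page_size=PAGE_SIZE):
    """Allocate total_mb MiB in chunks of at most chunk_mb MiB.

    One byte per page is written explicitly so that every page is backed
    by RAM (or swap), regardless of how the allocator obtained it.
    """
    chunks = []
    remaining = total_mb * 1024 * 1024
    chunk_bytes = chunk_mb * 1024 * 1024
    while remaining > 0:
        size = min(chunk_bytes, remaining)
        buf = bytearray(size)
        for off in range(0, size, page_size):
            buf[off] = 1
        chunks.append(buf)
        remaining -= size
    return chunks


def retouch(chunks, page_size=PAGE_SIZE):
    """Dirty every page again and return how many pages were touched.

    Repeating this keeps the whole allocation in the working set, so the
    kernel has to keep reclaiming cache (or swapping) to make room.
    """
    touched = 0
    for buf in chunks:
        for off in range(0, len(buf), page_size):
            buf[off] = (buf[off] + 1) & 0xFF
            touched += 1
    return touched


def main(argv):
    # Hypothetical CLI: first argument is the amount of memory to hold, in MiB.
    total_mb = int(argv[1]) if len(argv) > 1 else 256
    chunks = allocate_and_touch(total_mb)
    while True:
        retouch(chunks)
        time.sleep(1)


if __name__ == "__main__":
    main(sys.argv)
```

For example, something like `python3 memhog.py 300`, sized somewhat above the machine's free RAM, could run alongside the link()/unlink(), 'cp -al' and 'rm -rf' workers in place of the rsyncs. With swap turned off, as in Oleg's 90M setup, the OOM killer may well kill this process or one of the workers, which is the situation Oleg reports seeing before the dangling direntries appear.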