* Possible freezing bug located after ac13
From: tcm @ 2001-06-24 2:29 UTC
To: linux-kernel

I've recently been going slightly nuts over the fact that ac15, 16, and
17 all like to deadlock or slow to a crawl for seconds/minutes on my
K6-III with 64MB of RAM and 128MB of swap...

Recently I noticed something VERY odd. I'd been keeping an eye on
gkrellm while doing stupid things to produce the problem (running du on
/ as root in X would almost always make it pop up)... and swap was
doing I/O *JUST* before I'd either deadlock or slow down to a crawl, and
if it recovered, swap would do more I/O...

So I tried turning off all swap, and suddenly everything worked fine,
although of course I couldn't do everything I wanted.

I regression tested this: ac16, 15, and even 14 do this. ac13 does
*not*. IMHO the dead swap patches introduced in ac14 may be related to
the problem.

Just my two cents.
Tim

* Re: Possible freezing bug located after ac13
From: Rik van Riel @ 2001-06-24 2:54 UTC
To: tcm; +Cc: linux-kernel

On Sat, 23 Jun 2001 tcm@nac.net wrote:

> [...]
> I regression tested this: ac16, 15, and even 14 do this. ac13 does
> *not*. IMHO the dead swap patches introduced in ac14 may be related to
> the problem.

1) the dead swap cache patch should alleviate the problem, if anything
2) does this happen with 2.4.6-pre5 too?

regards,

Rik
--
Executive summary of a recent Microsoft press release:
   "we are concerned about the GNU General Public License (GPL)"

http://www.surriel.com/
http://www.conectiva.com/
http://distro.conectiva.com/

* Swap error message I've seen in 2.4.5-ac17
From: tcm @ 2001-06-26 22:38 UTC
To: linux-kernel

Yep, me again.

I've been playing around with ac17 on my old 486 machine for a few days
(it seems strange that the 486 works fine while the K6 doesn't, but I
digress), and today I noticed something that made my hair stand on end:

Jun 26 16:17:27 debian kernel: VM: Bad swap entry 0033da00
Jun 26 16:17:27 debian kernel: Unused swap offset entry in swap_count 0033da00
Jun 26 16:17:27 debian kernel: Unused swap offset entry in swap_count 0033da00
Jun 26 16:38:16 debian -- MARK --
Jun 26 16:53:13 debian kernel: PPP BSD Compression module registered
Jun 26 16:53:14 debian kernel: PPP Deflate Compression module registered
Jun 26 16:53:24 debian kernel: VM: Bad swap entry 0033da00

Rik van Riel has told me that this is a kernel bug. I initially figured
it was a bad disk; thanks to him, I can breathe now...

Anyway, when the kernel printed these messages I had just stopped
playing Quake on my K6-III (the 486 handles packets to/from the modem)
and was reloading the compression modules, changing the MTU of my
modem's interface from 576 to 1500, and starting fetchmail. About a
minute later I decided to simply disconnect.

I can't seem to find a way to reproduce this problem on demand like I
can with the freezing bug, but I will reply to this thread if I see it
again and/or find a way to reproduce it reliably.

* Freezing bug in all kernels greater than 2.4.5-ac13 *AND* 2.4.6-pre2
From: tcm @ 2001-06-28 0:33 UTC
To: linux-kernel

I decided, for the hell of it, to test the pre series, as many people
have nudged me to try it in favor of the ac kernel series I've been
having problems with. Well, it turns out I have run into exactly the
same problem I had with the ac kernel series, which quite frankly
surprises the hell out of me.

To make the kernel freeze/slow down to a crawl with affected kernels on
my machine, I do this test:

Load X (this fills up my RAM and causes me to swap a bit)
run an rxvt and su to root (probably unnecessary)
du /

Somewhere in this test I start swapping a little bit, nothing big...
then BAM. Hard disk, mouse, keyboard: all completely and utterly stop.
Video continues to work, but my CPU load goes absolutely INSANE. (If it
recovers, gkrellm generally says I've gotten a loadavg somewhere between
3 and 20, depending on how long it was stuck.) This can last for seconds
(usually) or minutes (once), or it can simply get worse and hang the
machine (many, many, many times).

When it recovers from this, I generally see a MASSIVE write to swap (I'm
using gkrellm to monitor it), and the system continues on as if nothing
happened, until, of course, this happens again. A kernel compile can
cause it. An rm -R of a large directory can cause it. Loading a large
application can cause it.

On some kernels this is more noticeable than on others: ac15 does it the
worst, although pre3 rivals it, and the symptoms are different on
ac17/18. There it simply freezes randomly with no recovery, instead of
sometimes freezing outright and sometimes slowing down to a crawl before
either recovering or freezing. (Which is worse? You decide.)

As before, I tested this with swap and without swap. With swap, I get
the hangs/freezes in all the affected kernels. Without swap, I don't.
Nada.

Now, the big question of the day, folks: what changed between
2.4.6-pre2 and 2.4.6-pre3 that ALSO changed between 2.4.5-ac13 and
2.4.5-ac14, and which part of those patches touched the VM? Anyone? I
don't see what changed in 2.4.6-pre3 that was part of the VM... so I am
trying to narrow it down a bit :)

This bug is driving me slightly nuts, so I want it dead. Anyone got an
exterminator handy? =)

Refer to my previous post, with the subject below, for my original
description of this problem. It's still there in ac18, though I've not
tested ac19 (some have said it's not likely to have been fixed, and I've
been regression testing the 2.4.6-pre kernels today).

Subject: Possible freezing bug located after ac13

Let me know if I can provide any additional information that will help
nail this bug to the wall. (I want to torture it. =)

Tim

* Re: Freezing bug in all kernels greater than 2.4.5-ac13 *AND* 2.4.6-pre2
From: Marcelo Tosatti @ 2001-06-27 23:11 UTC
To: tcm; +Cc: linux-kernel

On Wed, 27 Jun 2001 tcm@nac.net wrote:

> [...]
> Now, the big question of the day, folks: what changed between
> 2.4.6-pre2 and 2.4.6-pre3 that ALSO changed between 2.4.5-ac13 and
> 2.4.5-ac14, and which part of those patches touched the VM? Anyone? I
> don't see what changed in 2.4.6-pre3 that was part of the VM... so I am
> trying to narrow it down a bit :)
>
> This bug is driving me slightly nuts, so I want it dead. Anyone got an
> exterminator handy? =)

Rik's page_launder() changes.

* Re: Freezing bug in all kernels greater than 2.4.5-ac13 *AND* 2.4.6-pre2
From: Marcelo Tosatti @ 2001-06-27 23:25 UTC
To: tcm; +Cc: linux-kernel

On Wed, 27 Jun 2001, Marcelo Tosatti wrote:

> On Wed, 27 Jun 2001 tcm@nac.net wrote:
>
> > [...]
> > Now, the big question of the day, folks: what changed between
> > 2.4.6-pre2 and 2.4.6-pre3 that ALSO changed between 2.4.5-ac13 and
> > 2.4.5-ac14, and which part of those patches touched the VM? Anyone? I
> > don't see what changed in 2.4.6-pre3 that was part of the VM... so I am
> > trying to narrow it down a bit :)
> >
> > This bug is driving me slightly nuts, so I want it dead. Anyone got an
> > exterminator handy? =)
>
> Rik's page_launder() changes.

Eek. I mean Rik's page_launder() changes are _causing_ the problem.
(It's the only VM change between 2.4.6-pre2->pre3 and
2.4.5-ac13->ac14.)

Question: what are the sizes of the inactive dirty and inactive clean
lists when you're about to crash?

* Re: Freezing bug in all kernels greater than 2.4.5-ac13 *AND* 2.4.6-pre2
From: tcm @ 2001-07-01 5:08 UTC
To: linux-kernel; +Cc: Rik van Riel, Marcelo Tosatti, Linus Torvalds

I'm currently running 2.4.6-pre8 and I'm happy as a clam: the problem
has been found and the offending change reverted. From my discussions
with Linus, it looks like the page_launder() change introduced in pre3
(and also included in ac14) was causing the hangs/near-freezes.

I'm not really much of a coder, so I can't say what was wrong with it,
only what the symptoms were and how to make it screw up whenever I
wanted to test for it. (See my previous messages for how to do this.)

If Rik van Riel, Marcelo Tosatti, or anyone else wants me to gather
information on what is going on just before/after the kernel dies, I'll
do it. Just tell me how, and I'll push it along :)

Thanks a bunch, Linus,

Tim