public inbox for linux-kernel@vger.kernel.org
* RE: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
@ 2004-08-24 10:28 Karl Vogel
  2004-08-24 10:29 ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Karl Vogel @ 2004-08-24 10:28 UTC (permalink / raw)
  To: 'Jens Axboe'; +Cc: Marcelo Tosatti, linux-kernel, Ingo Molnar

> > > Original post with testcase + stats:
> > >   http://article.gmane.org/gmane.linux.kernel/228156
> > 
> > 2.6.8.1-mm4 clean does not reproduce the problem. Marcelo, your
> > 2.6.8-rc4 report is not valid due to the fixed problem related to
> > that in CFQ already. I'd still like for you to retest with 2.6.8.1.
> > 

I did some extra testing yesterday. When not running X or anything
substantial, I'm able to trigger it after running expunge 2 or
3 times in a row.
If I increase the calloc size, it triggers faster (tried with a 1GB
calloc on a 512MB box with a 1GB swap partition).

The first expunge run completes fine. The ones after that get
OOM killed, and I get a printk about an order-0 page allocation failure.

The 2.6.8.1-mm4 was a clean version, but I will double-check that
this evening.

I also tried with deadline, but was unable to trigger it.

> Oh, and please do also do a sysrq-t from a hung box and save 
> the output.

Note: the box doesn't hang completely. Just some processes get stuck
in 'D' and the machine swaps heavily.
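One way to confirm which processes are stuck in 'D' is to read the state
field of /proc/<pid>/stat, whose layout is "pid (comm) state ...". The
following is only a minimal sketch: the function names are hypothetical,
and the sample lines in the usage note are made up, not taken from the
affected box.

```c
#include <string.h>

/* Extract the state character from one /proc/<pid>/stat line.
 * The format is "pid (comm) state ..."; comm may itself contain
 * spaces or ')', so look for the LAST ')' instead of splitting on
 * whitespace. Returns '\0' if the line does not have that shape. */
char proc_stat_state(const char *line)
{
    const char *close = strrchr(line, ')');

    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '\0';
    return close[2];
}

/* 'D' is uninterruptible sleep, typically a task waiting on disk I/O. */
int is_blocked_on_io(const char *line)
{
    return proc_stat_state(line) == 'D';
}
```

Feeding it a line such as "4242 (expunge) D 1 4242" would report the
task as blocked; a full tool would loop over the numeric directories in
/proc and read each stat file.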

Yesterday evening's tests did recover, so I'm guessing the box would
also have recovered in the previous tests if I had waited long enough.
The vmstat output from those previous tests shows that the box was
low on memory (free/buff/cache are all very low):

  http://users.telenet.be/kvogel/vmstat-after-kill.txt

That was probably why it was swapping like mad. 


Will provide you with that sysrq-t this evening.

Karl.

* RE: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
@ 2004-08-24 10:35 Karl Vogel
  0 siblings, 0 replies; 12+ messages in thread
From: Karl Vogel @ 2004-08-24 10:35 UTC (permalink / raw)
  To: 'Jens Axboe'; +Cc: Marcelo Tosatti, linux-kernel, Ingo Molnar

> > The tests of yesterday evening, did recover. So I'm guessing if I had
> > waited long enough the box would have recovered on the previous
> > tests. Looking at the vmstat from my previous tests, shows that the
> > box was low on memory (free/buff/cache are all very low):
> > 
> >   http://users.telenet.be/kvogel/vmstat-after-kill.txt
> > 
> > That was probably why it was swapping like mad. 
> 
> Ok, so now I'm confused - tests on what kernel recovered?

2.6.8.1 with voluntary-preempt-P7

The same kernel as the one that didn't recover (waited 10 minutes,
after which it was still swapping like mad).

Of course, the test where it recovered was run with nothing else
running on the box (no X session, no KDE, just plain 'init 3').

Karl.

* RE: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
@ 2004-08-23 16:10 Karl Vogel
  2004-08-23 17:00 ` Jens Axboe
  2004-08-24 10:03 ` Jens Axboe
  0 siblings, 2 replies; 12+ messages in thread
From: Karl Vogel @ 2004-08-23 16:10 UTC (permalink / raw)
  To: 'Jens Axboe', Marcelo Tosatti; +Cc: linux-kernel

> > Jens, is this huge amount of bio/biovec's allocations expected
> > with CFQ? Its really really bad.
> 
> Nope, it's not by design :-)
> 
> A test case would be nice, then I'll fix it as soon as possible. But
> please retest with 2.6.8.1 marcelo, 2.6.8-rc4 is missing an important
> fix to ll_rw_blk that can easily cause this. The first report is for
> 2.6.8.1, so I'm more puzzled on that.

I tried with 2.6.8.1 and 2.6.8.1-mm4; both had the problem. If there
is anything extra I need to try/record, just shoot!

Original post with testcase + stats:
  http://article.gmane.org/gmane.linux.kernel/228156


* Kernel 2.6.8.1: swap storm of death
@ 2004-08-22 13:27 Karl Vogel
  2004-08-22 18:49 ` Kernel 2.6.8.1: swap storm of death - 2.6.8.1-mm4 also karl.vogel
  0 siblings, 1 reply; 12+ messages in thread
From: Karl Vogel @ 2004-08-22 13:27 UTC (permalink / raw)
  To: linux-kernel

I can bring down my box by running a program that does a calloc() of 512MB
(which is the size of my RAM). The box starts to swap heavily and never
recovers from it. The process that callocs the memory gets OOM killed (which
is also strange, as I have 1GB of free swap).

After the OOM kill, the shell where I started the calloc() program is alive 
but very slow. The box continues to swap and the other processes remain dead.

To gather some more statistics, I did the following:

- start 'vmstat 1|tee vmstat.txt' in 1 VT session.
- run expunge (= program that does calloc(512MB)) in another VT.

The box freezes for some time. After a while expunge is OOM killed, but the
vmstat on the other VT remains dead. Pinging the box over the network still
works and I can still start programs on the expunge VT, albeit slowly, as the
disk is still thrashing.
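
For readers who cannot fetch the linked expunge.c, here is a minimal
sketch of the kind of allocation the test relies on. It is not the
original source: the function name is hypothetical, and the per-page
write is an assumption added so that the kernel has to back every page
rather than leave the zeroed allocation untouched.

```c
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for expunge.c (the real file is only linked
 * in this thread, not quoted). Allocates `bytes` of zeroed memory and
 * writes one byte per page so that every page must become resident.
 * Returns the number of pages written, or 0 if nothing was allocated. */
size_t alloc_and_touch(size_t bytes)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t page = ps > 0 ? (size_t)ps : 4096; /* fall back to 4 KiB */
    volatile unsigned char *buf = calloc(1, bytes);
    size_t touched = 0;

    if (buf == NULL)
        return 0;
    for (size_t off = 0; off < bytes; off += page) {
        buf[off] = 1;   /* force the page to be backed by RAM or swap */
        touched++;
    }
    free((void *)buf);
    return touched;
}
```

Calling alloc_and_touch((size_t)512 << 20) from main() on a 512MB box
would approximate the run described above; only try it on a machine
you are prepared to see swap heavily.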



The diagnostics can be found here:

* Kernel .config
  http://users.telenet.be/kvogel/config.txt

* expunge program
  http://users.telenet.be/kvogel/expunge.c

* vmstat 1  output while executing expunge (this freezes)
  http://users.telenet.be/kvogel/vmstat.txt

* vmstat in expunge VT after the OOM kill
  http://users.telenet.be/kvogel/vmstat-after-kill.txt

* /proc/slabinfo after OOM kill
  http://users.telenet.be/kvogel/slab.txt

* swapon -s
Filename                                Type            Size    Used    Priority
/dev/hda3                               partition       1044216 0       -1

* Kernel boot line:
       kernel /vmlinuz-2.6.8.1 ro root=/dev/compat/root elevator=cfq voluntary-preempt=3 preempt=1

Kernel was patched with voluntary-preempt-2.6.8.1-P7
syslogd & klogd weren't running and 'dmesg -n 1' was done beforehand.





Thread overview: 12+ messages
2004-08-24 10:28 Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit Karl Vogel
2004-08-24 10:29 ` Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2004-08-24 10:35 Karl Vogel
2004-08-23 16:10 Karl Vogel
2004-08-23 17:00 ` Jens Axboe
2004-08-24 10:03 ` Jens Axboe
2004-08-24  9:18   ` Marcelo Tosatti
2004-08-24 10:52     ` Jens Axboe
2004-08-24 10:13   ` Jens Axboe
2004-08-22 13:27 Kernel 2.6.8.1: swap storm of death Karl Vogel
2004-08-22 18:49 ` Kernel 2.6.8.1: swap storm of death - 2.6.8.1-mm4 also karl.vogel
2004-08-22 19:18   ` Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit Karl Vogel
2004-08-23 14:12     ` Marcelo Tosatti
2004-08-23 15:41       ` Jens Axboe
