* Kernel 2.6.8.1: swap storm of death
@ 2004-08-22 13:27 Karl Vogel
2004-08-22 13:33 ` Karl Vogel
2004-08-22 18:49 ` Kernel 2.6.8.1: swap storm of death - 2.6.8.1-mm4 also karl.vogel
0 siblings, 2 replies; 6+ messages in thread
From: Karl Vogel @ 2004-08-22 13:27 UTC (permalink / raw)
To: linux-kernel
I can bring down my box by running a program that does a calloc() of 512MB
(which is the size of my RAM). The box starts swapping heavily and never
recovers. The process that calloc()s the memory gets OOM killed (which is
also strange, as I have 1GB of free swap).
After the OOM kill, the shell where I started the calloc() program is alive
but very slow. The box continues to swap and the other processes remain dead.
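For reference, a minimal sketch of the kind of allocator that triggers this
(the real expunge.c is linked below; the explicit page-touching loop here is
an approximation, since a large calloc() may otherwise be satisfied lazily
from the zero page):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	size_t size = 512UL * 1024 * 1024;	/* roughly the size of physical RAM */
	size_t off;
	unsigned char *buf;

	buf = calloc(size, 1);
	if (buf == NULL) {
		perror("calloc");
		return 1;
	}

	/* Touch every page so the kernel has to back the whole allocation. */
	for (off = 0; off < size; off += 4096)
		buf[off] = 1;

	printf("allocated and touched %lu bytes\n", (unsigned long)size);
	free(buf);
	return 0;
}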
To gather some more statistics, I did the following:
- start 'vmstat 1 | tee vmstat.txt' in one VT session.
- run expunge (= the program that does the calloc(512MB)) in another VT.
The box freezes for some time. After a while expunge is OOM killed, but the
vmstat on the other VT remains dead. A ping over the network is still possible
and I can still start programs on the expunge VT, although it is slow because
the disk is still thrashing.
The diagnostics can be found here:
* Kernel .config
http://users.telenet.be/kvogel/config.txt
* expunge program
http://users.telenet.be/kvogel/expunge.c
* vmstat 1 output while executing expunge (this freezes)
http://users.telenet.be/kvogel/vmstat.txt
* vmstat in expunge VT after the OOM kill
http://users.telenet.be/kvogel/vmstat-after-kill.txt
* /proc/slabinfo after OOM kill
http://users.telenet.be/kvogel/slab.txt
* swapon -s
Filename    Type       Size     Used  Priority
/dev/hda3   partition  1044216  0     -1
* Kernel boot line:
kernel /vmlinuz-2.6.8.1 ro root=/dev/compat/root elevator=cfq
voluntary-preempt=3 preempt=1
Kernel was patched with voluntary-preempt-2.6.8.1-P7
syslogd & klogd weren't running and 'dmesg -n 1' was done beforehand.
* Re: Kernel 2.6.8.1: swap storm of death
2004-08-22 13:27 Kernel 2.6.8.1: swap storm of death Karl Vogel
@ 2004-08-22 13:33 ` Karl Vogel
2004-08-22 18:49 ` Kernel 2.6.8.1: swap storm of death - 2.6.8.1-mm4 also karl.vogel
1 sibling, 0 replies; 6+ messages in thread
From: Karl Vogel @ 2004-08-22 13:33 UTC (permalink / raw)
To: linux-kernel
On Sunday 22 August 2004 15:27, Karl Vogel wrote:
> The diagnostics can be found here:
Forgot one:
* ps ax - after OOM kill
http://users.telenet.be/kvogel/ps.txt
* Re: Kernel 2.6.8.1: swap storm of death - 2.6.8.1-mm4 also
2004-08-22 13:27 Kernel 2.6.8.1: swap storm of death Karl Vogel
2004-08-22 13:33 ` Karl Vogel
@ 2004-08-22 18:49 ` karl.vogel
2004-08-22 19:18 ` Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit Karl Vogel
1 sibling, 1 reply; 6+ messages in thread
From: karl.vogel @ 2004-08-22 18:49 UTC (permalink / raw)
To: linux-kernel
I just tried to see whether I could trigger the same swap storm of death on
2.6.8.1-mm4. It appears I can :(
I will have another go at it with elevator=as and see if that makes
a difference.
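(That boils down to booting with the same kernel line as before but with
elevator=as instead of elevator=cfq, i.e. something like:
kernel /vmlinuz-2.6.8.1 ro root=/dev/compat/root elevator=as voluntary-preempt=3 preempt=1)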
* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
2004-08-22 18:49 ` Kernel 2.6.8.1: swap storm of death - 2.6.8.1-mm4 also karl.vogel
@ 2004-08-22 19:18 ` Karl Vogel
2004-08-23 14:12 ` Marcelo Tosatti
0 siblings, 1 reply; 6+ messages in thread
From: Karl Vogel @ 2004-08-22 19:18 UTC (permalink / raw)
To: linux-kernel
When using elevator=as I'm unable to trigger the swap storm of death, so it
seems that the CFQ scheduler is to blame here.
With the AS scheduler, the system recovers in roughly 10 seconds; vmstat
output during that time:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 0 295632 40372 49400 87 278 324 303 1424 784 7 2 78 13
0 0 0 295632 40372 49400 0 0 0 0 1210 648 3 1 96 0
0 0 0 295632 40372 49400 0 0 0 0 1209 652 4 0 96 0
2 0 0 112784 40372 49400 0 0 0 0 1204 630 23 34 43 0
1 9 156236 788 264 8128 28 156220 3012 156228 3748 3655 11 31 0 59
0 15 176656 2196 280 8664 0 20420 556 20436 1108 374 2 5 0 93
0 17 205320 724 232 7960 28 28664 396 28664 1118 503 7 12 0 81
2 12 217892 1812 252 8556 248 12584 864 12584 1495 318 2 7 0 91
4 14 253268 2500 268 8728 188 35392 432 35392 1844 399 3 7 0 90
0 13 255692 1188 288 9152 960 2424 1408 2424 1173 2215 10 5 0 85
0 7 266140 2288 312 9276 604 10468 752 10468 1248 644 5 5 0 90
0 7 190516 340636 348 9860 1400 0 2016 0 1294 817 4 8 0 88
1 8 190516 339460 384 10844 552 0 1556 4 1241 642 3 1 0 96
1 3 190516 337084 404 11968 1432 0 2576 4 1292 788 3 1 0 96
0 6 190516 333892 420 13612 1844 0 3500 0 1343 850 5 2 0 93
0 1 190516 333700 424 13848 480 0 720 0 1250 654 3 2 0 95
0 1 190516 334468 424 13848 188 0 188 0 1224 589 3 2 0 95
With CFQ, processes got stuck in the 'D' state and never left it. See the
URLs in my initial post for diagnostics.
* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
2004-08-22 19:18 ` Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit Karl Vogel
@ 2004-08-23 14:12 ` Marcelo Tosatti
2004-08-23 15:41 ` Jens Axboe
0 siblings, 1 reply; 6+ messages in thread
From: Marcelo Tosatti @ 2004-08-23 14:12 UTC (permalink / raw)
To: Karl Vogel, axboe; +Cc: linux-kernel
On Sun, Aug 22, 2004 at 09:18:51PM +0200, Karl Vogel wrote:
> When using elevator=as I'm unable to trigger the swap storm of death, so it
> seems that the CFQ scheduler is to blame here.
>
> With the AS scheduler, the system recovers in roughly 10 seconds; vmstat
> output during that time:
>
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 1 0 0 295632 40372 49400 87 278 324 303 1424 784 7 2 78 13
> 0 0 0 295632 40372 49400 0 0 0 0 1210 648 3 1 96 0
> 0 0 0 295632 40372 49400 0 0 0 0 1209 652 4 0 96 0
> 2 0 0 112784 40372 49400 0 0 0 0 1204 630 23 34 43 0
> 1 9 156236 788 264 8128 28 156220 3012 156228 3748 3655 11 31 0 59
> 0 15 176656 2196 280 8664 0 20420 556 20436 1108 374 2 5 0 93
> 0 17 205320 724 232 7960 28 28664 396 28664 1118 503 7 12 0 81
> 2 12 217892 1812 252 8556 248 12584 864 12584 1495 318 2 7 0 91
> 4 14 253268 2500 268 8728 188 35392 432 35392 1844 399 3 7 0 90
> 0 13 255692 1188 288 9152 960 2424 1408 2424 1173 2215 10 5 0 85
> 0 7 266140 2288 312 9276 604 10468 752 10468 1248 644 5 5 0 90
> 0 7 190516 340636 348 9860 1400 0 2016 0 1294 817 4 8 0 88
> 1 8 190516 339460 384 10844 552 0 1556 4 1241 642 3 1 0 96
> 1 3 190516 337084 404 11968 1432 0 2576 4 1292 788 3 1 0 96
> 0 6 190516 333892 420 13612 1844 0 3500 0 1343 850 5 2 0 93
> 0 1 190516 333700 424 13848 480 0 720 0 1250 654 3 2 0 95
> 0 1 190516 334468 424 13848 188 0 188 0 1224 589 3 2 0 95
>
> With CFQ, processes got stuck in the 'D' state and never left it. See the
> URLs in my initial post for diagnostics.
I can confirm this on a 512MB box with 512MB swap (2.6.8-rc4). Using CFQ the
machine swaps out about 400MB; with AS it swaps out about 30MB.
That leads to allocation failures, etc.
CFQ allocates a huge number of bio/biovecs:
cat /proc/slabinfo | grep bio
biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0
biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0
biovec-64 265 265 768 5 1 : tunables 54 27 0 : slabdata 53 53 0
biovec-16 260 260 192 20 1 : tunables 120 60 0 : slabdata 13 13 0
biovec-4 272 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0
biovec-1 121088 122040 16 226 1 : tunables 120 60 0 : slabdata 540 540 0
bio 121131 121573 64 61 1 : tunables 120 60 0 : slabdata 1992 1993 0

biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0
biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0
biovec-64 265 265 768 5 1 : tunables 54 27 0 : slabdata 53 53 0
biovec-16 258 260 192 20 1 : tunables 120 60 0 : slabdata 13 13 0
biovec-4 257 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0
biovec-1 66390 68026 16 226 1 : tunables 120 60 0 : slabdata 301 301 0
bio 66389 67222 64 61 1 : tunables 120 60 0 : slabdata 1102 1102 0
(these are freed later on, but they are the cause of the thrashing during the swap I/O).
While AS does:
[marcelo@yage marcelo]$ cat /proc/slabinfo | grep bio
biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0
biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0
biovec-64 260 260 768 5 1 : tunables 54 27 0 : slabdata 52 52 0
biovec-16 280 280 192 20 1 : tunables 120 60 0 : slabdata 14 14 0
biovec-4 264 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0
biovec-1 4478 5424 16 226 1 : tunables 120 60 0 : slabdata 24 24 0
bio 4525 5002 64 61 1 : tunables 120 60 0 : slabdata 81 82 0
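Back of the envelope, assuming one 4KB page per bio (which the dominant
biovec-1 count suggests for the swap I/O): ~121,000 bios under CFQ is roughly
121,000 * 4KB, i.e. about 470MB of swap I/O queued at once, versus ~4,500
bios, or about 18MB, under AS.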
The odd thing is that the 400MB swapped out is not reclaimed after exp (the
512MB callocator) exits. With AS, almost all swapped-out memory is reclaimed
on exit.
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 492828 13308 320 3716 0 0 0 0 1002 5 0 0 100 0
Jens, is this huge number of bio/biovec allocations expected with CFQ? It's really, really bad.
* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
2004-08-23 14:12 ` Marcelo Tosatti
@ 2004-08-23 15:41 ` Jens Axboe
0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2004-08-23 15:41 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Karl Vogel, linux-kernel
On Mon, Aug 23 2004, Marcelo Tosatti wrote:
> On Sun, Aug 22, 2004 at 09:18:51PM +0200, Karl Vogel wrote:
> > When using elevator=as I'm unable to trigger the swap storm of death, so
> > it seems that the CFQ scheduler is to blame here.
> >
> > With the AS scheduler, the system recovers in roughly 10 seconds; vmstat
> > output during that time:
> >
> > procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> > r b swpd free buff cache si so bi bo in cs us sy id wa
> > 1 0 0 295632 40372 49400 87 278 324 303 1424 784 7 2 78 13
> > 0 0 0 295632 40372 49400 0 0 0 0 1210 648 3 1 96 0
> > 0 0 0 295632 40372 49400 0 0 0 0 1209 652 4 0 96 0
> > 2 0 0 112784 40372 49400 0 0 0 0 1204 630 23 34 43 0
> > 1 9 156236 788 264 8128 28 156220 3012 156228 3748 3655 11 31 0 59
> > 0 15 176656 2196 280 8664 0 20420 556 20436 1108 374 2 5 0 93
> > 0 17 205320 724 232 7960 28 28664 396 28664 1118 503 7 12 0 81
> > 2 12 217892 1812 252 8556 248 12584 864 12584 1495 318 2 7 0 91
> > 4 14 253268 2500 268 8728 188 35392 432 35392 1844 399 3 7 0 90
> > 0 13 255692 1188 288 9152 960 2424 1408 2424 1173 2215 10 5 0 85
> > 0 7 266140 2288 312 9276 604 10468 752 10468 1248 644 5 5 0 90
> > 0 7 190516 340636 348 9860 1400 0 2016 0 1294 817 4 8 0 88
> > 1 8 190516 339460 384 10844 552 0 1556 4 1241 642 3 1 0 96
> > 1 3 190516 337084 404 11968 1432 0 2576 4 1292 788 3 1 0 96
> > 0 6 190516 333892 420 13612 1844 0 3500 0 1343 850 5 2 0 93
> > 0 1 190516 333700 424 13848 480 0 720 0 1250 654 3 2 0 95
> > 0 1 190516 334468 424 13848 188 0 188 0 1224 589 3 2 0 95
> >
> > With CFQ, processes got stuck in the 'D' state and never left it. See the
> > URLs in my initial post for diagnostics.
>
> I can confirm this on a 512MB box with 512MB swap (2.6.8-rc4). Using CFQ
> the machine swaps out about 400MB; with AS it swaps out about 30MB.
>
> That leads to allocation failures, etc.
>
> CFQ allocates a huge number of bio/biovecs:
>
> cat /proc/slabinfo | grep bio
> biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0
> biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0
> biovec-64 265 265 768 5 1 : tunables 54 27 0 : slabdata 53 53 0
> biovec-16 260 260 192 20 1 : tunables 120 60 0 : slabdata 13 13 0
> biovec-4 272 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0
> biovec-1 121088 122040 16 226 1 : tunables 120 60 0 : slabdata 540 540 0
> bio 121131 121573 64 61 1 : tunables 120 60 0 : slabdata 1992 1993 0
>
>
> biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0
> biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0
> biovec-64 265 265 768 5 1 : tunables 54 27 0 : slabdata 53 53 0
> biovec-16 258 260 192 20 1 : tunables 120 60 0 : slabdata 13 13 0
> biovec-4 257 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0
> biovec-1 66390 68026 16 226 1 : tunables 120 60 0 : slabdata 301 301 0
> bio 66389 67222 64 61 1 : tunables 120 60 0 : slabdata 1102 1102 0
>
> (these are freed later on, but they are the cause of the thrashing during the swap I/O).
>
> While AS does:
>
> [marcelo@yage marcelo]$ cat /proc/slabinfo | grep bio
> biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0
> biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0
> biovec-64 260 260 768 5 1 : tunables 54 27 0 : slabdata 52 52 0
> biovec-16 280 280 192 20 1 : tunables 120 60 0 : slabdata 14 14 0
> biovec-4 264 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0
> biovec-1 4478 5424 16 226 1 : tunables 120 60 0 : slabdata 24 24 0
> bio 4525 5002 64 61 1 : tunables 120 60 0 : slabdata 81 82 0
>
>
> The odd thing is that the 400MB swapped out is not reclaimed after exp (the
> 512MB callocator) exits. With AS, almost all swapped-out memory is reclaimed
> on exit.
>
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 0 492828 13308 320 3716 0 0 0 0 1002 5 0 0 100 0
>
>
> Jens, is this huge number of bio/biovec allocations expected with CFQ? It's really, really bad.
Nope, it's not by design :-)
A test case would be nice; then I'll fix it as soon as possible. But please
retest with 2.6.8.1, Marcelo, as 2.6.8-rc4 is missing an important fix to
ll_rw_blk that can easily cause this. The first report is for 2.6.8.1,
though, so that one puzzles me more.
--
Jens Axboe
Thread overview: 6+ messages
2004-08-22 13:27 Kernel 2.6.8.1: swap storm of death Karl Vogel
2004-08-22 13:33 ` Karl Vogel
2004-08-22 18:49 ` Kernel 2.6.8.1: swap storm of death - 2.6.8.1-mm4 also karl.vogel
2004-08-22 19:18 ` Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit Karl Vogel
2004-08-23 14:12 ` Marcelo Tosatti
2004-08-23 15:41 ` Jens Axboe