public inbox for linux-kernel@vger.kernel.org
From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Karl Vogel <karl.vogel@seagha.com>, axboe@suse.de
Cc: linux-kernel@vger.kernel.org
Subject: Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
Date: Mon, 23 Aug 2004 11:12:06 -0300	[thread overview]
Message-ID: <20040823141206.GE2157@logos.cnet> (raw)
In-Reply-To: <m33c2f56ck.fsf_-_@seagha.com>

On Sun, Aug 22, 2004 at 09:18:51PM +0200, Karl Vogel wrote:
> When using elevator=as I'm unable to trigger the swap of death, so it seems
> that the CFQ scheduler is at blame here.
> 
> With AS scheduler, the system recovers in +-10 seconds, vmstat output during
> that time:
> 
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  1  0      0 295632  40372  49400   87  278   324   303 1424   784  7  2 78 13
>  0  0      0 295632  40372  49400    0    0     0     0 1210   648  3  1 96  0
>  0  0      0 295632  40372  49400    0    0     0     0 1209   652  4  0 96  0
>  2  0      0 112784  40372  49400    0    0     0     0 1204   630 23 34 43  0
>  1  9 156236    788    264   8128   28 156220  3012 156228 3748  3655 11 31  0 59
>  0 15 176656   2196    280   8664    0 20420   556 20436 1108   374  2  5  0 93
>  0 17 205320    724    232   7960   28 28664   396 28664 1118   503  7 12  0 81
>  2 12 217892   1812    252   8556  248 12584   864 12584 1495   318  2  7  0 91
>  4 14 253268   2500    268   8728  188 35392   432 35392 1844   399  3  7  0 90
>  0 13 255692   1188    288   9152  960 2424  1408  2424 1173  2215 10  5  0 85
>  0  7 266140   2288    312   9276  604 10468   752 10468 1248   644  5  5  0 90
>  0  7 190516 340636    348   9860 1400    0  2016     0 1294   817  4  8  0 88
>  1  8 190516 339460    384  10844  552    0  1556     4 1241   642  3  1  0 96
>  1  3 190516 337084    404  11968 1432    0  2576     4 1292   788  3  1  0 96
>  0  6 190516 333892    420  13612 1844    0  3500     0 1343   850  5  2  0 93
>  0  1 190516 333700    424  13848  480    0   720     0 1250   654  3  2  0 95
>  0  1 190516 334468    424  13848  188    0   188     0 1224   589  3  2  0 95
> 
> With CFQ processes got stuck in 'D' and never left that state. See URL's in my
> initial post for diagnostics.

I can confirm this on a 512MB box with 512MB of swap (2.6.8-rc4). Using CFQ the
machine swaps out ~400MB; with AS it swaps out ~30MB.

That leads to allocation failures, among other problems.

CFQ allocates a huge number of bio/biovecs:

 cat /proc/slabinfo | grep bio
biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
biovec-64            265    265    768    5    1 : tunables   54   27    0 : slabdata     53     53      0
biovec-16            260    260    192   20    1 : tunables  120   60    0 : slabdata     13     13      0
biovec-4             272    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
biovec-1          121088 122040     16  226    1 : tunables  120   60    0 : slabdata    540    540      0
bio               121131 121573     64   61    1 : tunables  120   60    0 : slabdata   1992   1993      0

A second sample:

biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
biovec-64            265    265    768    5    1 : tunables   54   27    0 : slabdata     53     53      0
biovec-16            258    260    192   20    1 : tunables  120   60    0 : slabdata     13     13      0
biovec-4             257    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
biovec-1           66390  68026     16  226    1 : tunables  120   60    0 : slabdata    301    301      0
bio                66389  67222     64   61    1 : tunables  120   60    0 : slabdata   1102   1102      0

(these are freed later on, but they are the cause of the thrashing during the swap I/O).

While AS does:

[marcelo@yage marcelo]$ cat /proc/slabinfo | grep bio
biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
biovec-64            260    260    768    5    1 : tunables   54   27    0 : slabdata     52     52      0
biovec-16            280    280    192   20    1 : tunables  120   60    0 : slabdata     14     14      0
biovec-4             264    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
biovec-1            4478   5424     16  226    1 : tunables  120   60    0 : slabdata     24     24      0
bio                 4525   5002     64   61    1 : tunables  120   60    0 : slabdata     81     82      0


The odd thing is that the ~400MB swapped out is not reclaimed after exp (the 512MB
calloc allocator) exits. With AS almost all swapped-out memory is reclaimed on exit.

 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0 492828  13308    320   3716    0    0     0     0 1002     5  0  0 100  0


Jens, is this huge number of bio/biovec allocations expected with CFQ? It's really bad.



Thread overview: 15+ messages
2004-08-22 13:27 Kernel 2.6.8.1: swap storm of death Karl Vogel
2004-08-22 13:33 ` Karl Vogel
2004-08-22 18:49 ` Kernel 2.6.8.1: swap storm of death - 2.6.8.1-mm4 also karl.vogel
2004-08-22 19:18   ` Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit Karl Vogel
2004-08-23 14:12     ` Marcelo Tosatti [this message]
2004-08-23 15:41       ` Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2004-08-23 16:10 Karl Vogel
2004-08-23 17:00 ` Jens Axboe
2004-08-24 10:03 ` Jens Axboe
2004-08-24  9:18   ` Marcelo Tosatti
2004-08-24 10:52     ` Jens Axboe
2004-08-24 10:13   ` Jens Axboe
2004-08-24 10:28 Karl Vogel
2004-08-24 10:29 ` Jens Axboe
2004-08-24 10:35 Karl Vogel
