public inbox for linux-kernel@vger.kernel.org
From: Jens Axboe <axboe@suse.de>
To: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Cc: Karl Vogel <karl.vogel@seagha.com>, linux-kernel@vger.kernel.org
Subject: Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
Date: Mon, 23 Aug 2004 17:41:14 +0200	[thread overview]
Message-ID: <20040823154113.GZ2301@suse.de> (raw)
In-Reply-To: <20040823141206.GE2157@logos.cnet>

On Mon, Aug 23 2004, Marcelo Tosatti wrote:
> On Sun, Aug 22, 2004 at 09:18:51PM +0200, Karl Vogel wrote:
> > When using elevator=as I'm unable to trigger the swap storm of death, so it
> > seems that the CFQ scheduler is to blame here.
> > 
> > With the AS scheduler, the system recovers in ~10 seconds; vmstat output
> > during that time:
> > 
> > procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> >  1  0      0 295632  40372  49400   87  278   324   303 1424   784  7  2 78 13
> >  0  0      0 295632  40372  49400    0    0     0     0 1210   648  3  1 96  0
> >  0  0      0 295632  40372  49400    0    0     0     0 1209   652  4  0 96  0
> >  2  0      0 112784  40372  49400    0    0     0     0 1204   630 23 34 43  0
> >  1  9 156236    788    264   8128   28 156220  3012 156228 3748  3655 11 31  0 59
> >  0 15 176656   2196    280   8664    0 20420   556 20436 1108   374  2  5  0 93
> >  0 17 205320    724    232   7960   28 28664   396 28664 1118   503  7 12  0 81
> >  2 12 217892   1812    252   8556  248 12584   864 12584 1495   318  2  7  0 91
> >  4 14 253268   2500    268   8728  188 35392   432 35392 1844   399  3  7  0 90
> >  0 13 255692   1188    288   9152  960 2424  1408  2424 1173  2215 10  5  0 85
> >  0  7 266140   2288    312   9276  604 10468   752 10468 1248   644  5  5  0 90
> >  0  7 190516 340636    348   9860 1400    0  2016     0 1294   817  4  8  0 88
> >  1  8 190516 339460    384  10844  552    0  1556     4 1241   642  3  1  0 96
> >  1  3 190516 337084    404  11968 1432    0  2576     4 1292   788  3  1  0 96
> >  0  6 190516 333892    420  13612 1844    0  3500     0 1343   850  5  2  0 93
> >  0  1 190516 333700    424  13848  480    0   720     0 1250   654  3  2  0 95
> >  0  1 190516 334468    424  13848  188    0   188     0 1224   589  3  2  0 95
> > 
> > With CFQ, processes got stuck in the 'D' state and never left it. See the URLs
> > in my initial post for diagnostics.
> 
> I can confirm this on a 512MB box with 512MB swap (2.6.8-rc4). Using CFQ the
> machine swaps out ~400MB; with AS it swaps out ~30MB.
> 
> That leads to allocation failures, etc.
> 
> CFQ allocates a huge number of bio/biovecs:
> 
>  cat /proc/slabinfo | grep bio
> biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
> biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata 52     52      0
> biovec-64            265    265    768    5    1 : tunables   54   27    0 : slabdata 53     53      0
> biovec-16            260    260    192   20    1 : tunables  120   60    0 : slabdata 13     13      0
> biovec-4             272    305     64   61    1 : tunables  120   60    0 : slabdata  5      5      0
> biovec-1          121088 122040     16  226    1 : tunables  120   60    0 : slabdata    540    540      0
> bio               121131 121573     64   61    1 : tunables  120   60    0 : slabdata   1992   1993      0
> 
> 
> biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata 128    128      0
> biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata  52     52      0
> biovec-64            265    265    768    5    1 : tunables   54   27    0 : slabdata  53     53      0
> biovec-16            258    260    192   20    1 : tunables  120   60    0 : slabdata  13     13      0
> biovec-4             257    305     64   61    1 : tunables  120   60    0 : slabdata   5      5      0
> biovec-1           66390  68026     16  226    1 : tunables  120   60    0 : slabdata 301    301      0
> bio                66389  67222     64   61    1 : tunables  120   60    0 : slabdata   1102   1102      0
> 
> (which are freed later on, but which are the cause of the thrashing during the swap I/O).
> 
> While AS does:
> 
> [marcelo@yage marcelo]$ cat /proc/slabinfo | grep bio
> biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
> biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
> biovec-64            260    260    768    5    1 : tunables   54   27    0 : slabdata     52     52      0
> biovec-16            280    280    192   20    1 : tunables  120   60    0 : slabdata     14     14      0
> biovec-4             264    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
> biovec-1            4478   5424     16  226    1 : tunables  120   60    0 : slabdata     24     24      0
> bio                 4525   5002     64   61    1 : tunables  120   60    0 : slabdata     81     82      0
> 
> 
> Odd thing is that the 400M swapped out is not reclaimed after exp (the 512MB
> callocator) exits. With AS, almost all swapped-out memory is reclaimed on exit.
> 
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  0  0 492828  13308    320   3716    0    0     0     0 1002     5  0  0 100  0
> 
> 
> Jens, is this huge number of bio/biovec allocations expected with CFQ? It's really, really bad.

Nope, it's not by design :-)

A test case would be nice; then I'll fix it as soon as possible. But
please retest with 2.6.8.1, Marcelo: 2.6.8-rc4 is missing an important
fix to ll_rw_blk that can easily cause this. The first report was with
2.6.8.1, though, so that one puzzles me more.

-- 
Jens Axboe



Thread overview: 15+ messages
2004-08-22 13:27 Kernel 2.6.8.1: swap storm of death Karl Vogel
2004-08-22 13:33 ` Karl Vogel
2004-08-22 18:49 ` Kernel 2.6.8.1: swap storm of death - 2.6.8.1-mm4 also karl.vogel
2004-08-22 19:18   ` Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit Karl Vogel
2004-08-23 14:12     ` Marcelo Tosatti
2004-08-23 15:41       ` Jens Axboe [this message]
  -- strict thread matches above, loose matches on Subject: below --
2004-08-23 16:10 Karl Vogel
2004-08-23 17:00 ` Jens Axboe
2004-08-24 10:03 ` Jens Axboe
2004-08-24  9:18   ` Marcelo Tosatti
2004-08-24 10:52     ` Jens Axboe
2004-08-24 10:13   ` Jens Axboe
2004-08-24 10:28 Karl Vogel
2004-08-24 10:29 ` Jens Axboe
2004-08-24 10:35 Karl Vogel
