From: Jan Kara <jack@suse.cz>
To: LKML <linux-kernel@vger.kernel.org>
Cc: vgoyal@redhat.com, jmoyer@redhat.com, jaxboe@fusionio.com,
	Lennart Poettering <lennart@poettering.net>
Subject: Request starvation with CFQ
Date: Mon, 27 Sep 2010 21:00:24 +0200
Message-ID: <20100927190024.GF3610@quack.suse.cz>

  Hi,

  while helping Lennart answer some questions, I spotted the following
problem (at least I think it's a problem ;): CFQ schedules how requests
get dispatched, but it does not in any significant way limit to whom
requests get allocated. Since the pool of available requests is quite
limited, processes can end up starved not because they wait for the disk
but because they wait for a request to be allocated, and in that case IO
scheduling priorities or classes have no serious effect.
  A pathological example I've tried is below:
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

int main(void)
{
  int fd = open("/dev/vdb", O_RDONLY);
  int loop = 0;

  if (fd < 0) {
    perror("open");
    exit(1);
  }
  while (1) {
    if (loop % 100 == 0)
      printf("Loop %d\n", loop);
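    /* Queue readahead of a random 4k block without waiting for it; the
     * ULL factor keeps the multiplication from overflowing a 32-bit long. */
    posix_fadvise(fd, (random() * 4096ULL) % 1000204886016ULL, 4096, POSIX_FADV_WILLNEED);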
    loop++;
  }
}

  This program just pushes as many requests as possible to the block
layer and does not wait for any IO. Thus it basically ignores any
decisions about when requests get dispatched. BTW, don't get distracted
by the fact that the program operates directly on the device; that is just
for simplicity. A large enough file would work the same way.
  Even though I run this program with ionice -c 3, I still see that any
other IO to the device is basically stalled. When I look at the block
traces, I indeed see that the above program submits requests until no
more are available:
...
254,16   2      802     1.411285520  2563  Q   R 696733184 + 8 [random_read]
254,16   2      803     1.411314880  2563  G   R 696733184 + 8 [random_read]
254,16   2      804     1.411338220  2563  I   R 696733184 + 8 [random_read]
254,16   2      805     1.411415040  2563  Q   R 1006864600 + 8 [random_read]
254,16   2      806     1.411441620  2563  S   R 1006864600 + 8 [random_read]

during and after that IO happens:
254,16   3       31     1.417898030     0  C   R 345134640 + 8 [0]
254,16   3       32     1.418171910     0  D   R 1524771568 + 8 [swapper]
254,16   0       33     1.432317140     0  C   R 1524771568 + 8 [0]
254,16   0       34     1.432597000     0  D   R 1077270768 + 8 [swapper]
...
254,16   0       35     1.503238050     0  C   R 33633744 + 8 [0]
254,16   0       36     1.503558290     0  D   R 22178968 + 8 [swapper]

and the other program comes with IO and gets stalled:
254,16   1       39     1.508843180  2564  A  RM 12346 + 8 <- (254,17) 12312
254,16   1       40     1.508876520  2564  Q  RM 12346 + 8 [ls]
254,16   1       41     1.508905140  2564  S  RM 12346 + 8 [ls]
...
IO is still running:
254,16   2      807     1.512081560     0  C   R 22178968 + 8 [0]
254,16   2      808     1.512365010     0  D   R 475025688 + 8 [swapper]
254,16   3       35     1.522113270     0  C   R 475025688 + 8 [0]
254,16   3       36     1.522390779     0  D   R 697010128 + 8 [swapper]
254,16   4       33     1.531443760     0  C   R 697010128 + 8 [0]
...
the random reader even gets to submit more requests:
254,16   2      815     1.785734950  2563  G   R 1006864600 + 8 [random_read]
254,16   2      816     1.785752290  2563  I   R 1006864600 + 8 [random_read]
254,16   2      817     1.785825880  2563  Q   R 832683552 + 8 [random_read]
254,16   2      818     1.785850890  2563  G   R 832683552 + 8 [random_read]
254,16   2      819     1.785874610  2563  I   R 832683552 + 8 [random_read]
...
and finally our program gets to add its request as well:
254,16   1       60     2.160884040  2564  G  RM 12346 + 8 [ls]
254,16   1       61     2.160914700  2564  I   R 12346 + 8 [ls]
254,16   1       62     2.161142170  2564  D   R 12346 + 8 [ls]
254,16   1       63     2.161233670  2564  U   N [ls] 128

  I can provide the full traces for download if someone is interested
in some part I didn't include here. The kernel is 2.6.36-rc4.
  Now I agree that the above program is about as bad as it can get, but
Lennart would like to implement readahead in the background during boot,
and I believe that could starve other IO in a similar way. So any idea how
to solve this? To me it seems that we also need to somehow limit the
number of allocated requests per cfqq, but OTOH we have to be really careful
not to harm common workloads where we benefit from having lots of requests
queued...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
