public inbox for linux-kernel@vger.kernel.org
From: Jan Kara <jack@suse.cz>
To: Shaohua Li <shli@kernel.org>
Cc: Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>, Jens Axboe <axboe@kernel.dk>
Subject: Re: Read starvation by sync writes
Date: Thu, 13 Dec 2012 11:32:59 +0100
Message-ID: <20121213103259.GA18843@quack.suse.cz>
In-Reply-To: <CANejiEU=vcpofMViAH609fH_fh=eTn8v3_06boVbQSbvs8DOTw@mail.gmail.com>

On Thu 13-12-12 09:43:31, Shaohua Li wrote:
> 2012/12/12 Jan Kara <jack@suse.cz>:
> > On Wed 12-12-12 10:55:15, Shaohua Li wrote:
> >> 2012/12/11 Jan Kara <jack@suse.cz>:
> >> >   Hi,
> >> >
> >> >   I was looking into IO starvation problems where streaming sync writes (in
> >> > my case from kjournald but DIO would look the same) starve reads. This is
> >> > because reads happen in small chunks and until a request completes we don't
> >> > start reading further (reader reads lots of small files) while writers have
> >> > plenty of big requests to submit. Both processes end up fighting for IO
> >> > requests and writer writes nr_batching 512 KB requests while reader reads
> >> > just one 4 KB request or so. Here the effect is magnified by the fact that
> >> > the drive has relatively big queue depth so it usually takes longer than
> >> > BLK_BATCH_TIME to complete the read request. The net result is it takes
> >> > close to two minutes to read files that can be read under a second without
> >> > writer load. Without the big drive's queue depth, results are not ideal but
> >> > they are bearable - it takes about 20 seconds to do the reading. And for
> >> > comparison, when writer and reader are not competing for IO requests (as it
> >> > happens when writes are submitted as async), it takes about 2 seconds to
> >> > complete reading.
> >> >
> >> > Simple reproducer is:
> >> >
> >> > echo 3 >/proc/sys/vm/drop_caches
> >> > dd if=/dev/zero of=/tmp/f bs=1M count=10000 &
> >> > sleep 30
> >> > time cat /etc/* 2>&1 >/dev/null
> >> > killall dd
> >> > rm /tmp/f
> >> >
> >> >   The question is how can we fix this? Two quick hacks that come to my mind
> >> > are remove timeout from the batching logic (is it that important?) or
> >> > further separate request allocation logic so that reads have their own
> >> > request pool. More systematic fix would be to change request allocation
> >> > logic to always allow at least a fixed number of requests per IOC. What do
> >> > people think about this?
> >>
> >> As long as queue depth > workload iodepth, there is little we can do
> >> to prioritize tasks/IOC. Because throttling a task/IOC means queue
> >> will be idle. We don't want to idle a queue (especially for SSD), so
> >> we always push as more requests as possible to the queue, which
> >> will break any prioritization. As far as I know we always have such
> >> issue in CFQ for big queue depth disk.
> >   Yes, I understand that. But actually big queue depth on its own doesn't
> > make the problem really bad (at least for me). When the reader doesn't have
> > to wait for free IO requests, it progresses at a reasonable speed. What
> > makes it really bad is that big queue depth effectively disallows any use
> > of ioc_batching() mode for the reader and thus it blocks in request
> > allocation for every single read request unlike writer which always uses
> > its full batch (32 requests).
> 
> This can't explain why setting queue depth 1 makes the performance
> better.
  It does: when the queue depth is small, reads complete faster, so the reader
is able to submit more reads during one ioc_batching() period.
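
  As a rough illustration of that argument (a simplified model, not the
actual block layer logic): a reader that issues its next read only after the
previous one completes can submit about window / latency + 1 reads in one
batching window, capped at the 32-request batch mentioned above. The function
name and all numbers passed to it are hypothetical.

```sh
# reads_per_batch WINDOW_MS LATENCY_MS
# Simplified model (an assumption, not kernel code): the first read is issued
# at time 0, and each following read is issued only when the previous one
# has completed. Counts how many reads fit into one batching window.
reads_per_batch() {
    window_ms=$1
    latency_ms=$2
    if [ "$latency_ms" -le 0 ]; then
        echo "latency must be positive" >&2
        return 1
    fi
    n=$(( window_ms / latency_ms + 1 ))
    # Cap at the batch size of 32 requests quoted earlier in the thread.
    if [ "$n" -gt 32 ]; then
        n=32
    fi
    echo "$n"
}
```

With an assumed 20 ms window, `reads_per_batch 20 5` prints 5, while
`reads_per_batch 20 50` prints 1: once a read takes longer than the window,
the reader gets a single request per batch.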

> In that case, write still get that number of requests, read will
> wait for a request. Anyway, try setting nr_request to a big number
> and check if performance is different.
  I have checked. Setting nr_requests to 100000 makes the reader proceed at a
reasonable speed.
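
  For anyone who wants to repeat this, a minimal sketch of changing the
setting through sysfs. The helper name and the sdX device name are
hypothetical, the optional third argument exists only so the helper can be
tried against a scratch directory, and writing the real file needs root.

```sh
# set_nr_requests DEV VALUE [SYSFS_ROOT]
# Writes VALUE to the queue's nr_requests file and prints old and new values.
# SYSFS_ROOT defaults to /sys; DEV is the block device name, e.g. sdX.
set_nr_requests() {
    dev=$1
    value=$2
    sysfs_root=${3:-/sys}
    knob="$sysfs_root/block/$dev/queue/nr_requests"
    old=$(cat "$knob") || return 1
    echo "$value" > "$knob" || return 1
    # Read the value back, since the kernel may adjust what was written.
    echo "$dev: nr_requests $old -> $(cat "$knob")"
}
```

Usage would be `set_nr_requests sdX 100000`, followed by rerunning the
reproducer.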

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
