From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>,
linux-fsdevel@vger.kernel.org,
LKML <linux-kernel@vger.kernel.org>, Jens Axboe <axboe@kernel.dk>
Subject: Re: Read starvation by sync writes
Date: Thu, 13 Dec 2012 10:33:07 +1100
Message-ID: <20121212233307.GA16353@dastard>
In-Reply-To: <20121212102617.GD18885@quack.suse.cz>
On Wed, Dec 12, 2012 at 11:26:17AM +0100, Jan Kara wrote:
> On Wed 12-12-12 15:18:21, Dave Chinner wrote:
> > On Wed, Dec 12, 2012 at 03:31:37AM +0100, Jan Kara wrote:
> > > On Tue 11-12-12 16:44:15, Jeff Moyer wrote:
> > > > Jan Kara <jack@suse.cz> writes:
> > > >
> > > > > Hi,
> > > > >
> > > > > I was looking into IO starvation problems where streaming sync writes (in
> > > > > my case from kjournald but DIO would look the same) starve reads. This is
> > > > > because reads happen in small chunks and until a request completes we don't
> > > > > start reading further (reader reads lots of small files) while writers have
> > > > > plenty of big requests to submit. Both processes end up fighting for IO
> > > > > requests and writer writes nr_batching 512 KB requests while reader reads
> > > > > just one 4 KB request or so. Here the effect is magnified by the fact that
> > > > > the drive has relatively big queue depth so it usually takes longer than
> > > > > BLK_BATCH_TIME to complete the read request. The net result is it takes
> > > > > close to two minutes to read files that can be read under a second without
> > > > > writer load. Without the big drive's queue depth, results are not ideal but
> > > > > they are bearable - it takes about 20 seconds to do the reading. And for
> > > > > comparison, when writer and reader are not competing for IO requests (as it
> > > > > happens when writes are submitted as async), it takes about 2 seconds to
> > > > > complete reading.
> > > > >
> > > > > Simple reproducer is:
> > > > >
> > > > > echo 3 >/proc/sys/vm/drop_caches
> > > > > dd if=/dev/zero of=/tmp/f bs=1M count=10000 &
> > > > > sleep 30
> > > > > time cat /etc/* 2>&1 >/dev/null
> > > > > killall dd
> > > > > rm /tmp/f
> > > >
> > > > This is a buffered writer. How does it end up that you are doing all
> > > > synchronous write I/O? Also, you forgot to mention what file system you
> > > > were using, and which I/O scheduler.
> > > So IO scheduler is CFQ, filesystem is ext3 - which is the culprit why IO
> > > ends up being synchronous - in ext3 in data=ordered mode kjournald often ends
> > > up submitting all the data to disk and it can do it as WRITE_SYNC if someone is
> > > waiting for transaction commit. In theory this can happen with AIO DIO
> > > writes or someone running fsync on a big file as well. Although when I
> > > tried this now, I wasn't able to create as big problem as kjournald does
> > > (a kernel thread submitting huge linked list of buffer heads in a tight loop
> > > is hard to beat ;). Hum, so maybe just adding some workaround in kjournald
> > > so that it's not as aggressive will solve the real world cases as well...
> >
> > Maybe kjournald shouldn't be using WRITE_SYNC for those buffers? I
> > mean, if there is that many of them then it's really a batch
> > submission and the latency of a single buffer IO is really
> > irrelevant to the rate at which the buffers are flushed to disk....
> Yeah, the idea why kjournald uses WRITE_SYNC is that we know someone is
> waiting for transaction commit and that's pretty much definition of what
> WRITE_SYNC means.
Well, XFS only uses WRITE_SYNC for WB_SYNC_ALL writeback, which
means only when a user is waiting on the data writeback will it use
WRITE_SYNC. I'm really not sure what category journal flushes fall
into, because XFS doesn't do data writeback from journal flushes....
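To make that concrete, here is a minimal sketch of the selection. It is not the actual XFS code: the bit positions and the helper name writeback_rw_flags() are made up for illustration, and the WRITE_SYNC composition is as I recall it from include/linux/fs.h of this era.

```c
#include <assert.h>

/* Illustrative stand-ins; the real definitions live in the kernel headers. */
enum wb_sync_mode { WB_SYNC_NONE, WB_SYNC_ALL };

#define REQ_WRITE   (1u << 0)   /* placeholder bit positions */
#define REQ_SYNC    (1u << 1)
#define REQ_NOIDLE  (1u << 2)

#define WRITE       REQ_WRITE
#define WRITE_SYNC  (WRITE | REQ_SYNC | REQ_NOIDLE)

/* A WRITE_SYNC request is, by construction, a sync request. */
static_assert((WRITE_SYNC & REQ_SYNC) != 0, "WRITE_SYNC must carry REQ_SYNC");

/*
 * Hypothetical helper: only writeback that someone is waiting on
 * (WB_SYNC_ALL) is submitted as WRITE_SYNC; background writeback
 * (WB_SYNC_NONE) stays a plain async WRITE.
 */
static inline unsigned int writeback_rw_flags(enum wb_sync_mode mode)
{
	return mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE;
}
```

The point of the sketch is only that the sync hint follows the caller's sync_mode, not the volume of data being flushed.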
> Hum, maybe if DIO wasn't using WRITE_SYNC (one could make similar
> argument there as with kjournald). But then the definition of what
> WRITE_SYNC should mean starts to be pretty foggy.
DIO uses WRITE_ODIRECT, not WRITE_SYNC. The difference is that
WRITE_SYNC sets REQ_NOIDLE, so DIO is actually different to
WRITE_SYNC behaviour for CFQ...
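For reference, a small standalone sketch of how the two flag sets compare. The bit positions are placeholders and the helper sync_and_may_idle() is hypothetical; the compositions are as I recall them from include/linux/fs.h of this era, so check the tree you are working on.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit positions; only the composition of the flags matters here. */
#define REQ_WRITE      (1u << 0)
#define REQ_SYNC       (1u << 1)
#define REQ_NOIDLE     (1u << 2)

#define WRITE          REQ_WRITE
#define WRITE_SYNC     (WRITE | REQ_SYNC | REQ_NOIDLE)
#define WRITE_ODIRECT  (WRITE | REQ_SYNC)

/* With these definitions the two differ in exactly one bit: REQ_NOIDLE. */
static_assert((WRITE_SYNC ^ WRITE_ODIRECT) == REQ_NOIDLE,
	      "WRITE_SYNC and WRITE_ODIRECT should differ only in REQ_NOIDLE");

/*
 * Hypothetical helper: true for a sync request that does not carry
 * REQ_NOIDLE, i.e. one where the scheduler has not been told to skip
 * idling on the submitter's queue.
 */
static inline bool sync_and_may_idle(unsigned int rw)
{
	return (rw & REQ_SYNC) && !(rw & REQ_NOIDLE);
}
```

Under these assumptions sync_and_may_idle(WRITE_ODIRECT) is true and sync_and_may_idle(WRITE_SYNC) is false, which is the behavioural difference CFQ sees.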
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com