Re: Read starvation by sync writes

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Jan Kara <jack@suse.cz>
To: Jens Axboe <axboe@kernel.dk>
Cc: Jeff Moyer <jmoyer@redhat.com>, Jan Kara <jack@suse.cz>,
	Shaohua Li <shli@kernel.org>,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: Read starvation by sync writes
Date: Thu, 13 Dec 2012 16:02:55 +0100	[thread overview]
Message-ID: <20121213150255.GA21592@quack.suse.cz> (raw)
In-Reply-To: <50C9D882.7000808@kernel.dk>

On Thu 13-12-12 14:30:42, Jens Axboe wrote:
> On 2012-12-12 20:41, Jeff Moyer wrote:
> > Jeff Moyer <jmoyer@redhat.com> writes:
> > 
> >>> I agree. This isn't about scheduling, we haven't even reached that part
> >>> yet. Back when we split the queues into read vs write, this problem
> >>> obviously wasn't there. Now we have sync writes and reads, both eating
> >>> from the same pool. The io scheduler can impact this a bit by forcing
> >>> reads to must allocate (Jan, which io scheduler are you using?). CFQ
> >>> does this when it's expecting a request from this process queue.
> >>>
> >>> Back in the day, we used to have one list. To avoid a similar problem,
> >>> we reserved the top of the list for reads. With the batching, it's a bit
> >>> more complicated. If we make the request allocation (just that, not the
> >>> scheduling) be read vs write instead of sync vs async, then we have the
> >>> same issue for sync vs buffered writes.
> >>>
> >>> How about something like the below? Due to the nature of sync reads, we
> >>> should allow a much longer timeout. The batch is really tailored towards
> >>> writes at the moment. Also shrink the batch count, 32 is pretty large...
> >>
> >> Does batching even make sense for dependent reads?  I don't think it
> >> does.
> > 
> > Having just read the batching code in detail, I'd like to ammend this
> > misguided comment.  Batching logic kicks in when you happen to be lucky
> > enough to use up the last request.  As such, I'd be surprised if the
> > patch you posted helped.  Jens, don't you think the writer is way more
> > likely to become the batcher?  I do agree with shrinking the batch count
> > to 16, whether or not the rest of the patch goes in.
> > 
> >>  Assuming you disagree, then you'll have to justify that fixed
> >> time value of 2 seconds.  The amount of time between dependent reads
> >> will vary depending on other I/O sent to the device, the properties of
> >> the device, the I/O scheduler, and so on.  If you do stick 2 seconds in
> >> there, please comment it.  Maybe it's time we started keeping track of
> >> worst case Q->C time?  That could be used to tell worst case latency,
> >> and adjust magic timeouts like this one.
> >>
> >> I'm still thinking about how we might solve this in a cleaner way.
> > 
> > The way things stand today, you can do a complete end run around the I/O
> > scheduler by queueing up enough I/O.  To address that, I think we need
> > to move to a request list per io_context as Jan had suggested.  That
> > way, we can keep the logic about who gets to submit I/O when in one
> > place.
> > 
> > Jens, what do you think?
> 
> I think that is pretty extreme. We have way too much accounting around
> this already, and I'd rather just limit the batching than make
> per-ioc request lists too.
> 
> I agree the batch addition isn't super useful for the reads. It really
> is mostly a writer thing, and the timing reflects that.
> 
> The problem is really that the WRITE_SYNC is (for Jan's case) behaving
> like buffered writes, so it eats up a queue of requests very easily. On
> the allocation side, the assumption is that WRITE_SYNC behaves like
> dependent reads. Similar to a dd with oflag=direct, not like a flood of
> requests. For dependent sync writes, our current behaviour is fine, we
> treat them like reads. For commits of WRITE_SYNC, they should be treated
> like async WRITE instead.
  Yeah. But it's similar to what happens when you run fsync() on a large
dirty file. That will also submit a lot of WRITE_SYNC requests... kjournald
could probably use WRITE instead of WRITE_SYNC for large commits. It's just
that we don't really want to give e.g. DIO a preference over kjournald
because transaction commit can effectively block any metadata changes on
the filesystem.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

next prev parent reply	other threads:[~2012-12-13 15:03 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-10 22:12 Read starvation by sync writes Jan Kara
2012-12-11 21:44 ` Jeff Moyer
2012-12-12  2:31   ` Jan Kara
2012-12-12  4:18     ` Dave Chinner
2012-12-12 10:26       ` Jan Kara
2012-12-12 23:33         ` Dave Chinner
2012-12-12  0:13 ` Jan Engelhardt
2012-12-12  2:55 ` Shaohua Li
2012-12-12 10:11   ` Jan Kara
2012-12-12 15:19     ` Jens Axboe
2012-12-12 16:38       ` Jeff Moyer
2012-12-12 19:41         ` Jeff Moyer
2012-12-13 12:30           ` Jan Kara
2012-12-13 13:30           ` Jens Axboe
2012-12-13 14:55             ` Jeff Moyer
2012-12-13 15:02             ` Jan Kara [this message]
2012-12-13 18:03               ` Jens Axboe
2013-01-23 17:35                 ` Jeff Moyer
2012-12-13  1:43     ` Shaohua Li
2012-12-13 10:32       ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121213150255.GA21592@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=axboe@kernel.dk \
    --cc=jmoyer@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=shli@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox