From: Jan Kara <jack@suse.cz>
To: Jens Axboe <axboe@fb.com>
Cc: Jan Kara <jack@suse.cz>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-block@vger.kernel.org, dchinner@redhat.com,
sedat.dilek@gmail.com
Subject: Re: [PATCH 7/8] wbt: add general throttling mechanism
Date: Tue, 3 May 2016 11:34:10 +0200
Message-ID: <20160503093410.GD12748@quack2.suse.cz>
In-Reply-To: <57225C3E.7060504@fb.com>
On Thu 28-04-16 12:53:50, Jens Axboe wrote:
> >2) As far as I can see in patch 8/8, you have plugged the throttling above
> > the IO scheduler. When there are e.g. multiple cgroups with different IO
> > limits operating, this throttling can lead to strange results (like a
> > cgroup with low limit using up all available background "slots" and thus
> > effectively stopping background writeback for other cgroups)? So won't
> > it make more sense to plug this below the IO scheduler? Now I understand
> > there may be other problems with this but I think we should put more
> > thought into that and provide some justification in changelogs.
>
> One complexity is that we have to do this early for blk-mq, since once you
> get a request, you're already sitting on the hw tag. CoDel should actually
> work fine at each hop, so hopefully this will as well.

OK, I see. But then this suggests that any IO scheduling and/or
cgroup-related throttling should happen before we get a request for blk-mq
as well? And then we can still do writeback throttling below that layer?

> But yes, fairness is something that we have to pay attention to. Right now
> the wait queue has no priority associated with it, that should probably be
> improved to be able to wakeup in a more appropriate order.
> Needs testing, but hopefully it works out since if you do run into
> starvation, then you'll go to the back of the queue for the next attempt.

Yeah, once I hunt down that regression with the old disk, I can have a look
into how writeback throttling plays together with blkio-controller.

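The starvation concern raised above (a cgroup with a low limit occupying
every background slot) can be illustrated with a toy model. This is only a
sketch under stated assumptions, not kernel code: the slot pool, the
tick-based timing, the "slow group submits first" ordering and every name
in it (simulate, struct sim_result, MAX_SLOTS) are hypothetical. It assumes
a slot is held from admission until completion, and that a group throttled
further down the stack simply completes more slowly.

```c
#include <stdio.h>

#define MAX_SLOTS 64

/* Hypothetical result type: how many requests each group got admitted. */
struct sim_result {
	int slow_admitted;
	int fast_admitted;
};

/* Take the first free slot at tick 'now' and hold it for 'hold' ticks.
 * Returns 1 if a slot was taken, 0 if all slots are busy. */
static int take_slot(int *busy_until, int nslots, int now, int hold)
{
	for (int i = 0; i < nslots; i++) {
		if (busy_until[i] <= now) {
			busy_until[i] = now + hold;
			return 1;
		}
	}
	return 0;
}

/* Two groups share 'nslots' background slots. Each tick, each group tries
 * to submit one request; the slow group (low IO limit) goes first. */
static struct sim_result simulate(int nslots, int slow_hold, int fast_hold,
				  int nticks)
{
	int busy_until[MAX_SLOTS] = { 0 };
	struct sim_result res = { 0, 0 };

	if (nslots > MAX_SLOTS)
		nslots = MAX_SLOTS;

	for (int t = 0; t < nticks; t++) {
		res.slow_admitted += take_slot(busy_until, nslots, t, slow_hold);
		res.fast_admitted += take_slot(busy_until, nslots, t, fast_hold);
	}
	return res;
}
```

In this model, with 4 slots over 100 ticks, a slow group whose requests
hold a slot for 100 ticks ends up owning all 4 slots after a few ticks, and
the fast group is admitted only 3 times instead of 100. Whether the real
wait queue behaves this way depends on wakeup ordering, which is exactly
the part that still needs testing.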
> >>+static int __latency_exceeded(struct rq_wb *rwb, struct blk_rq_stat *stat)
> >>+{
> >>+	u64 thislat;
> >>+
> >>+	/*
> >>+	 * If our stored sync issue exceeds the window size, or it
> >>+	 * exceeds our min target AND we haven't logged any entries,
> >>+	 * flag the latency as exceeded.
> >>+	 */
> >>+	thislat = rwb_sync_issue_lat(rwb);
> >>+	if (thislat > rwb->cur_win_nsec ||
> >>+	    (thislat > rwb->min_lat_nsec && !stat[0].nr_samples)) {
> >>+		trace_wbt_lat(rwb->bdi, thislat);
> >>+		return LAT_EXCEEDED;
> >>+	}
> >
> >So I'm trying to wrap my head around this. If I read the code right,
> >rwb_sync_issue_lat() returns the time that has passed since issuing a sync
> >request that is still running. We basically randomly pick which sync
> >request we track as we always start tracking a sync request when some is
> >issued and we are not tracking any at that moment. This is to detect the
> >case when latency of sync IO is very large compared to measurement window
> >so we would not get enough samples to make it valid?
>
> Right, that's pretty close. Since wbt uses the completion latencies to make
> decisions, if an IO hasn't completed, we don't know about it. If the device
> is flooded with writes, and we then issue a read, maybe that read won't
> complete for multiple monitoring windows. During that time, we keep thinking
> everything is fine. But in reality, it's not completing because of the write
> load. So this logic attempts to track the single sync IO request case. If
> that exceeds a monitoring window of time and we saw no other sync IO in that
> window, then treat that case as if it had completed but exceeded the min
> latency. And then scale back.
>
> We'll always treat a state sample with 1 read as valuable, but for this
> case, we don't have that sample until it completes.
>
> Does that make more sense?

OK, makes sense. Thanks for the explanation.
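
For reference, the check discussed above can be modelled in userspace
roughly as follows. This is a minimal sketch that mirrors only the quoted
hunk: the struct layouts, the sync_issue field, the explicit "now"
parameter and all *_sketch names are assumptions made for illustration,
and the tracepoint is left out. It is not the code from the patch.

```c
#include <stdint.h>

enum { LAT_OK, LAT_EXCEEDED };

/* Hypothetical stand-in for the per-window stats; only the field used
 * by the quoted hunk is modelled. */
struct wb_stat_sketch {
	uint64_t nr_samples;	/* sync IOs completed in this window */
};

/* Hypothetical stand-in for struct rq_wb. */
struct rq_wb_sketch {
	uint64_t cur_win_nsec;	/* current monitoring window length */
	uint64_t min_lat_nsec;	/* minimum latency target */
	uint64_t sync_issue;	/* issue time of the tracked sync IO, 0 if none */
};

/* Time the tracked sync request has been outstanding; 0 if none is tracked. */
static uint64_t sync_issue_lat_sketch(const struct rq_wb_sketch *rwb,
				      uint64_t now)
{
	if (!rwb->sync_issue || now < rwb->sync_issue)
		return 0;
	return now - rwb->sync_issue;
}

/* Flag the latency as exceeded if the tracked sync IO has been pending for
 * longer than the window, or longer than the min target while no other
 * sync IO completed in this window. */
static int latency_exceeded_sketch(const struct rq_wb_sketch *rwb,
				   const struct wb_stat_sketch *stat,
				   uint64_t now)
{
	uint64_t thislat = sync_issue_lat_sketch(rwb, now);

	if (thislat > rwb->cur_win_nsec ||
	    (thislat > rwb->min_lat_nsec && !stat[0].nr_samples))
		return LAT_EXCEEDED;
	return LAT_OK;
}
```

The second condition is the "single stuck read" case: without any completed
samples in the window, a still-pending sync IO that is already past the min
target is treated as if it had completed late.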
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR