Linux cgroups development
From: Jan Kara <jack@suse.cz>
To: Tejun Heo <tj@kernel.org>
Cc: "Jan Kara" <jack@suse.cz>, "Michal Koutný" <mkoutny@suse.com>,
	"Jinke Han" <hanjinke.666@bytedance.com>,
	josef@toxicpanda.com, axboe@kernel.dk, cgroups@vger.kernel.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	yinxin.x@bytedance.com
Subject: Re: [PATCH v3] blk-throtl: Introduce sync and async queues for blk-throtl
Date: Mon, 9 Jan 2023 11:59:16 +0100	[thread overview]
Message-ID: <20230109105916.jvnhjdseqkwejmws@quack3> (raw)
In-Reply-To: <Y7hTHZQYsCX6EHIN@slm.duckdns.org>

Hello!

On Fri 06-01-23 06:58:05, Tejun Heo wrote:
> Hello,
> 
> On Fri, Jan 06, 2023 at 04:38:13PM +0100, Jan Kara wrote:
> > Generally, problems like this are taken care of by IO schedulers. E.g. BFQ
> > has quite a lot of logic exactly to reduce problems like this. Sync and
> > async queues are one part of this logic inside BFQ (but there's more).
> 
> With modern ssd's, even deadline's overhead is too high and a lot (but
> clearly not all) of what the IO schedulers do are no longer necessary. I
> don't see a good way back to elevators.

Yeah, I agree there's no way back :). But actually I think a lot of the
functionality of IO schedulers is not needed (by you ;)) only because the
HW got performant enough, so some issues became less visible. That is all
fine, but if you end up in a configuration where your cgroup's IO limits
and IO demands are similar to how the old rotational disks were
underprovisioned for the amount of IO the system needed to do (i.e., you
can easily generate an amount of IO that then takes minutes or tens of
minutes for your IO subsystem to crunch through), you hit all the same
problems IO schedulers were trying to solve. And maybe these days we
incline more towards the answer "buy more appropriate HW / buy higher
limits from your infrastructure provider", but it is not like the original
issues in such configurations have disappeared.

> > But given the current architecture of the block layer, IO schedulers are below
> > throttling frameworks such as blk-throtl so they have no chance of
> > influencing problems like this. So we are bound to reinvent the scheduling
> > logic IO schedulers are already doing. That being said I don't have a good
> > solution for this or architecture suggestion. Because implementing various
> > throttling frameworks within IO schedulers is cumbersome (complex
> > interactions) and generally the performance is insufficient for some use cases.
> > We've been there (that's why there's cgroup support in BFQ) and really
> > the current architecture is much easier to reason about.
> 
> Another layering problem w/ controlling from elevators is that that's after
> request allocation and the issuer has already moved on. We used to have
> per-cgroup rq pools but ripped that out, so it's pretty easy to cause severe
> priority inversions by depleting the shared request pool, and the fact that
> throttling takes place after the issuing task returned from issue path makes
> propagating the throttling operation upwards more challenging too.

Well, we do have the .limit_depth IO scheduler callback these days, and
BFQ uses it to solve the exhaustion of the shared request pool, but I
agree it's a bit of a hack on the side.

> At least in terms of cgroup control, the new bio based behavior is a lot
> better. In the fb fleet, iocost is deployed on most (virtually all) of the
> machines and we don't see issues with severe priority inversions.
> Cross-cgroup control is pretty well controlled. Inside each cgroup, sync
> writes aren't prioritized but nobody seems to be troubled by that.
> 
> My bet is that inversion issues are a lot more severe with blk-throttle
> because it's not work-conserving and not doing things like issue-as-root or
> other measures to alleviate issues which can arise from inversions.

Yes, I agree these features of blk-throttle make the problems much more
likely to happen in practice.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

Thread overview: 19+ messages
2022-12-26 13:05 [PATCH v3] blk-throtl: Introduce sync and async queues for blk-throtl Jinke Han
2022-12-26 15:24 ` kernel test robot
2023-01-04 22:11   ` Tejun Heo
2023-01-05  7:28       ` [External] " hanjinke
2023-01-05 16:18   ` Michal Koutný
2023-01-05 17:35       ` Tejun Heo
2023-01-05 19:22           ` Michal Koutný
2023-01-05 21:39               ` Tejun Heo
2023-01-06 15:38       ` Jan Kara
2023-01-06 16:58         ` Tejun Heo
2023-01-06 18:07             ` [External] " hanjinke
2023-01-06 18:15                 ` Tejun Heo
2023-01-07  4:44                   ` hanjinke
2023-01-09 18:08                       ` Tejun Heo
2023-01-10 13:07                         ` hanjinke
2023-01-11 12:35               ` Michal Koutný
2023-01-12  3:26                   ` hanjinke
2023-01-09 10:59           ` Jan Kara [this message]
2023-01-09 17:10             ` Tejun Heo
