From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933612AbcECPcm (ORCPT ); Tue, 3 May 2016 11:32:42 -0400
Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:34406 "EHLO
	mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932867AbcECPcj (ORCPT ); Tue, 3 May 2016 11:32:39 -0400
Subject: Re: [PATCH 7/8] wbt: add general throttling mechanism
To: Jan Kara
References: <1461686131-22999-1-git-send-email-axboe@fb.com>
	<1461686131-22999-8-git-send-email-axboe@fb.com>
	<20160428110559.GC17362@quack2.suse.cz>
	<57225C3E.7060504@fb.com>
	<20160503093410.GD12748@quack2.suse.cz>
	<5728B45F.6050200@fb.com>
	<20160503152249.GF25436@quack2.suse.cz>
CC: , , , ,
From: Jens Axboe
Message-ID: <5728C48F.9010102@fb.com>
Date: Tue, 3 May 2016 09:32:31 -0600
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
	Thunderbird/38.7.2
MIME-Version: 1.0
In-Reply-To: <20160503152249.GF25436@quack2.suse.cz>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
X-Originating-IP: [192.168.54.13]
X-Proofpoint-Spam-Reason: safe
X-FB-Internal: Safe
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,,
	definitions=2016-05-03_06:,, signatures=0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/03/2016 09:22 AM, Jan Kara wrote:
> On Tue 03-05-16 08:23:27, Jens Axboe wrote:
>> On 05/03/2016 03:34 AM, Jan Kara wrote:
>>> On Thu 28-04-16 12:53:50, Jens Axboe wrote:
>>>>> 2) As far as I can see in patch 8/8, you have plugged the throttling above
>>>>> the IO scheduler. When there are e.g. multiple cgroups with different IO
>>>>> limits operating, this throttling can lead to strange results (like a
>>>>> cgroup with low limit using up all available background "slots" and thus
>>>>> effectively stopping background writeback for other cgroups)? So won't
>>>>> it make more sense to plug this below the IO scheduler? Now I understand
>>>>> there may be other problems with this but I think we should put more
>>>>> thought to that and provide some justification in changelogs.
>>>>
>>>> One complexity is that we have to do this early for blk-mq, since once you
>>>> get a request, you're already sitting on the hw tag. CoDel should actually
>>>> work fine at each hop, so hopefully this will as well.
>>>
>>> OK, I see. But then this suggests that any IO scheduling and / or
>>> cgroup-related throttling should happen before we get a request for blk-mq
>>> as well? And then we can still do writeback throttling below that layer?
>>
>> Not necessarily. For IO scheduling, basically we care about two parts:
>>
>> 1) Are you allowed to allocate the resources to queue some IO
>> 2) Are you allowed to dispatch
>
> But then it seems suboptimal to waste a relatively scarce resource (which
> HW tag is AFAIU) just because you happen to run from a cgroup that is
> bandwidth limited and thus are not allowed to dispatch?

For some cases, you are absolutely right, and #1 is the main one. For your
case of QD=1, that's obviously the case. For SATA, it's a bit more of a grey
zone, and for others (nvme, scsi, etc), it's not really a scarce resource,
so #2 is the bigger part of it.

-- 
Jens Axboe