Subject: Re: [PATCH 7/8] wbt: add general throttling mechanism
To: Jan Kara, Jens Axboe
References: <1461686131-22999-1-git-send-email-axboe@fb.com>
 <1461686131-22999-8-git-send-email-axboe@fb.com>
 <20160428110559.GC17362@quack2.suse.cz>
 <57225C3E.7060504@fb.com>
 <20160503093410.GD12748@quack2.suse.cz>
 <20160503154032.GG25436@quack2.suse.cz>
 <20160503154831.GH25436@quack2.suse.cz>
 <5728D90F.8080204@kernel.dk>
 <5728EA8A.9040405@kernel.dk>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-block@vger.kernel.org, dchinner@redhat.com, sedat.dilek@gmail.com
From: Jens Axboe
Message-ID: <5728F6F5.5050701@kernel.dk>
Date: Tue, 3 May 2016 13:07:33 -0600
In-Reply-To: <5728EA8A.9040405@kernel.dk>

On 05/03/2016 12:14 PM, Jens Axboe wrote:
> On 05/03/2016 10:59 AM, Jens Axboe wrote:
>> On 05/03/2016 09:48 AM, Jan Kara wrote:
>>> On Tue 03-05-16 17:40:32, Jan Kara wrote:
>>>> On Tue 03-05-16 11:34:10, Jan Kara wrote:
>>>>> Yeah, once I hunt down that regression with the old disk, I can
>>>>> have a look into how writeback throttling plays together with the
>>>>> blkio controller.
>>>>
>>>> So I've tried the following script (note that you need cgroup v2 for
>>>> writeback IO to be throttled):
>>>>
>>>> ---
>>>> mkdir /sys/fs/cgroup/group1
>>>> echo 1000 >/sys/fs/cgroup/group1/io.weight
>>>> dd if=/dev/zero of=/mnt/file1 bs=1M count=10000&
>>>> DD1=$!
>>>> echo $DD1 >/sys/fs/cgroup/group1/cgroup.procs
>>>>
>>>> mkdir /sys/fs/cgroup/group2
>>>> echo 100 >/sys/fs/cgroup/group2/io.weight
>>>> #echo "259:65536 wbps=5000000" >/sys/fs/cgroup/group2/io.max
>>>> echo "259:65536 wbps=max" >/sys/fs/cgroup/group2/io.max
>>>> dd if=/dev/zero of=/mnt/file2 bs=1M count=10000&
>>>> DD2=$!
>>>> echo $DD2 >/sys/fs/cgroup/group2/cgroup.procs
>>>>
>>>> while true; do
>>>>     sleep 1
>>>>     kill -USR1 $DD1
>>>>     kill -USR1 $DD2
>>>>     echo '======================================================='
>>>> done
>>>> ---
>>>>
>>>> and watched the progress of the dd processes in the different
>>>> cgroups. The 1/10 weight difference has no effect with your
>>>> writeback patches - the situation after one minute:
>>>>
>>>> 3120+1 records in
>>>> 3120+1 records out
>>>> 3272392704 bytes (3.3 GB) copied, 63.7119 s, 51.4 MB/s
>>>> 3217+1 records in
>>>> 3217+1 records out
>>>> 3374010368 bytes (3.4 GB) copied, 63.5819 s, 53.1 MB/s
>>>>
>>>> I should add that even without your patches the progress doesn't
>>>> quite correspond to the weight ratio:
>>>
>>> Forgot to fill in the corresponding data for the unpatched kernel
>>> here:
>>>
>>> 5962+2 records in
>>> 5962+2 records out
>>> 6252281856 bytes (6.3 GB) copied, 64.1719 s, 97.4 MB/s
>>> 1502+0 records in
>>> 1502+0 records out
>>> 1574961152 bytes (1.6 GB) copied, 64.207 s, 24.5 MB/s
>>
>> Thanks for testing this, I'll see what we can do about that. It stands
>> to reason that we'll throttle a heavier writer more, statistically.
>> But I'm assuming the above test was run with basically just these
>> writes going, so no real competition? And hence we end up throttling
>> the two writers equally hard, destroying the weighting in the process.
>> In both cases, we basically don't pay any attention to cgroup weights.
>>
>>>> but still there is a noticeable difference between cgroups with
>>>> different weights.
>>>>
>>>> OTOH blk-throttle combines well with your patches: limiting one
>>>> cgroup to 5 MB/s results in numbers like:
>>>>
>>>> 3883+2 records in
>>>> 3883+2 records out
>>>> 4072091648 bytes (4.1 GB) copied, 36.6713 s, 111 MB/s
>>>> 413+0 records in
>>>> 413+0 records out
>>>> 433061888 bytes (433 MB) copied, 36.8939 s, 11.7 MB/s
>>>>
>>>> which is fine and comparable with the unpatched kernel. The higher
>>>> throughput number is there because we do buffered writes and dd
>>>> reports what it wrote into the page cache. And it is no wonder
>>>> blk-throttle combines fine - it throttles bios, which happens before
>>>> we reach the writeback throttling mechanism.
>>
>> OK, that's good, at least that part works fine. And yes, the throttle
>> path is hit before we end up in the make_request_fn, which is where
>> wbt drops in.
>>
>>>> So I believe this demonstrates that your writeback throttling just
>>>> doesn't work well with a selective scheduling policy that happens
>>>> below it, because it can essentially lead to IO priority inversion
>>>> issues...
>>
>> Is this testing still done on the QD=1 ATA disk? It's not too
>> surprising that this falls apart there, since we have very little
>> room to maneuver. I wonder if a normal SATA drive with NCQ would
>> behave better in this regard. I'll have to test a bit and think about
>> how we can best handle this case.
>
> I think what we'll do for now is just disable wbt IFF we have a
> non-root cgroup attached to CFQ. Done here:
>
> http://git.kernel.dk/cgit/linux-block/commit/?h=wb-buf-throttle&id=7315756efe76bbdf83076fc9dbc569bbb4da5d32

That was a bit too untested. This should be better; it taps into the
spot where CFQ normally notices a change in blkcg:

http://git.kernel.dk/cgit/linux-block/commit/?h=wb-buf-throttle&id=9b89e1bb666bd036a4cb1313479435087fb86ba0

-- 
Jens Axboe
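
A note for anyone reproducing Jan's script above: on cgroup v2,
writeback IO is only attributed to a cgroup when both the memory and io
controllers are enabled for it, and the filesystem being written must
support cgroup writeback. A quick sanity check, assuming the v2
hierarchy is mounted at /sys/fs/cgroup as the script expects:

---
# Confirm the cgroup v2 hierarchy is mounted where the script expects it.
findmnt -t cgroup2

# Writeback attribution needs both the memory and io controllers; enable
# them for child groups, or io.weight/io.max won't exist under
# group1/group2.
cat /sys/fs/cgroup/cgroup.controllers
echo "+memory +io" >/sys/fs/cgroup/cgroup.subtree_control
---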
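
And since dd reports what it wrote into the page cache rather than what
reached the device, the weight ratio is better judged from the
per-device write counters in each group's io.stat file. A rough sketch,
to be run while the two dd processes from the script above are going
(the MAJ:MIN device number is a placeholder for whatever device is
under test):

---
#!/bin/bash
DEV=259:0    # placeholder: MAJ:MIN of the device under test

# Pull the wbytes= counter for $DEV out of a cgroup's io.stat; this
# counts bytes actually issued to the device, not bytes dirtied in the
# page cache.
wbytes() {
    awk -v dev="$DEV" '$1 == dev {
        for (i = 2; i <= NF; i++)
            if (sub(/^wbytes=/, "", $i)) print $i
    }' "/sys/fs/cgroup/$1/io.stat"
}

sleep 60    # let both writers run for a minute, as in Jan's test
W1=$(wbytes group1)
W2=$(wbytes group2)
echo "group1: $W1 bytes, group2: $W2 bytes"
if [ -n "$W1" ] && [ -n "$W2" ] && [ "$W2" -gt 0 ]; then
    echo "achieved write ratio: $((W1 / W2)) (configured weight ratio: 10)"
fi
---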