Subject: Re: [PATCH RFC 10/22] block, bfq: add full hierarchical scheduling and cgroups support
To: Tejun Heo
References: <1454364778-25179-1-git-send-email-paolo.valente@linaro.org>
 <1454364778-25179-11-git-send-email-paolo.valente@linaro.org>
 <20160211222824.GD3741@mtj.duckdns.org>
 <57174CA7.5000706@linaro.org>
 <20160422181321.GV7822@mtj.duckdns.org>
 <20160422184110.GX7822@mtj.duckdns.org>
 <20160422193221.GY7822@mtj.duckdns.org>
 <57F65679-CD8E-43BA-8C46-C165B4C20677@linaro.org>
 <20160425192436.GE7822@mtj.duckdns.org>
Cc: Jens Axboe, Fabio Checconi, Arianna Avanzini,
 linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 ulf.hansson@linaro.org, linus.walleij@linaro.org, broonie@kernel.org
From: Paolo
Message-ID: <571E7E63.6010103@linaro.org>
Date: Mon, 25 Apr 2016 22:30:27 +0200
In-Reply-To: <20160425192436.GE7822@mtj.duckdns.org>

On 25/04/2016 21:24, Tejun Heo wrote:
> Hello, Paolo.
>

Hi

> On Sat, Apr 23, 2016 at 09:07:47AM +0200, Paolo Valente wrote:
>> There is certainly something I don’t know here, because I don’t
>> understand why there is also a workqueue containing root-group I/O
>> all the time, if the only process doing I/O belongs to a different
>> (sub)group.
>
> Hmmm... maybe metadata updates?
>

That's what I thought in the first place.
But one half or one third of the IOs sounded too much for metadata
(the percentage varies over time during the test). And root-group IOs
are apparently large. Here is an excerpt from the output of

grep -B 1 insert_request trace

kworker/u8:4-116 [002] d... 124.349971: 8,0 I W 3903488 + 1024 [kworker/u8:4]
kworker/u8:4-116 [002] d... 124.349978: 8,0 m N cfq409A / insert_request
--
kworker/u8:4-116 [002] d... 124.350770: 8,0 I W 3904512 + 1200 [kworker/u8:4]
kworker/u8:4-116 [002] d... 124.350780: 8,0 m N cfq96A /seq_write insert_request
--
kworker/u8:4-116 [002] d... 124.363911: 8,0 I W 3905712 + 1888 [kworker/u8:4]
kworker/u8:4-116 [002] d... 124.363916: 8,0 m N cfq409A / insert_request
--
kworker/u8:4-116 [002] d... 124.364467: 8,0 I W 3907600 + 352 [kworker/u8:4]
kworker/u8:4-116 [002] d... 124.364474: 8,0 m N cfq96A /seq_write insert_request
--
kworker/u8:4-116 [002] d... 124.369435: 8,0 I W 3907952 + 1680 [kworker/u8:4]
kworker/u8:4-116 [002] d... 124.369439: 8,0 m N cfq96A /seq_write insert_request
--
kworker/u8:4-116 [002] d... 124.369441: 8,0 I W 3909632 + 560 [kworker/u8:4]
kworker/u8:4-116 [002] d... 124.369442: 8,0 m N cfq96A /seq_write insert_request
--
kworker/u8:4-116 [002] d... 124.373299: 8,0 I W 3910192 + 1760 [kworker/u8:4]
kworker/u8:4-116 [002] d... 124.373301: 8,0 m N cfq409A / insert_request
--
kworker/u8:4-116 [002] d... 124.373519: 8,0 I W 3911952 + 480 [kworker/u8:4]
kworker/u8:4-116 [002] d... 124.373522: 8,0 m N cfq96A /seq_write insert_request
--
kworker/u8:4-116 [002] d... 124.381936: 8,0 I W 3912432 + 1728 [kworker/u8:4]
kworker/u8:4-116 [002] d... 124.381937: 8,0 m N cfq409A / insert_request

>> Anyway, if this is expected, then there is no reason to bother you
>> further on it. In contrast, the actual problem I see is the
>> following.
>> If one third or half of the bios belong to a different
>> group than the writer that one wants to isolate, then, whatever
>> weight is assigned to the writer group, we will never be able to let
>> the writer get the desired share of the time (or of the bandwidth
>> with bfq and all quasi-sequential workloads). For instance, in the
>> scenario that you told me to try, the writer will never get 50% of
>> the time, with any scheduler. Am I missing something also on this?
>
> While a worker may jump across different cgroups, the IOs are still
> coming from somewhere and if the only IO generator on the machine is
> the test dd, the bios from that cgroup should dominate the IOs. I
> think it'd be helpful to investigate who's issuing the root cgroup
> IOs.
>

Ok (if there is some quick way to get this information without
instrumenting the code, then any suggestion or pointer is welcome).

Thanks,
Paolo

> Thanks.
>
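
To put numbers on the "one half or one third" estimate for the trace excerpt quoted in this message, here is a minimal Python sketch that pairs each insert event with the cfq message that follows it and sums requests and sectors per cgroup. The function name `tally`, the two regular expressions, and the reading of the number after `+` as a request size in sectors are assumptions made for illustration; none of this is part of the kernel or of any existing trace tool.

```python
import re
from collections import defaultdict

# The excerpt from this message, one trace record per line.
TRACE = """\
kworker/u8:4-116 [002] d... 124.349971: 8,0 I W 3903488 + 1024 [kworker/u8:4]
kworker/u8:4-116 [002] d... 124.349978: 8,0 m N cfq409A / insert_request
--
kworker/u8:4-116 [002] d... 124.350770: 8,0 I W 3904512 + 1200 [kworker/u8:4]
kworker/u8:4-116 [002] d... 124.350780: 8,0 m N cfq96A /seq_write insert_request
--
kworker/u8:4-116 [002] d... 124.363911: 8,0 I W 3905712 + 1888 [kworker/u8:4]
kworker/u8:4-116 [002] d... 124.363916: 8,0 m N cfq409A / insert_request
--
kworker/u8:4-116 [002] d... 124.364467: 8,0 I W 3907600 + 352 [kworker/u8:4]
kworker/u8:4-116 [002] d... 124.364474: 8,0 m N cfq96A /seq_write insert_request
--
kworker/u8:4-116 [002] d... 124.369435: 8,0 I W 3907952 + 1680 [kworker/u8:4]
kworker/u8:4-116 [002] d... 124.369439: 8,0 m N cfq96A /seq_write insert_request
--
kworker/u8:4-116 [002] d... 124.369441: 8,0 I W 3909632 + 560 [kworker/u8:4]
kworker/u8:4-116 [002] d... 124.369442: 8,0 m N cfq96A /seq_write insert_request
--
kworker/u8:4-116 [002] d... 124.373299: 8,0 I W 3910192 + 1760 [kworker/u8:4]
kworker/u8:4-116 [002] d... 124.373301: 8,0 m N cfq409A / insert_request
--
kworker/u8:4-116 [002] d... 124.373519: 8,0 I W 3911952 + 480 [kworker/u8:4]
kworker/u8:4-116 [002] d... 124.373522: 8,0 m N cfq96A /seq_write insert_request
--
kworker/u8:4-116 [002] d... 124.381936: 8,0 I W 3912432 + 1728 [kworker/u8:4]
kworker/u8:4-116 [002] d... 124.381937: 8,0 m N cfq409A / insert_request
"""

# Insert event: "... I <rwbs> <sector> + <nr_sectors> [task]"
# (assumed layout, inferred from the excerpt only).
INSERT_RE = re.compile(r"\sI\s+\S+\s+(\d+)\s+\+\s+(\d+)\s+\[")
# Scheduler message: "... m N cfq<id> <cgroup path> insert_request"
MSG_RE = re.compile(r"\sm\s+N\s+cfq\S+\s+(\S+)\s+insert_request")


def tally(trace):
    """Return (sectors, requests) dicts keyed by cgroup path."""
    sectors = defaultdict(int)
    requests = defaultdict(int)
    pending = None  # size of the insert event awaiting its cfq message
    for line in trace.splitlines():
        m = INSERT_RE.search(line)
        if m:
            pending = int(m.group(2))
            continue
        m = MSG_RE.search(line)
        if m and pending is not None:
            sectors[m.group(1)] += pending
            requests[m.group(1)] += 1
            pending = None
    return dict(sectors), dict(requests)


if __name__ == "__main__":
    sectors, requests = tally(TRACE)
    total = sum(sectors.values())
    for group in sorted(sectors):
        share = 100.0 * sectors[group] / total
        print(f"{group}: {requests[group]} requests, "
              f"{sectors[group]} sectors ({share:.1f}% of sectors)")
```

On these nine requests the root group ("/") accounts for 4 requests and 6400 of 10672 sectors, against 5 requests and 4272 sectors for /seq_write. The excerpt is far too short to characterize the whole run, so this only illustrates the accounting, not the overall ratio.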