Subject: Re: [BUG]Writeback Cgroup/Dirty Throttle: very small buffered write thoughput caused by writeback cgroup and dirty thottle
To: Tejun Heo
References: <57333E75.3080309@huawei.com> <5733D845.2030709@huawei.com> <20160512153234.GS4775@htj.duckdns.org>
CC: Fengguang Wu ,
From: Miao Xie
Message-ID: <57357029.5030002@huawei.com>
Date: Fri, 13 May 2016 14:11:53 +0800
In-Reply-To: <20160512153234.GS4775@htj.duckdns.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2016/5/12 at 23:32, Tejun Heo wrote:
> On Thu, May 12, 2016 at 09:11:33AM +0800, Miao Xie wrote:
>>> My box has 48 cores and 188GB memory, but I set
>>> vm.dirty_background_bytes = 268435456
>>> vm.dirty_bytes = 536870912
>>>
>>> If I set vm.dirty_background_bytes and vm.dirty_bytes to large values
>>> (vm.dirty_background_bytes = 3GB, vm.dirty_bytes = 4GB), then fio
>>> throughput is more than 1500MB/s. If I then reset them to the original
>>> values (the ones above), the throughput drops to 500MB/s.
>>>
>>> According to my debugging, fio slept for 1ms every time we dirtied a
>>> page (balance dirty pages) when the throughput was down to 4MB/s. It
>>> might be a bug in dirty throttling when the writeback cgroup is
>>> enabled, I think.
>
> Heh, so, for cgroups, the absolute byte limits can't be applied directly
> and are converted to a percentage value before being applied.  You're
> specifying 0.27% for threshold.  Unfortunately, the ratio is
> translated into a percentage number and 0.27% becomes 0, so your
> cgroups are always over limit and being throttled.
>
> Can you please see whether the following patch fixes the issue?

Better than the kernel without the patch. Now the benchmark reaches the
device bandwidth after 5-8 seconds. But at the beginning it was still very
slow: its throughput was only 4MB/s for ~4 seconds, and then it went up
within 1~3 seconds.

Thanks
Miao

> Thanks.
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 999792d..a455a21 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -369,8 +369,9 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
>  	struct dirty_throttle_control *gdtc = mdtc_gdtc(dtc);
>  	unsigned long bytes = vm_dirty_bytes;
>  	unsigned long bg_bytes = dirty_background_bytes;
> -	unsigned long ratio = vm_dirty_ratio;
> -	unsigned long bg_ratio = dirty_background_ratio;
> +	/* convert ratios to per-PAGE_SIZE for higher precision */
> +	unsigned long ratio = (vm_dirty_ratio * PAGE_SIZE) / 100;
> +	unsigned long bg_ratio = (dirty_background_ratio * PAGE_SIZE) / 100;
>  	unsigned long thresh;
>  	unsigned long bg_thresh;
>  	struct task_struct *tsk;
> @@ -382,26 +383,28 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
>  		/*
>  		 * The byte settings can't be applied directly to memcg
>  		 * domains.  Convert them to ratios by scaling against
> -		 * globally available memory.
> +		 * globally available memory.  As the ratios are in
> +		 * per-PAGE_SIZE, they can be obtained by dividing bytes by
> +		 * pages.
>  		 */
>  		if (bytes)
> -			ratio = min(DIV_ROUND_UP(bytes, PAGE_SIZE) * 100 /
> -				    global_avail, 100UL);
> +			ratio = min(DIV_ROUND_UP(bytes, global_avail),
> +				    PAGE_SIZE);
>  		if (bg_bytes)
> -			bg_ratio = min(DIV_ROUND_UP(bg_bytes, PAGE_SIZE) * 100 /
> -				       global_avail, 100UL);
> +			bg_ratio = min(DIV_ROUND_UP(bg_bytes, global_avail),
> +				       PAGE_SIZE);
>  		bytes = bg_bytes = 0;
>  	}
>
>  	if (bytes)
>  		thresh = DIV_ROUND_UP(bytes, PAGE_SIZE);
>  	else
> -		thresh = (ratio * available_memory) / 100;
> +		thresh = (ratio * available_memory) / PAGE_SIZE;
>
>  	if (bg_bytes)
>  		bg_thresh = DIV_ROUND_UP(bg_bytes, PAGE_SIZE);
>  	else
> -		bg_thresh = (bg_ratio * available_memory) / 100;
> +		bg_thresh = (bg_ratio * available_memory) / PAGE_SIZE;
>
>  	if (bg_thresh >= thresh)
>  		bg_thresh = thresh / 2;
>
> .
>
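
For reference, the truncation Tejun describes can be reproduced in user
space. The following is only a rough sketch, not kernel code: the helper
names, EX_PAGE_SIZE (4 KiB pages) and EX_GLOBAL_AVAIL (all 188GB counted as
dirtyable, expressed in pages) are assumptions made for illustration. In
the kernel, global_avail is the globally dirtyable memory, which is
somewhat smaller than total RAM.

```c
#include <stdint.h>

/* Assumptions for illustration only, not taken from the kernel. */
#define EX_PAGE_SIZE	4096ULL			/* assumed 4 KiB pages */
#define EX_GLOBAL_AVAIL	(188ULL * 262144ULL)	/* assumed: 188 GiB in 4 KiB pages */

/* Round-up integer division, same idea as the kernel's DIV_ROUND_UP(). */
static inline uint64_t ex_div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

static inline uint64_t ex_min(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Pre-patch conversion: byte limit -> whole percent of global_avail. */
static inline uint64_t ex_ratio_percent(uint64_t bytes, uint64_t global_avail)
{
	uint64_t pages = ex_div_round_up(bytes, EX_PAGE_SIZE);

	return ex_min(pages * 100 / global_avail, 100);
}

/* Patched conversion: byte limit -> parts per PAGE_SIZE of global_avail. */
static inline uint64_t ex_ratio_per_page_size(uint64_t bytes,
					      uint64_t global_avail)
{
	return ex_min(ex_div_round_up(bytes, global_avail), EX_PAGE_SIZE);
}

/* Threshold in pages from a whole-percent ratio (pre-patch). */
static inline uint64_t ex_thresh_percent(uint64_t ratio, uint64_t avail)
{
	return ratio * avail / 100;
}

/* Threshold in pages from a per-PAGE_SIZE ratio (patched). */
static inline uint64_t ex_thresh_per_page_size(uint64_t ratio, uint64_t avail)
{
	return ratio * avail / EX_PAGE_SIZE;
}
```

Under these assumptions, vm.dirty_bytes = 536870912 gives
ex_ratio_percent() == 0, so the memcg threshold is 0 pages and every writer
is over limit. ex_ratio_per_page_size() gives 11, i.e. 11/4096, or roughly
0.27%, which yields a non-zero threshold.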