linux-kernel.vger.kernel.org archive mirror
From: Miao Xie <miaoxie@huawei.com>
To: Tejun Heo <tj@kernel.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [BUG]Writeback Cgroup/Dirty Throttle: very small buffered write thoughput caused by writeback cgroup and dirty thottle
Date: Fri, 13 May 2016 14:11:53 +0800
Message-ID: <57357029.5030002@huawei.com>
In-Reply-To: <20160512153234.GS4775@htj.duckdns.org>

On 2016/5/12 at 23:32, Tejun Heo wrote:
> On Thu, May 12, 2016 at 09:11:33AM +0800, Miao Xie wrote:
>>> My box has 48 cores and 188GB memory, but I set
>>> vm.dirty_background_bytes = 268435456
>>> vm.dirty_bytes = 536870912
>>>
>>> If I set vm.dirty_background_bytes and vm.dirty_bytes to large values (vm.dirty_background_bytes = 3GB,
>>> vm.dirty_bytes = 4GB), fio throughput is more than 1500MB/s. If I then reset them to the original
>>> values (the ones above), the throughput drops to 500MB/s.
>>>
>>> According to my debugging, fio slept for 1ms every time we dirtied a page (balance dirty pages) when
>>> the throughput was down to 4MB/s. I think it might be a bug in dirty throttling when the writeback cgroup is enabled.
>
> Heh, so, for cgroups, the absolute byte limits can't be applied directly
> and are converted to a percentage value before being applied.  You're
> specifying 0.27% for the threshold.  Unfortunately, the ratio is
> translated into a whole-number percentage and 0.27% becomes 0, so your
> cgroups are always over the limit and being throttled.
>
> Can you please see whether the following patch fixes the issue?
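
For illustration, the truncation described above can be reproduced in user space. This is a minimal sketch, not the kernel code: it assumes 4 KiB pages and treats all 188GB as globally available memory (49283072 pages), which overstates what the kernel would actually count. SKETCH_PAGE_SIZE, div_round_up(), old_bytes_to_ratio() and old_thresh() are hypothetical names that only mirror the pre-patch arithmetic in domain_dirty_limits().

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Round-up integer division, mirroring the kernel's DIV_ROUND_UP(). */
static unsigned long div_round_up(unsigned long n, unsigned long d)
{
	assert(d != 0);
	return (n + d - 1) / d;
}

/*
 * Pre-patch conversion (hypothetical helper): turn a byte limit into a
 * whole-number percentage of globally available pages, capped at 100.
 */
static unsigned long old_bytes_to_ratio(unsigned long bytes,
					unsigned long global_avail_pages)
{
	unsigned long ratio = div_round_up(bytes, SKETCH_PAGE_SIZE) * 100 /
			      global_avail_pages;

	return ratio < 100UL ? ratio : 100UL;
}

/* Pre-patch threshold, in pages, for a domain with avail_pages available. */
static unsigned long old_thresh(unsigned long ratio, unsigned long avail_pages)
{
	return ratio * avail_pages / 100;
}
```

With vm.dirty_bytes = 536870912, old_bytes_to_ratio(536870912UL, 49283072UL) evaluates to 0, so old_thresh() is 0 pages for any cgroup size and every dirtied page puts the cgroup over its limit.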

Better than the kernel without the patch. Now the benchmark reaches the device bandwidth after 5-8 seconds.
But at the beginning it is still very slow: throughput is only 4MB/s for ~4 seconds, and then it
ramps up over 1~3 seconds.

Thanks
Miao

> Thanks.
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 999792d..a455a21 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -369,8 +369,9 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
>   	struct dirty_throttle_control *gdtc = mdtc_gdtc(dtc);
>   	unsigned long bytes = vm_dirty_bytes;
>   	unsigned long bg_bytes = dirty_background_bytes;
> -	unsigned long ratio = vm_dirty_ratio;
> -	unsigned long bg_ratio = dirty_background_ratio;
> +	/* convert ratios to per-PAGE_SIZE for higher precision */
> +	unsigned long ratio = (vm_dirty_ratio * PAGE_SIZE) / 100;
> +	unsigned long bg_ratio = (dirty_background_ratio * PAGE_SIZE) / 100;
>   	unsigned long thresh;
>   	unsigned long bg_thresh;
>   	struct task_struct *tsk;
> @@ -382,26 +383,28 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
>   		/*
>   		 * The byte settings can't be applied directly to memcg
>   		 * domains.  Convert them to ratios by scaling against
> -		 * globally available memory.
> +		 * globally available memory.  As the ratios are in
> +		 * per-PAGE_SIZE, they can be obtained by dividing bytes by
> +		 * pages.
>   		 */
>   		if (bytes)
> -			ratio = min(DIV_ROUND_UP(bytes, PAGE_SIZE) * 100 /
> -				    global_avail, 100UL);
> +			ratio = min(DIV_ROUND_UP(bytes, global_avail),
> +				    PAGE_SIZE);
>   		if (bg_bytes)
> -			bg_ratio = min(DIV_ROUND_UP(bg_bytes, PAGE_SIZE) * 100 /
> -				       global_avail, 100UL);
> +			bg_ratio = min(DIV_ROUND_UP(bg_bytes, global_avail),
> +				       PAGE_SIZE);
>   		bytes = bg_bytes = 0;
>   	}
>
>   	if (bytes)
>   		thresh = DIV_ROUND_UP(bytes, PAGE_SIZE);
>   	else
> -		thresh = (ratio * available_memory) / 100;
> +		thresh = (ratio * available_memory) / PAGE_SIZE;
>
>   	if (bg_bytes)
>   		bg_thresh = DIV_ROUND_UP(bg_bytes, PAGE_SIZE);
>   	else
> -		bg_thresh = (bg_ratio * available_memory) / 100;
> +		bg_thresh = (bg_ratio * available_memory) / PAGE_SIZE;
>
>   	if (bg_thresh >= thresh)
>   		bg_thresh = thresh / 2;
>
> .
>
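
As a rough check of the patched arithmetic, the same numbers can be run through a user-space sketch. The assumptions are again 4 KiB pages and 49283072 globally available pages (all of 188GB), plus a hypothetical cgroup with 1 GiB (262144 pages) of available memory. SKETCH_PAGE_SIZE, ceil_div(), new_bytes_to_ratio() and new_thresh() are made-up names; only the formulas come from the patch.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Round-up integer division, mirroring the kernel's DIV_ROUND_UP(). */
static unsigned long ceil_div(unsigned long n, unsigned long d)
{
	assert(d != 0);
	return (n + d - 1) / d;
}

/*
 * Patched conversion (hypothetical helper): the ratio is expressed in
 * units of 1/PAGE_SIZE instead of 1/100.  Since
 *   ratio / PAGE_SIZE == (bytes / PAGE_SIZE) / global_avail_pages,
 * the ratio is simply bytes / global_avail_pages, capped at PAGE_SIZE
 * (which corresponds to 100%).
 */
static unsigned long new_bytes_to_ratio(unsigned long bytes,
					unsigned long global_avail_pages)
{
	unsigned long ratio = ceil_div(bytes, global_avail_pages);

	return ratio < SKETCH_PAGE_SIZE ? ratio : SKETCH_PAGE_SIZE;
}

/* Patched threshold, in pages, for a domain with avail_pages available. */
static unsigned long new_thresh(unsigned long ratio, unsigned long avail_pages)
{
	return ratio * avail_pages / SKETCH_PAGE_SIZE;
}
```

Under these assumptions new_bytes_to_ratio(536870912UL, 49283072UL) is 11 (out of 4096, about 0.27%), and new_thresh(11UL, 262144UL) is 704 pages rather than 0, so the cgroup is no longer permanently over its limit.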


Thread overview: 7+ messages
     [not found] <57333E75.3080309@huawei.com>
2016-05-12  1:11 ` [BUG]Writeback Cgroup/Dirty Throttle: very small buffered write thoughput caused by writeback cgroup and dirty thottle Miao Xie
2016-05-12 15:32   ` Tejun Heo
2016-05-13  6:11     ` Miao Xie [this message]
2016-05-27 18:24       ` Tejun Heo
2016-05-27 18:34     ` [PATCH block/for-4.7-fixes] writeback: use higher precision calculation in domain_dirty_limits() Tejun Heo
2016-05-30  8:05       ` Jan Kara
2016-05-30 14:55       ` Jens Axboe
