From: Miao Xie <miaoxie@huawei.com>
To: Tejun Heo <tj@kernel.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>, <linux-kernel@vger.kernel.org>
Subject: Re: [BUG]Writeback Cgroup/Dirty Throttle: very small buffered write throughput caused by writeback cgroup and dirty throttle
Date: Fri, 13 May 2016 14:11:53 +0800
Message-ID: <57357029.5030002@huawei.com>
In-Reply-To: <20160512153234.GS4775@htj.duckdns.org>
On 2016/5/12 at 23:32, Tejun Heo wrote:
> On Thu, May 12, 2016 at 09:11:33AM +0800, Miao Xie wrote:
>>> My box has 48 cores and 188GB memory, but I set
>>> vm.dirty_background_bytes = 268435456
>>> vm.dirty_bytes = 536870912
>>>
>>> If I set vm.dirty_background_bytes and vm.dirty_bytes to large values (vm.dirty_background_bytes = 3GB,
>>> vm.dirty_bytes = 4GB), fio throughput is more than 1500MB/s. If I then reset them to the original
>>> values (the ones above), the throughput drops to 500MB/s.
>>>
>>> According to my debugging, fio slept for 1ms every time it dirtied a page (in balance_dirty_pages) when
>>> the throughput was down to 4MB/s. I think it might be a bug in dirty throttling when the writeback cgroup is enabled.
>
> Heh, so, for cgroups, the absolute byte limits can't be applied directly
> and are converted to percentage values before being applied. You're
> specifying 0.27% for the threshold. Unfortunately, the ratio is
> translated into an integer percentage and 0.27% becomes 0, so your
> cgroups are always over the limit and being throttled.
>
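To make the truncation described above concrete, here is a minimal userspace sketch of the pre-patch conversion. It is not the kernel code itself: the SKETCH_ macros and both function names are made up for illustration, 4 KiB pages are assumed, and the figure of 49283072 pages assumes that all 188GB counts as globally available memory (the real dirtyable amount would be somewhat lower).

```c
/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Pre-patch style conversion: turn a byte limit into a whole percentage
 * of global_avail (in pages), capped at 100. Integer division drops
 * everything below 1%.
 */
static unsigned long old_ratio_percent(unsigned long bytes,
				       unsigned long global_avail)
{
	unsigned long ratio =
		SKETCH_DIV_ROUND_UP(bytes, SKETCH_PAGE_SIZE) * 100 / global_avail;

	return ratio < 100UL ? ratio : 100UL;
}

/* Pre-patch style threshold: percentage of the domain's available pages. */
static unsigned long old_thresh_pages(unsigned long ratio,
				      unsigned long available_memory)
{
	return ratio * available_memory / 100;
}
```

With vm.dirty_bytes = 536870912 and global_avail = 49283072 pages, old_ratio_percent() returns 0, so old_thresh_pages() yields a threshold of 0 pages no matter how much memory the cgroup has.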
> Can you please see whether the following patch fixes the issue?
Better than the kernel without the patch. Now the benchmark can reach the device bandwidth after 5-8 seconds.
But at the beginning it was still very slow: throughput was only 4MB/s for ~4 seconds, and then it
ramped up over 1~3 seconds.
Thanks
Miao
> Thanks.
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 999792d..a455a21 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -369,8 +369,9 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
>  	struct dirty_throttle_control *gdtc = mdtc_gdtc(dtc);
>  	unsigned long bytes = vm_dirty_bytes;
>  	unsigned long bg_bytes = dirty_background_bytes;
> -	unsigned long ratio = vm_dirty_ratio;
> -	unsigned long bg_ratio = dirty_background_ratio;
> +	/* convert ratios to per-PAGE_SIZE for higher precision */
> +	unsigned long ratio = (vm_dirty_ratio * PAGE_SIZE) / 100;
> +	unsigned long bg_ratio = (dirty_background_ratio * PAGE_SIZE) / 100;
>  	unsigned long thresh;
>  	unsigned long bg_thresh;
>  	struct task_struct *tsk;
> @@ -382,26 +383,28 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
>  		/*
>  		 * The byte settings can't be applied directly to memcg
>  		 * domains. Convert them to ratios by scaling against
> -		 * globally available memory.
> +		 * globally available memory. As the ratios are in
> +		 * per-PAGE_SIZE, they can be obtained by dividing bytes by
> +		 * pages.
>  		 */
>  		if (bytes)
> -			ratio = min(DIV_ROUND_UP(bytes, PAGE_SIZE) * 100 /
> -				    global_avail, 100UL);
> +			ratio = min(DIV_ROUND_UP(bytes, global_avail),
> +				    PAGE_SIZE);
>  		if (bg_bytes)
> -			bg_ratio = min(DIV_ROUND_UP(bg_bytes, PAGE_SIZE) * 100 /
> -				       global_avail, 100UL);
> +			bg_ratio = min(DIV_ROUND_UP(bg_bytes, global_avail),
> +				       PAGE_SIZE);
>  		bytes = bg_bytes = 0;
>  	}
>
>  	if (bytes)
>  		thresh = DIV_ROUND_UP(bytes, PAGE_SIZE);
>  	else
> -		thresh = (ratio * available_memory) / 100;
> +		thresh = (ratio * available_memory) / PAGE_SIZE;
>
>  	if (bg_bytes)
>  		bg_thresh = DIV_ROUND_UP(bg_bytes, PAGE_SIZE);
>  	else
> -		bg_thresh = (bg_ratio * available_memory) / 100;
> +		bg_thresh = (bg_ratio * available_memory) / PAGE_SIZE;
>
>  	if (bg_thresh >= thresh)
>  		bg_thresh = thresh / 2;
>
> .
>
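For comparison, the per-PAGE_SIZE arithmetic used by the quoted patch can be sketched in userspace C as follows. This only mirrors the two expressions from the diff. The SKETCH_ macros and function names are hypothetical, 4 KiB pages are assumed, and the page counts in the note below (49283072 global pages, a cgroup with 1048576 available pages) are illustrative assumptions rather than measured values.

```c
/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Patched style conversion: express the byte limit as a fraction of
 * global_avail (in pages) with denominator PAGE_SIZE instead of 100.
 * bytes / pages is already "per PAGE_SIZE", so no extra scaling is needed.
 */
static unsigned long new_ratio_per_page_size(unsigned long bytes,
					     unsigned long global_avail)
{
	unsigned long ratio = SKETCH_DIV_ROUND_UP(bytes, global_avail);

	return ratio < SKETCH_PAGE_SIZE ? ratio : SKETCH_PAGE_SIZE;
}

/* Patched style threshold: scale the domain's pages by ratio / PAGE_SIZE. */
static unsigned long new_thresh_pages(unsigned long ratio,
				      unsigned long available_memory)
{
	return ratio * available_memory / SKETCH_PAGE_SIZE;
}
```

Under these assumptions, vm.dirty_bytes = 536870912 maps to a ratio of 11 (out of 4096) and vm.dirty_background_bytes = 268435456 maps to 6, so a cgroup with 1048576 available pages gets a threshold of 11 * 1048576 / 4096 = 2816 pages rather than 0.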