From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: righi.andrea@gmail.com, Michael Rubin <mrubin@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
menage@google.com, dave@linux.vnet.ibm.com, chlunde@ping.uio.no,
dpshah@google.com, eric.rannaud@gmail.com,
fernando@oss.ntt.co.jp, agk@sourceware.org,
m.innocenti@cineca.it, s-uchida@ap.jp.nec.com,
ryov@valinux.co.jp, matt@bluehost.com, dradford@bluehost.com,
containers@lists.linux-foundation.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC] [PATCH -mm 0/2] memcg: per cgroup dirty_ratio
Date: Wed, 08 Oct 2008 18:43:57 +0530
Message-ID: <48ECB215.4040409@linux.vnet.ibm.com>
In-Reply-To: <20081008101642.fcfb9186.kamezawa.hiroyu@jp.fujitsu.com>
KAMEZAWA Hiroyuki wrote:
> On Tue, 07 Oct 2008 17:49:49 +0200
> Andrea Righi <righi.andrea@gmail.com> wrote:
>
>> Balbir Singh wrote:
>>> Michael Rubin wrote:
>>>> On Fri, Sep 12, 2008 at 1:18 PM, Andrew Morton
>>>> <akpm@linux-foundation.org> wrote:
>>>>> One thing to think about please: Michael Rubin is hitting problems with
>>>>> the existing /proc/sys/vm/dirty-ratio. Its present granularity of 1%
>>>>> is just too coarse for really large machines, and as
>>>>> memory-size/disk-speed ratios continue to increase, this will just get
>>>>> worse.
>>>> Re-sending since I top-posted before. Never again. Also adding more
>>>> thoughts on a byte based interface.
>>>>
>>>> Currently the problem we are hitting is that we cannot specify pdflush
>>>> to have background limits less than 1% of memory. I am currently
>>>> finishing up a patch that adds a dirty_ratio_millis
>>>> interface. I hope to submit the patch to LKML by the end of the week.
>>>>
>>>> The idea is that we don't want to break backwards compatibility and we
>>>> also don't want to have two conflicting knobs in the sysctl or
>>>> /proc/sys/vm/ space. I thought adding a new knob for those who want to
>>>> specify finer grained functionality was a compromise. So the patch has
>>>> a vm_dirty_ratio and a vm_dirty_ratio_millis interface. The first to
>>>> specify 0-100% and the second to specify .0 to .999%.
>>>>
>>>> So to represent 0.125% of RAM we set
>>>> vm_dirty_ratio = 0
>>>> vm_dirty_ratio_millis = 125
>>>>
>>>> The same for the background_ratio.
>>>>
>>>> I would also prefer using a bytes interface but I am not sure how to
>>>> offer that without either removing the legacy interface of the ratios
>>>> or by offering a concurrent interface that might be confusing such as
>>>> when users are looking at the old one and not aware of a new one.
>>>>
>>> Just provide a vm_dirty_ratio_in_bytes interface and keep it in sync with
>>> vm_dirty_ratio (they are just two representations of the same internal value)
>>> and for higher resolution propose that users use the bytes interface.
>> Hi Balbir,
>>
>> Now that I have read the documentation carefully, the description in
>> Documentation/filesystems/proc.txt seems to be a bit misleading. In
>> proc.txt we say that dirty_ratio and dirty_background_ratio are "a
>> percentage of total system memory", but in mm/page-writeback.c we apply
>> the percentages to the dirtyable memory: free pages + reclaimable pages.
>> So, first of all I think we should clarify this in the documentation...
>>
>> That said, keeping vm_dirty_amount_in_bytes in sync with
>> dirty_ratio_in_percentage is not a trivial task. One is a static value,
>> the other depends on the dirtyable memory in the system. If we want to
>> preserve the same behaviour we should do the following:
>>
>> dirty_ratio = x => dirty_amount_in_bytes = x * dirtyable_memory / 100
>>
>> dirty_amount_in_bytes = y => dirty_ratio = y / dirtyable_memory * 100
>>
>> But any time the dirtyable memory (or the total memory in the system)
>> changes, we would have to update both values to keep them
>> coherent (ouch!).
>>
I see what you mean.
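
To make the coupling concrete, here is a minimal sketch of the two
conversions quoted above. It is illustrative Python, not kernel code:
the function names, the 4096-byte page size and the page counts are
all assumptions picked for the example.

```python
# Illustrative only: names and numbers are hypothetical, not kernel code.
PAGE_SIZE = 4096  # assumed page size in bytes


def ratio_to_bytes(ratio_percent, dirtyable_pages):
    """dirty_ratio = x  =>  dirty_amount_in_bytes = x * dirtyable_memory / 100"""
    return ratio_percent * dirtyable_pages * PAGE_SIZE // 100


def bytes_to_ratio(amount_bytes, dirtyable_pages):
    """dirty_amount_in_bytes = y  =>  dirty_ratio = y / dirtyable_memory * 100"""
    return amount_bytes * 100 / (dirtyable_pages * PAGE_SIZE)


# With 1,000,000 dirtyable pages, a 10% ratio maps to this byte limit.
limit = ratio_to_bytes(10, 1_000_000)

# If dirtyable memory later shrinks to 500,000 pages, the same byte limit
# corresponds to a different ratio, so one of the two knobs would have to
# be rewritten to keep them consistent.
drifted_ratio = bytes_to_ratio(limit, 500_000)
```

The byte value stays fixed while the ratio it corresponds to moves with
the dirtyable memory, which is the resync problem described above.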
>> Possible solutions:
>>
>> 1) introduce fine-grained dirty_ratio handling of decimals with a suitable
>> parser (disadvantage: this would break compatibility with all the
>> userspace apps that expect to read an int from vm_dirty_ratio)
>>
>> 2) introduce dirty_ratio + dirty_ratio_millis (disadvantage: can
>> generate unexpected behaviours when something is written to
>> dirty_ratio ignoring the existence of dirty_ratio_millis)
>>
>> 3) introduce dirty_ratio + dirty_amount_in_bytes as mutually exclusive knobs,
>> where writing to one automatically disables the other (disadvantage:
>> writing to dirty_ratio while ignoring dirty_amount_in_bytes can cause
>> unexpected behaviours)
>>
>> 4) introduce dirty_ratio + dirty_amount_in_bytes and change the
>> old behaviour: when something is written to dirty_ratio,
>> dirty_amount_in_bytes is computed as a function of totalram_pages (or
>> the memcg limit) and then we always use this static value, instead of
>> something that depends on the dirtyable memory; we can also easily update
>> dirty_amount_in_bytes when totalram_pages or the memcg limit
>> changes (disadvantage: changes an old, working behaviour).
>>
>> 5) handle fine-grained dirty_ratio decimals with a suitable parser when
>> writing to dirty_ratio; export the whole-percent part via
>> dirty_ratio, and the decimals via dirty_ratio_decimals; writing to
>> dirty_ratio_decimals is not allowed.
>>
>> I tend to choose 5. The same for dirty_background_ratio.
>>
>
> Hmm... I agree with "5"... like this?
> ==
> provides
> - vm.dirty_ratio (1/100)
> - vm.dirty_ratio_percentmille (1/100,000, pcm)
>
> and allow
> #echo 0.05 > vm/dirty_ratio
> #cat vm/dirty_ratio
> 0
> #cat vm/dirty_ratio_percentmille
> 50
> ==
I guess this would be the easiest way forward. I'll let you select the
granularity of the interface and its meaning.
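
For what it's worth, a rough sketch of how such a parser might split the
written value. This is illustrative Python, not the patch:
parse_dirty_ratio and PCM_PER_PERCENT are made-up names, and it assumes
the percentmille file holds the whole value in units of 1/100,000, so
that 0.05% works out to 50 pcm.

```python
from decimal import Decimal

# Hypothetical constant: 1 pcm = 1/100,000, so 1% = 1000 pcm.
PCM_PER_PERCENT = 1000


def parse_dirty_ratio(text):
    """Split a string such as "0.05" into (whole_percent, total_pcm).

    Decimal avoids binary floating-point rounding on inputs like "0.05".
    Digits finer than 1 pcm are truncated. Non-numeric input raises
    decimal.InvalidOperation; out-of-range input raises ValueError.
    """
    value = Decimal(text.strip())
    if not (Decimal(0) <= value <= Decimal(100)):
        raise ValueError("dirty ratio must be between 0 and 100 percent")
    pcm = int(value * PCM_PER_PERCENT)
    return pcm // PCM_PER_PERCENT, pcm


# What the two files would report after "echo 0.05 > vm/dirty_ratio".
whole, pcm = parse_dirty_ratio("0.05")
```

Old tools reading vm/dirty_ratio would still see an integer, and anything
that needs the finer value would read the pcm file instead.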
--
Balbir