From: Aneesh Kumar K V <aneesh.kumar-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org>
To: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
Zefan Li <lizefan.x-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: cgroup v1 and balance_dirty_pages
Date: Thu, 17 Nov 2022 21:21:10 +0530 [thread overview]
Message-ID: <9fb5941b-2c74-87af-a476-ce94b43bb542@linux.ibm.com> (raw)
In-Reply-To: <697e50fd-1954-4642-9f61-1afad0ebf8c6-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org>
On 11/17/22 9:12 PM, Aneesh Kumar K V wrote:
> On 11/17/22 8:42 PM, Johannes Weiner wrote:
>> Hi Aneesh,
>>
>> On Thu, Nov 17, 2022 at 12:24:13PM +0530, Aneesh Kumar K.V wrote:
>>> Currently, we don't pause in balance_dirty_pages with cgroup v1 when we
>>> have a task dirtying too many pages w.r.t. the memory limit in the memcg.
>>> This is because with cgroup v1 all the limits are checked against global
>>> available resources. So on a system with a large amount of memory, a
>>> cgroup with a smaller limit can easily hit OOM if the task within the
>>> cgroup continuously dirty pages.
>>
>> Page reclaim has special writeback throttling for cgroup1, see the
>> folio_wait_writeback() in shrink_folio_list(). It's not as smooth as
>> proper dirty throttling, but it should prevent OOMs.
>>
>> Is this not working anymore?
>
> The test is a simple dd test on a 256GB system.
>
> root@lp2:/sys/fs/cgroup/memory# mkdir test
> root@lp2:/sys/fs/cgroup/memory# cd test/
> root@lp2:/sys/fs/cgroup/memory/test# echo 120M > memory.limit_in_bytes
> root@lp2:/sys/fs/cgroup/memory/test# echo $$ > tasks
> root@lp2:/sys/fs/cgroup/memory/test# dd if=/dev/zero of=/home/kvaneesh/test bs=1M
> Killed
>
>
> Will it hit folio_wait_writeback()? Since this is sequential I/O, none of the
> folios we are writing will be under writeback.
Another way to look at this: if writeback is never started via balance_dirty_pages,
will shrink_folio_list ever find folios that are under writeback?
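One rough way to check this hypothesis (an illustrative sketch, not part of the original test) is to sample the global dirty/writeback counters from /proc/vmstat while the dd runs:

```shell
# Sample the global dirty/writeback page counters while the dd runs.
# If nr_writeback stays near zero while nr_dirty keeps climbing, then
# writeback is never being kicked off, and folio_wait_writeback() in
# shrink_folio_list() never gets a folio under writeback to wait on.
grep -E '^(nr_dirty|nr_writeback) ' /proc/vmstat
```

Run it in a second shell a few times during the dd to see how the two counters move relative to each other.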
>
>>
>>> Shouldn't we throttle the task based on the memcg limits in this case?
>>> commit 9badce000e2c ("cgroup, writeback: don't enable cgroup writeback
>>> on traditional hierarchies") indicates we run into issues with enabling
>>> cgroup writeback with v1. But we could still keep the global writeback
>>> domain and check the throttling needs against memcg limits in
>>> balance_dirty_pages()?
>>
>> Deciding when to throttle is only one side of the coin, though.
>>
>> The other side is selective flushing in the IO context of whoever
>> generated the dirty data, and matching the rate of dirtying to the
>> rate of writeback. This isn't really possible in cgroup1, as the
>> domains for memory and IO control could be disjoint.
>>
>> For example, if a fast-IO cgroup shares memory with a slow-IO cgroup,
>> what's the IO context for flushing the shared dirty data? What's the
>> throttling rate you apply to dirtiers?
>
> I am not using the I/O controller at all. Only the cpu and memory controllers
> are used, and what I am observing is that, depending on the system memory size,
> a container with the same memory limits will hit OOM on some machines and not on others.
>
> One of the challenges with the above test is that we are not able to reclaim
> via shrink_folio_list(), because these are dirty file LRU pages and we take
> the code path below:
>
> 	if (folio_is_file_lru(folio) &&
> 	    (!current_is_kswapd() ||
> 	     !folio_test_reclaim(folio) ||
> 	     !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
> 		......
> 		goto activate_locked;
> 	}
>
>
>
> -aneesh
Thread overview: 8+ messages
2022-11-17  6:54 cgroup v1 and balance_dirty_pages Aneesh Kumar K.V
2022-11-17 15:12 ` Johannes Weiner
2022-11-17 15:42   ` Aneesh Kumar K V
2022-11-17 15:51     ` Aneesh Kumar K V [this message]
2022-11-17 16:31       ` Johannes Weiner
2022-11-17 17:16         ` Aneesh Kumar K V
2022-11-17 17:50           ` Johannes Weiner
2022-11-18  3:56             ` Aneesh Kumar K V