public inbox for cgroups@vger.kernel.org
From: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
To: Aneesh Kumar K V <aneesh.kumar-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org>
Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Zefan Li <lizefan.x-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: cgroup v1 and balance_dirty_pages
Date: Thu, 17 Nov 2022 12:50:28 -0500
Message-ID: <Y3Z0ZIroRFd1B6ad@cmpxchg.org>
In-Reply-To: <db372090-cd6d-32e9-2ed1-0d5f9dc9c1df-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org>

On Thu, Nov 17, 2022 at 10:46:53PM +0530, Aneesh Kumar K V wrote:
> On 11/17/22 10:01 PM, Johannes Weiner wrote:
> > On Thu, Nov 17, 2022 at 09:21:10PM +0530, Aneesh Kumar K V wrote:
> >> On 11/17/22 9:12 PM, Aneesh Kumar K V wrote:
> >>> On 11/17/22 8:42 PM, Johannes Weiner wrote:
> >>>> Hi Aneesh,
> >>>>
> >>>> On Thu, Nov 17, 2022 at 12:24:13PM +0530, Aneesh Kumar K.V wrote:
> >>>>> Currently, we don't pause in balance_dirty_pages with cgroup v1 when a
> >>>>> task dirties too many pages relative to the memory limit of the memcg.
> >>>>> This is because with cgroup v1 all the limits are checked against global
> >>>>> available resources. So on a system with a large amount of memory, a
> >>>>> cgroup with a smaller limit can easily hit OOM if a task within the
> >>>>> cgroup continuously dirties pages.
> >>>>
> >>>> Page reclaim has special writeback throttling for cgroup1, see the
> >>>> folio_wait_writeback() in shrink_folio_list(). It's not as smooth as
> >>>> proper dirty throttling, but it should prevent OOMs.
> >>>>
> >>>> Is this not working anymore?
> >>>
> >>> The test is a simple dd test on a 256GB system.
> >>>
> >>> root@lp2:/sys/fs/cgroup/memory# mkdir test
> >>> root@lp2:/sys/fs/cgroup/memory# cd test/
> >>> root@lp2:/sys/fs/cgroup/memory/test# echo 120M > memory.limit_in_bytes 
> >>> root@lp2:/sys/fs/cgroup/memory/test# echo $$ > tasks 
> >>> root@lp2:/sys/fs/cgroup/memory/test# dd if=/dev/zero of=/home/kvaneesh/test bs=1M 
> >>> Killed
> >>>
> >>>
> >>> Will it hit folio_wait_writeback() at all, given that this is sequential I/O
> >>> and none of the folios we are writing will be in writeback?
> >>
> >> Another way to look at this: if writeback is never started via balance_dirty_pages,
> >> will we find any folios in shrink_folio_list that are in writeback?
> > 
> > The flushers are started from reclaim if necessary. See this code from
> > shrink_inactive_list():
> > 
> > 	/*
> > 	 * If dirty folios are scanned that are not queued for IO, it
> > 	 * implies that flushers are not doing their job. This can
> > 	 * happen when memory pressure pushes dirty folios to the end of
> > 	 * the LRU before the dirty limits are breached and the dirty
> > 	 * data has expired. It can also happen when the proportion of
> > 	 * dirty folios grows not through writes but through memory
> > 	 * pressure reclaiming all the clean cache. And in some cases,
> > 	 * the flushers simply cannot keep up with the allocation
> > 	 * rate. Nudge the flusher threads in case they are asleep.
> > 	 */
> > 	if (stat.nr_unqueued_dirty == nr_taken)
> > 		wakeup_flusher_threads(WB_REASON_VMSCAN);
> > 
> > It sounds like there isn't enough time for writeback to commence
> > before the memcg already declares OOM.
> > 
> > If you place a reclaim_throttle(VMSCAN_THROTTLE_WRITEBACK) after that
> > wakeup, does that fix the issue?
> 
> Yes, that helped. One thing I noticed is that with that reclaim_throttle(), we
> don't end up calling folio_wait_writeback() at all. Still, the dd was
> able to continue until the file system got full.
> 
> Without that reclaim_throttle(), we do end up calling folio_wait_writeback()
> but at some point hit OOM.

Interesting. This is probably due to the discrepancy between total
memory and the cgroup size. The flusher might put the occasional
cgroup page under writeback, but cgroup reclaim will still see mostly
dirty pages and not slow down enough.
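
To put rough numbers on that discrepancy, here is a minimal sketch in C. It is not from the thread: it assumes the usual sysctl defaults of vm.dirty_background_ratio=10 and vm.dirty_ratio=20, and it approximates dirtyable memory by the full 256GB of the test machine (the kernel actually uses free plus page-cache memory, so the real figures are somewhat lower). The helper name dirty_threshold_mib() is hypothetical.

```c
#include <stdint.h>

/* Assumed sysctl defaults; the thread does not state the actual values. */
#define ASSUMED_DIRTY_BACKGROUND_RATIO 10u /* vm.dirty_background_ratio */
#define ASSUMED_DIRTY_RATIO            20u /* vm.dirty_ratio */

/* The cgroup limit from the dd test above, in MiB. */
#define CGROUP_LIMIT_MIB 120u

/*
 * Hypothetical helper: global dirty threshold in MiB for a given amount
 * of dirtyable memory. Integer arithmetic, rounds down.
 */
static uint64_t dirty_threshold_mib(uint64_t dirtyable_mib,
                                    unsigned int ratio_percent)
{
    return dirtyable_mib * ratio_percent / 100;
}
```

Under these assumptions, dirty_threshold_mib(256 * 1024, ASSUMED_DIRTY_BACKGROUND_RATIO) comes to roughly 25GB, so a cgroup capped at 120M fills up with dirty pages long before the global thresholds would wake the flushers or pause the writer.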

Would you mind sending a patch for adding that reclaim_throttle()?
Gated on !writeback_throttling_sane(), with a short comment explaining
that the flushers may not issue writeback quickly enough for cgroup1
writeback throttling to work on larger systems with small cgroups.
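
The control flow being asked for can be sketched in user space as follows. This is only an illustration, not the patch: the struct and the stub_* and sketch_* names are hypothetical stand-ins for wakeup_flusher_threads(), reclaim_throttle() and writeback_throttling_sane(), and placing the throttle inside the nr_unqueued_dirty == nr_taken check is an assumption based on "after that wakeup" above.

```c
#include <stdbool.h>

/* User-space stand-ins only; none of this is kernel code. */
struct sketch_state {
    bool throttling_sane;    /* stand-in for writeback_throttling_sane() */
    int flusher_wakeups;     /* how often the flushers were nudged */
    int writeback_throttles; /* how often reclaim was throttled */
};

/* Stand-in for wakeup_flusher_threads(WB_REASON_VMSCAN). */
static void stub_wakeup_flusher_threads(struct sketch_state *s)
{
    s->flusher_wakeups++;
}

/* Stand-in for reclaim_throttle(..., VMSCAN_THROTTLE_WRITEBACK). */
static void stub_reclaim_throttle_writeback(struct sketch_state *s)
{
    s->writeback_throttles++;
}

/* Mirrors the tail of the shrink_inactive_list() snippet quoted above. */
static void sketch_after_scan(struct sketch_state *s,
                              unsigned long nr_unqueued_dirty,
                              unsigned long nr_taken)
{
    if (nr_unqueued_dirty == nr_taken) {
        stub_wakeup_flusher_threads(s);
        /*
         * cgroup1 has no per-memcg dirty throttling, so give the
         * flushers time to issue writeback before reclaim moves on
         * and the memcg declares OOM.
         */
        if (!s->throttling_sane)
            stub_reclaim_throttle_writeback(s);
    }
}
```

With throttling_sane set to false (the cgroup1 case) and every scanned folio dirty but unqueued, the sketch both wakes the flushers and throttles; with it set to true, it only wakes the flushers.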

Thread overview: 8+ messages
2022-11-17  6:54 cgroup v1 and balance_dirty_pages Aneesh Kumar K.V
     [not found] ` <87wn7uf4ve.fsf-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org>
2022-11-17 15:12   ` Johannes Weiner
     [not found]     ` <Y3ZPZyaX1WN3tad4-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2022-11-17 15:42       ` Aneesh Kumar K V
     [not found]         ` <697e50fd-1954-4642-9f61-1afad0ebf8c6-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org>
2022-11-17 15:51           ` Aneesh Kumar K V
     [not found]             ` <9fb5941b-2c74-87af-a476-ce94b43bb542-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org>
2022-11-17 16:31               ` Johannes Weiner
     [not found]                 ` <Y3ZhyfROmGKn/jfr-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2022-11-17 17:16                   ` Aneesh Kumar K V
     [not found]                     ` <db372090-cd6d-32e9-2ed1-0d5f9dc9c1df-tEXmvtCZX7AybS5Ee8rs3A@public.gmane.org>
2022-11-17 17:50                       ` Johannes Weiner [this message]
     [not found]                         ` <Y3Z0ZIroRFd1B6ad-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2022-11-18  3:56                           ` Aneesh Kumar K V
