From: Chengming Zhou <chengming.zhou@linux.dev>
To: LKML <linux-kernel@vger.kernel.org>, linux-mm <linux-mm@kvack.org>
Cc: jack@suse.cz, Tejun Heo <tj@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Christoph Hellwig <hch@lst.de>,
	shr@devkernel.io, neilb@suse.de, Michal Hocko <mhocko@suse.com>
Subject: Question: memcg dirty throttle caused by low per-memcg dirty thresh
Date: Wed, 22 Nov 2023 17:38:25 +0800
Message-ID: <109029e0-1772-4102-a2a8-ab9076462454@linux.dev>

Hello all,

Sorry to bother you; we encountered a problem related to the memcg dirty
throttle after migrating from cgroup v1 to v2, so we want to ask for some
comments or suggestions.

1. Problem

We have the "containerd" service running under system.slice, with its
memory.max set to 5GB. It is constantly throttled in
balance_dirty_pages(), since the memcg has more dirty memory than the
memcg dirty thresh allows.

We didn't have this problem on cgroup v1, because cgroup v1 has neither
per-memcg writeback nor a per-memcg dirty thresh; only the global dirty
thresh is checked in balance_dirty_pages().
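
For reference, here is a much simplified sketch of the relevant logic
in balance_dirty_pages() (mm/page-writeback.c); it is paraphrased from
upstream code with most details omitted:

	/*
	 * With cgroup v2 writeback there are two dirty "domains", and
	 * the task is throttled against whichever is closer to its
	 * thresh.
	 */
	struct dirty_throttle_control gdtc = { GDTC_INIT(wb) };		/* global */
	struct dirty_throttle_control mdtc = { MDTC_INIT(wb, &gdtc) };	/* memcg  */

	domain_dirty_limits(&gdtc);		/* thresh from vm.dirty_ratio */
	if (mdtc_valid(&mdtc))
		domain_dirty_limits(&mdtc);	/* memcg thresh, same ratio */

	/* on cgroup v1 mdtc is never valid, so only gdtc matters */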

2. Thinking

So we wonder if we could support a per-memcg dirty thresh interface?
Right now the memcg dirty thresh is just derived from the memcg limit,
roughly memcg max * dirty_ratio, where the ratio can only be set
globally via /proc/sys/vm/dirty_ratio.
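
For example, with memory.max = 5GB and the default dirty_ratio of 20,
the memcg dirty thresh comes out to roughly 5GB * 20% = 1GB (ignoring
the clean-memory cap described below), so "containerd" starts getting
throttled once it accumulates about 1GB of dirty pages.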

As a workaround we currently have to set it to 60 instead of the
default 20, but we worry about the potential side effects of raising
the global ratio.

If we could support a per-memcg dirty thresh interface, we could give
some containers a much higher dirty_ratio, especially for heavy
dirtier workloads like "containerd".
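
Something like the sketch below is what we have in mind. This is
purely hypothetical: no "dirty_ratio" file exists in the v2 memory
controller today, and both handler names are made up:

	/* Hypothetical memory.dirty_ratio knob, sketch only: */
	static struct cftype memory_files[] = {
		/* ... existing entries ... */
		{
			.name = "dirty_ratio",			/* hypothetical */
			.flags = CFTYPE_NOT_ON_ROOT,
			.read_u64 = memory_dirty_ratio_read,	/* hypothetical */
			.write_u64 = memory_dirty_ratio_write,	/* hypothetical */
		},
		{ }	/* terminator */
	};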

3. Solution?

But we couldn't think of a good way to support this. The current
memcg dirty thresh is calculated by a fairly involved rule:

	memcg dirty thresh = memcg avail * dirty_ratio

where memcg avail is derived from a combination of memcg max/high and
memcg file pages, capped by the system-wide clean memory, excluding
the amount already used in the memcg.
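
Paraphrasing how mem_cgroup_wb_stats() and balance_dirty_pages() do
this today (the helpers below are shorthand, not the exact upstream
names):

	/*
	 * Walk up the hierarchy and take the smallest headroom, i.e.
	 * min(max, high) minus current usage, seen at any level; avail
	 * is then file pages plus that headroom, further capped by the
	 * global avail.
	 */
	headroom = ULONG_MAX;
	for (memcg = wb_memcg; memcg; memcg = parent_memcg(memcg)) {
		ceiling = min(memcg_max(memcg), memcg_high(memcg));
		used = memcg_usage(memcg);
		headroom = min(headroom, ceiling - min(ceiling, used));
	}
	mdtc.avail = filepages + min(headroom, gdtc.avail);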

Even if we found a way to calculate a per-memcg dirty thresh, we
couldn't use it directly, since we would still need to distribute that
dirty thresh into per-wb dirty thresh shares.

R - A - B
    \-- C

For example, if we know the dirty thresh of A, but the wb lives in C,
we have no way to distribute A's dirty thresh shares down to the wb
in C.

But we do have to come up with a dirty thresh for the wb in C, since
we need it to control the throttling of that wb in
balance_dirty_pages().
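
For context on why the per-wb share matters: wb_calc_thresh() today
gives each wb a slice of its domain's dirty thresh in proportion to
the wb's recent writeout completions, roughly:

	wb dirty thresh = domain dirty thresh * wb writeout fraction

and that fraction is tracked per domain, so a user-set thresh on A has
no existing machinery to be split across the wbs living under B and C.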

I may have missed something above, but the problem seems clear IMHO.
Looking forward to any comments or suggestions.

Thanks!
