All of lore.kernel.org
 help / color / mirror / Atom feed
From: Roman Gushchin <roman.gushchin@linux.dev>
To: David Finkel <davidf@vimeo.com>
Cc: "Muchun Song" <muchun.song@linux.dev>,
	"Tejun Heo" <tj@kernel.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	core-services@vimeo.com, "Jonathan Corbet" <corbet@lwn.net>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Shakeel Butt" <shakeel.butt@linux.dev>,
	"Shuah Khan" <shuah@kernel.org>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Zefan Li" <lizefan.x@bytedance.com>,
	cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
	"Michal Koutný" <mkoutny@suse.com>,
	"Waiman Long" <longman@redhat.com>
Subject: Re: [PATCH v7 1/2] mm, memcg: cg2 memory{.swap,}.peak write handlers
Date: Tue, 6 Aug 2024 23:12:53 +0000	[thread overview]
Message-ID: <ZrKt9Xd_hQ3kM8jY@google.com> (raw)
In-Reply-To: <20240730231304.761942-2-davidf@vimeo.com>

On Tue, Jul 30, 2024 at 07:13:03PM -0400, David Finkel wrote:
> Other mechanisms for querying the peak memory usage of either a process
> or v1 memory cgroup allow for resetting the high watermark. Restore
> parity with those mechanisms, but with a less racy API.
> 
> For example:
>  - Any write to memory.max_usage_in_bytes in a cgroup v1 mount resets
>    the high watermark.
>  - writing "5" to the clear_refs pseudo-file in a processes's proc
>    directory resets the peak RSS.
> 
> This change is an evolution of a previous patch, which mostly copied the
> cgroup v1 behavior, however, there were concerns about races/ownership
> issues with a global reset, so instead this change makes the reset
> filedescriptor-local.
> 
> Writing any non-empty string to the memory.peak and memory.swap.peak
> pseudo-files reset the high watermark to the current usage for
> subsequent reads through that same FD.
> 
> Notably, following Johannes's suggestion, this implementation moves the
> O(FDs that have written) behavior onto the FD write(2) path. Instead, on
> the page-allocation path, we simply add one additional watermark to
> conditionally bump per-hierarchy level in the page-counter.
> 
> Additionally, this takes Longman's suggestion of nesting the
> page-charging-path checks for the two watermarks to reduce the number of
> common-case comparisons.
> 
> This behavior is particularly useful for work scheduling systems that
> need to track memory usage of worker processes/cgroups per-work-item.
> Since memory can't be squeezed like CPU can (the OOM-killer has
> opinions), these systems need to track the peak memory usage to compute
> system/container fullness when binpacking workitems.
> 
> Most notably, Vimeo's use-case involves a system that's doing global
> binpacking across many Kubernetes pods/containers, and while we can use
> PSI for some local decisions about overload, we strive to avoid packing
> workloads too tightly in the first place. To facilitate this, we track
> the peak memory usage. However, since we run with long-lived workers (to
> amortize startup costs) we need a way to track the high watermark while
> a work-item is executing. Polling runs the risk of missing short spikes
> that last for timescales below the polling interval, and peak memory
> tracking at the cgroup level is otherwise perfect for this use-case.
> 
> As this data is used to ensure that binpacked work ends up with
> sufficient headroom, this use-case mostly avoids the inaccuracies
> surrounding reclaimable memory.
> 
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Suggested-by: Waiman Long <longman@redhat.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Reviewed-by: Michal Koutný <mkoutny@suse.com>
> Signed-off-by: David Finkel <davidf@vimeo.com>

Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>

Thanks!

  reply	other threads:[~2024-08-06 23:13 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-30 23:13 [PATCH v7] mm, memcg: cg2 memory{.swap,}.peak write handlers David Finkel
2024-07-30 23:13 ` [PATCH v7 1/2] " David Finkel
2024-08-06 23:12   ` Roman Gushchin [this message]
2024-07-30 23:13 ` [PATCH v7 2/2] mm, memcg: cg2 memory{.swap,}.peak write tests David Finkel
2024-08-06 23:13   ` Roman Gushchin
2024-08-06 20:16 ` [PATCH v7] mm, memcg: cg2 memory{.swap,}.peak write handlers David Finkel
2024-08-06 20:18   ` Tejun Heo
2024-08-06 20:21     ` David Finkel
2024-08-06 22:36     ` Andrew Morton
2024-08-06 23:00       ` David Finkel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZrKt9Xd_hQ3kM8jY@google.com \
    --to=roman.gushchin@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=corbet@lwn.net \
    --cc=core-services@vimeo.com \
    --cc=davidf@vimeo.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan.x@bytedance.com \
    --cc=longman@redhat.com \
    --cc=mhocko@kernel.org \
    --cc=mkoutny@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=shakeel.butt@linux.dev \
    --cc=shuah@kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.