Re: [PATCH] cgroup/dmem: Use size_t for try_charge() and uncharge() size

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: hongfu.li@linux.dev
Cc: hannes@cmpxchg.org, mkoutny@suse.com,
	dri-devel@lists.freedesktop.org, tj@kernel.org
Subject: Re: [PATCH] cgroup/dmem: Use size_t for try_charge() and uncharge() size
Date: Fri, 26 Jun 2026 08:27:19 +0000	[thread overview]
Message-ID: <20260626082719.BF3061F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260626081231.47464-1-hongfu.li@linux.dev>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

Pre-existing issues:
- [High] dmem cgroup controller incorrectly tracks bytes using page_counter, causing 2GB limit overflow on 32-bit architectures
- [High] Concurrent limit updates can corrupt page counter limits due to lack of serialization
- [High] Missing release barrier when publishing pool initialization allows bypassing hierarchical limits
--

commit 265d8a6cb2da9910a15e7066ffcacbb4c27c4ff0
Author: Hongfu Li <hongfu.li@linux.dev>
Subject: cgroup/dmem: Use size_t for try_charge() and uncharge() size

This commit changes the charge and uncharge APIs to accept a size_t value
for the allocation size rather than a u64. This prevents potential truncation
on 32-bit architectures when the size is passed to the internal page_counter
functions.

> diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
> index 4753a67d0f0f2..38108536d2d60 100644
> --- a/kernel/cgroup/dmem.c
> +++ b/kernel/cgroup/dmem.c
> @@ -649,7 +649,7 @@ EXPORT_SYMBOL_GPL(dmem_cgroup_uncharge);
>   *
>   * Return: 0 on success, -EAGAIN on hitting a limit, or a negative errno on failure.
>   */
> -int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
> +int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, size_t size,
>  			  struct dmem_cgroup_pool_state **ret_pool,
>  			  struct dmem_cgroup_pool_state **ret_limit_pool)
>  {

[ ... ]

>  	if (!page_counter_try_charge(&pool->cnt, size, &fail)) {

[Severity: High]
This is a pre-existing issue, but does using page_counter to track device
memory in bytes cause cumulative overflow on 32-bit architectures?

The page_counter API stores counts using unsigned long and internally
accumulates using atomic_long_add_return(). On 32-bit architectures, this
limits the tracking capability to LONG_MAX (2GB).

When cumulative device memory exceeds 2GB, the usage value wraps to negative,
and the signed bounds check in page_counter_try_charge() consistently fails,
causing valid allocations to be permanently rejected.

Similarly, in set_resource_max() and reset_all_resource_limits(), passing a
64-bit byte limit directly to page_counter_set_max() silently truncates
values above 4GB, and setting the default limit to PAGE_COUNTER_MAX restricts
usage to 2GB on 32-bit systems.

Could we track memory in pages instead of bytes to avoid this overflow?

[Severity: High]
This is a pre-existing issue, but can concurrent limit updates corrupt the
page counter limits?

page_counter_set_max() explicitly requires callers to serialize invocations.
Without serialization, concurrent writes can cause the optimistic
compare-and-swap retry loop to interleave, silently corrupting the maximum
limits and overwriting new limits with stale values.

This happens in dmemcg_limit_write() when applying limits:

kernel/cgroup/dmem.c:dmemcg_limit_write() {
    ...
		/* And commit */
		apply(pool, new_limit);
    ...
}

It also happens during cgroup offline when resetting limits under just an
RCU read lock:

kernel/cgroup/dmem.c:dmemcs_offline() {
    ...
	list_for_each_entry_rcu(pool, &dmemcs->pools, css_node)
		reset_all_resource_limits(pool);
    ...
}

Does this code need a lock to serialize these limit updates?

[Severity: High]
This is a pre-existing issue, but can the lack of memory barriers during pool
initialization allow hierarchical limits to be bypassed?

In alloc_pool_single(), a new pool is linked into the RCU list before it is
fully initialized. Later in get_cg_pool_locked(), pool->inited is set without
a release barrier:

kernel/cgroup/dmem.c:get_cg_pool_locked() {
    ...
		pool->cnt.parent = &ppool->cnt;
		if (ppool && !pool->parent) {
			pool->parent = ppool;
			dmemcg_pool_get(ppool);
		}
		pool->inited = true;
    ...
}

A concurrent reader in get_cg_pool_unlocked() uses READ_ONCE(pool->inited)
without an acquire barrier:

kernel/cgroup/dmem.c:get_cg_pool_unlocked() {
    ...
	pool = find_cg_pool_locked(cg, region);
	if (pool && !READ_ONCE(pool->inited))
		pool = NULL;
    ...
}

Due to memory reordering, the reader can observe pool->inited == true but
still see a NULL parent pointer. Would adding smp_store_release() and
smp_load_acquire() prevent this race?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260626081231.47464-1-hongfu.li@linux.dev?part=1

     prev parent reply	other threads:[~2026-06-26  8:27 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-26  8:12 [PATCH] cgroup/dmem: Use size_t for try_charge() and uncharge() size hongfu.li
2026-06-26  8:27 ` sashiko-bot [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260626082719.BF3061F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hannes@cmpxchg.org \
    --cc=hongfu.li@linux.dev \
    --cc=mkoutny@suse.com \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.