Re: [RFC PATCH v2 3/4] mm/damon: introduce DAMON_HUGEPAGE for hot region hugepage collapsing

DAMON development mailing list
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: gutierrez.asier@huawei-partners.com
Cc: damon@lists.linux.dev
Subject: Re: [RFC PATCH v2 3/4] mm/damon: introduce DAMON_HUGEPAGE for hot region hugepage collapsing
Date: Fri, 22 May 2026 17:12:10 +0000	[thread overview]
Message-ID: <20260522171210.900B11F00A3D@smtp.kernel.org> (raw)
In-Reply-To: <20260522145518.158910-4-gutierrez.asier@huawei-partners.com>

Thank you for your contribution! Sashiko AI review found 8 potential issue(s) to consider:
- [High] Writing to the `enabled` sysfs parameter after `damon_hugepage_init()` fails causes an unconditional NULL pointer dereference.
- [High] A concurrent sysfs write can set `aggr_interval` to 0 after it was checked, causing a divide-by-zero kernel panic.
- [High] The `enabled` sysfs parameter becomes permanently stale if the monitored process terminates, locking out users.
- [High] Missing `is_power_of_2()` check for `min_region_sz` causes silent daemon termination and sysfs lockout.
- [Medium] The `monitored_pid` parameter is interpreted in different PID namespaces depending on the triggering mechanism.
- [Low] The `monitored_pid` variable is defined as a `long` but exported as an `unsigned long` via `module_param`.
- [Low] The default value of `min_age` does not match the module's documentation comment.
- [Low] Memory allocated for the global DAMON context and target structures is leaked if module initialization fails.
--

commit 9a01dc3d39519d9465c7f43f62e4bda67ccf751a
Author: Asier Gutierrez <gutierrez.asier@huawei-partners.com>

mm/damon: introduce DAMON_HUGEPAGE for hot region hugepage collapsing

This patch introduces a new DAMON module (DAMON_HUGEPAGE) which
collapses hot regions into huge pages.

> diff --git a/mm/damon/hugepage.c b/mm/damon/hugepage.c
> new file mode 100644
> index 0000000000000..f7ca3a409f59c
> --- /dev/null
> +++ b/mm/damon/hugepage.c

[ ... ]

> +static long monitored_pid;
> +module_param(monitored_pid, ulong, 0600);

[Severity: Low]
Is this intentional to declare monitored_pid as a long but export it via
module_param as an unsigned long? Both the declaration and the sysfs
parameter type should be consistent.

> +
> +/*
> + * Time threshold for hot memory regions identification in microseconds.
> + *
> + * If a memory region has been accessed for this or longer time,
> + * DAMON_HUGEPAGE identifies the region as hot, and collapse it into a huge
> + * page.  100 microseconds by default.
> + */
> +static unsigned long min_age __read_mostly = 100000;

[Severity: Low]
Does the comment match the default value here? The comment explicitly
states 100 microseconds by default, but the variable is initialized to
100000, which translates to 100 milliseconds since DAMON time parameters
are handled in microseconds.

[ ... ]

> +static struct damos *damon_hugepage_new_scheme(void)
> +{
> +	struct damos_access_pattern pattern = {
> +		/* Find regions having PMD_SIZE or larger size */
> +		.min_sz_region = PMD_SIZE,
> +		.max_sz_region = ULONG_MAX,
> +		.min_nr_accesses = 0,
> +		.max_nr_accesses = UINT_MAX,
> +		.min_age_region = min_age /
> +			damon_hugepage_mon_attrs.aggr_interval,

[Severity: High]
Could a concurrent sysfs write set aggr_interval to 0 here?

While aggr_interval is checked for zero in damon_hugepage_apply_parameters(),
it is exposed via module_param_named and can be modified concurrently by root.

If a write occurs between the check and this division, could it result in
an unhandled division by zero kernel panic?

> +		.max_age_region = UINT_MAX,
> +	};

[ ... ]

> +static int damon_hugepage_apply_parameters(void)
> +{
> +	struct damon_ctx *param_ctx;
> +	struct damon_target *param_target;
> +	struct damos *scheme;
> +	struct damos_quota_goal *goal;
> +	struct pid *spid;
> +	int err;
> +
> +	err = damon_modules_new_vaddr_ctx_target(&param_ctx, &param_target);
> +	if (err)
> +		return err;
> +
> +	param_ctx->addr_unit = addr_unit;
> +	param_ctx->min_region_sz = max(DAMON_MIN_REGION_SZ / addr_unit, 1);

[Severity: High]
Should we check if min_region_sz is a power of two here?

If addr_unit is set via sysfs such that this calculation yields a
non-power-of-two value (e.g., 3), damon_commit_ctx() will return -EINVAL.

Before returning, damon_commit_ctx() sets dst->maybe_corrupted = true,
which flags the running ctx as corrupted and causes the kdamond thread
to silently terminate. This leads to the enabled sysfs parameter getting
stuck permanently.

> +
> +	spid = find_get_pid(monitored_pid);

[Severity: Medium]
Will this resolve the PID in different namespaces depending on how the
update is triggered?

When a user writes to the 'enabled' parameter, find_get_pid() executes in
the user's process context using the user's PID namespace.

When a user writes to 'commit_inputs', the application is handled
asynchronously by damon_hugepage_damon_call_fn() inside the kdamond kernel
thread, which operates in the root (init) PID namespace.

This means the same monitored_pid value could be interpreted as different
processes depending on which sysfs parameter triggered the update.

> +	if (!spid) {
> +		err = -EINVAL;
> +		goto out;
> +	}

[ ... ]

> +static int damon_hugepage_enabled_store(const char *val,
> +				const struct kernel_param *kp)
> +{
> +	bool is_enabled = enabled;
> +	bool enable;
> +	int err;
> +
> +	err = kstrtobool(val, &enable);
> +	if (err)
> +		return err;
> +
> +	if (is_enabled == enable)
> +		return 0;
> +
> +	/* Called before init function.  The function will handle this. */
> +	if (!damon_initialized())
> +		goto set_param_out;
> +
> +	err = damon_hugepage_turn(enable);

[Severity: High]
Can this unconditionally dereference a NULL pointer if module initialization
fails?

If damon_hugepage_init() fails (e.g., memory allocation fails in
damon_modules_new_vaddr_ctx_target()), the global ctx pointer remains NULL.
Because DAMON_HUGEPAGE is a built-in module, its sysfs parameters are
still accessible.

A subsequent write of 'Y' to 'enabled' invokes damon_hugepage_turn(true),
which calls damon_hugepage_apply_parameters(), leading to:

damon_commit_ctx(ctx, param_ctx)
    dst->maybe_corrupted = true;

Since dst is the global ctx pointer which is NULL, this will cause an
immediate kernel panic.

[Severity: High]
Could the 'enabled' sysfs parameter become permanently stale here?

If the monitored target process exits, the kdamond thread detects the invalid
target, stops, and clears ctx->kdamond, but the global 'enabled' parameter
remains true.

If a user attempts to restart monitoring by writing 'N' to 'enabled',
damon_hugepage_turn(false) calls damon_stop(), which eventually calls
__damon_stop(). Since ctx->kdamond is NULL, __damon_stop() returns -EPERM.

The store function receives this error and aborts without setting
enabled = false. Subsequent writes of 'Y' return immediately because
is_enabled == enable is true. This permanently locks the sysfs interface.

> +	if (err)
> +		return err;
> +
> +set_param_out:
> +	enabled = enable;
> +	return err;
> +}

[ ... ]

> +static int __init damon_hugepage_init(void)
> +{
> +	int err;
> +
> +	if (!damon_initialized()) {
> +		err = -ENOMEM;
> +		goto out;
> +	}
> +	err = damon_modules_new_vaddr_ctx_target(&ctx, &target);
> +	if (err)
> +		goto out;
> +
> +	call_control.data = ctx;
> +
> +	/* 'enabled' has set before this function, probably via command line */
> +	if (enabled)
> +		err = damon_hugepage_turn(true);
> +
> +out:
> +	if (err && enabled)
> +		enabled = false;
> +	return err;

[Severity: Low]
Does this leak the allocated ctx and target structures if initialization fails?

If the module is enabled by default via boot parameters, and
damon_hugepage_turn(true) fails, the function returns an error without
freeing the memory dynamically allocated by
damon_modules_new_vaddr_ctx_target() for ctx and target.

> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260522145518.158910-1-gutierrez.asier@huawei-partners.com?part=3

next prev parent reply	other threads:[~2026-05-22 17:12 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-22 14:55 [RFC PATCH v2 0/4] mm/damon: Introduce a huge page collapsing mechanism using auto tuning gutierrez.asier
2026-05-22 14:55 ` [RFC PATCH v2 1/4] mm/damon: Generalize ctx_target creation for damon_ops_id and add vaddr support gutierrez.asier
2026-05-23 17:20   ` SeongJae Park
2026-05-22 14:55 ` [RFC PATCH v2 2/4] mm/damon: Introduce DAMOS_QUOTA_HUGEPAGE auto tuning gutierrez.asier
2026-05-22 15:45   ` sashiko-bot
2026-05-23 17:27   ` SeongJae Park
2026-05-25 15:04     ` Gutierrez Asier
2026-05-25 16:51       ` SeongJae Park
2026-05-22 14:55 ` [RFC PATCH v2 3/4] mm/damon: introduce DAMON_HUGEPAGE for hot region hugepage collapsing gutierrez.asier
2026-05-22 17:12   ` sashiko-bot [this message]
2026-05-23 17:29   ` SeongJae Park
2026-05-22 14:55 ` [RFC PATCH v2 4/4] Documentation/admin-guide/mm/damon: add DAMON-based Hugepage Management documentation gutierrez.asier
2026-05-22 17:16   ` sashiko-bot
2026-05-23 17:17 ` [RFC PATCH v2 0/4] mm/damon: Introduce a huge page collapsing mechanism using auto tuning SeongJae Park
2026-05-25 13:53   ` Gutierrez Asier
2026-05-25 16:48     ` SeongJae Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260522171210.900B11F00A3D@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=damon@lists.linux.dev \
    --cc=gutierrez.asier@huawei-partners.com \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox