* [PATCH v2] selftests/damon: add regression test for damos_walk() vs kdamond exit race
@ 2026-05-24 10:02 Sailesh Nandanavanam
2026-05-24 18:37 ` SeongJae Park
2026-06-12 6:23 ` [PATCH v3] mm/damon: add KUnit test for walk_control_obsolete behavior Sailesh Nandanavanam
0 siblings, 2 replies; 10+ messages in thread
From: Sailesh Nandanavanam @ 2026-05-24 10:02 UTC (permalink / raw)
To: sj
Cc: shuah, damon, linux-mm, linux-kselftest, linux-kernel,
Sailesh Nandanavanam
Add a regression test that verifies damos_walk() does not deadlock
when racing with kdamond_fn() exit.
When kdamond_fn() finishes its main loop, it cancels remaining
damos_walk() requests and unsets damon_ctx->kdamond. Without the fix
in commit 33c3f6c2b48c, damos_walk() could be called right after
cancellation but before kdamond pointer unset, causing it to wait
forever for handling that never comes.
The test starts kdamond monitoring a short-lived process, waits for
the process to exit naturally triggering kdamond termination, then
rapidly calls update_schemes_tried_regions in a separate thread to
hit the race window. Using a thread with join timeout ensures the
test can detect kernel-level deadlocks where the system call blocks
in uninterruptible state.
The sysfs state path is resolved dynamically via the kdamonds object
instead of being hardcoded, and exceptions are handled specifically
as OSError rather than using a bare except block.
Fixes: 33c3f6c2b48c ("mm/damon/core: fix damos_walk() vs kdamond_fn() exit race")
Signed-off-by: Sailesh Nandanavanam <saileshnandanavanam@gmail.com>
---
tools/testing/selftests/damon/Makefile | 1 +
.../sysfs_damos_walk_kdamond_exit_race.py | 82 +++++++++++++++++++
2 files changed, 83 insertions(+)
create mode 100755 tools/testing/selftests/damon/sysfs_damos_walk_kdamond_exit_race.py
diff --git a/tools/testing/selftests/damon/Makefile b/tools/testing/selftests/damon/Makefile
index 2180c328a825..60c83d6c318e 100644
--- a/tools/testing/selftests/damon/Makefile
+++ b/tools/testing/selftests/damon/Makefile
@@ -20,6 +20,7 @@ TEST_PROGS += sysfs_update_removed_scheme_dir.sh
TEST_PROGS += sysfs_update_schemes_tried_regions_hang.py
TEST_PROGS += sysfs_memcg_path_leak.sh
TEST_PROGS += sysfs_no_op_commit_break.py
+TEST_PROGS += sysfs_damos_walk_kdamond_exit_race.py
EXTRA_CLEAN = __pycache__
diff --git a/tools/testing/selftests/damon/sysfs_damos_walk_kdamond_exit_race.py b/tools/testing/selftests/damon/sysfs_damos_walk_kdamond_exit_race.py
new file mode 100755
index 000000000000..8e8006d63926
--- /dev/null
+++ b/tools/testing/selftests/damon/sysfs_damos_walk_kdamond_exit_race.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+#
+# Regression test for damos_walk() vs kdamond_fn() exit race.
+#
+# When kdamond_fn() finishes its main loop, it cancels remaining damos_walk()
+# requests and unsets damon_ctx->kdamond. If damos_walk() is called right
+# after cancellation but before kdamond pointer unset, it could wait forever
+# for handling that never comes, causing a deadlock.
+#
+# This test verifies the fix by rapidly calling update_schemes_tried_regions
+# while kdamond is naturally terminating (monitored process exits).
+# Without the fix (commit 33c3f6c2b48c), this would hang indefinitely.
+
+import os
+import subprocess
+import threading
+import time
+import _damon_sysfs
+
+def call_update(kdamond, result):
+ err = kdamond.update_schemes_tried_regions()
+ result['err'] = err
+ result['done'] = True
+
+def main():
+ proc = subprocess.Popen(['sleep', '0.3'])
+
+ kdamonds = _damon_sysfs.Kdamonds([_damon_sysfs.Kdamond(
+ contexts=[_damon_sysfs.DamonCtx(
+ ops='vaddr',
+ targets=[_damon_sysfs.DamonTarget(pid=proc.pid)],
+ schemes=[_damon_sysfs.Damos(
+ action='stat',
+ access_pattern=_damon_sysfs.DamosAccessPattern(
+ nr_accesses=[0, 200]))]
+ )]
+ )])
+
+ err = kdamonds.start()
+ if err is not None:
+ print('kdamond start failed: %s' % err)
+ exit(1)
+
+ # Wait for monitored process to die naturally
+ proc.wait()
+
+ # Rapidly call damos_walk() while kdamond is exiting
+ # Use a thread with real timeout to detect kernel-level deadlock
+ deadline = time.time() + 5
+ while time.time() < deadline:
+ result = {'done': False, 'err': None}
+ t = threading.Thread(target=call_update,
+ args=(kdamonds.kdamonds[0], result))
+ t.daemon = True
+ t.start()
+ t.join(timeout=5)
+
+ if not result['done']:
+ print('FAIL: update_schemes_tried_regions hung - '
+ 'possible damos_walk/kdamond exit race deadlock')
+ exit(1)
+
+ if result['err'] is not None:
+ # kdamond stopped cleanly - expected
+ break
+
+ # Check kdamond state via sysfs using dynamic path
+ state_path = os.path.join(
+ kdamonds.kdamonds[0].sysfs_dir(), 'state')
+ try:
+ with open(state_path) as f:
+ if f.read().strip() == 'off':
+ break
+ except OSError as e:
+ print('failed to read kdamond state: %s' % e)
+ exit(1)
+
+ print('PASS: damos_walk() vs kdamond exit race not triggered')
+
+if __name__ == '__main__':
+ main()
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2] selftests/damon: add regression test for damos_walk() vs kdamond exit race
2026-05-24 10:02 [PATCH v2] selftests/damon: add regression test for damos_walk() vs kdamond exit race Sailesh Nandanavanam
@ 2026-05-24 18:37 ` SeongJae Park
2026-06-05 7:14 ` Sailesh Nandanavanam
2026-06-12 6:23 ` [PATCH v3] mm/damon: add KUnit test for walk_control_obsolete behavior Sailesh Nandanavanam
1 sibling, 1 reply; 10+ messages in thread
From: SeongJae Park @ 2026-05-24 18:37 UTC (permalink / raw)
To: Sailesh Nandanavanam
Cc: SeongJae Park, shuah, damon, linux-mm, linux-kselftest,
linux-kernel
Hello Sailesh,
On Sun, 24 May 2026 15:32:58 +0530 Sailesh Nandanavanam <saileshnandanavanam@gmail.com> wrote:
> Add a regression test that verifies damos_walk() does not deadlock
> when racing with kdamond_fn() exit.
>
> When kdamond_fn() finishes its main loop, it cancels remaining
> damos_walk() requests and unsets damon_ctx->kdamond. Without the fix
> in commit 33c3f6c2b48c, damos_walk() could be called right after
Please use more common commit description format that has the commit subject.
E.g., commit 33c3f6c2b48c ("mm/damon/core: fix damos_walk() vs kdamond_fn()
exit race").
> cancellation but before kdamond pointer unset, causing it to wait
> forever for handling that never comes.
>
> The test starts kdamond monitoring a short-lived process, waits for
> the process to exit naturally triggering kdamond termination, then
> rapidly calls update_schemes_tried_regions in a separate thread to
> hit the race window. Using a thread with join timeout ensures the
> test can detect kernel-level deadlocks where the system call blocks
> in uninterruptible state.
>
> The sysfs state path is resolved dynamically via the kdamonds object
> instead of being hardcoded, and exceptions are handled specifically
> as OSError rather than using a bare except block.
Thank you for this patch. But, is this test reliable? I ran the test more
than 100 times on my system running a kernel that has commit 33c3f6c2b48c is
reverted. But test always passed.
>
> Fixes: 33c3f6c2b48c ("mm/damon/core: fix damos_walk() vs kdamond_fn() exit race")
This is not fixing a bug in the commit, so the above 'Fixes:' tag is
inappropriate and may only confuse people.
> Signed-off-by: Sailesh Nandanavanam <saileshnandanavanam@gmail.com>
> ---
From next time, please add changelog on the commentary area [1], with links to
the previous revisions.
I was able to find the previous version on the mailing list [2], so putting the
link here for others.
Also, before sending a new version, please share your revisioning plan and give
time (about a daay) for others to comment about.
> tools/testing/selftests/damon/Makefile | 1 +
> .../sysfs_damos_walk_kdamond_exit_race.py | 82 +++++++++++++++++++
> 2 files changed, 83 insertions(+)
> create mode 100755 tools/testing/selftests/damon/sysfs_damos_walk_kdamond_exit_race.py
>
> diff --git a/tools/testing/selftests/damon/Makefile b/tools/testing/selftests/damon/Makefile
> index 2180c328a825..60c83d6c318e 100644
> --- a/tools/testing/selftests/damon/Makefile
> +++ b/tools/testing/selftests/damon/Makefile
> @@ -20,6 +20,7 @@ TEST_PROGS += sysfs_update_removed_scheme_dir.sh
> TEST_PROGS += sysfs_update_schemes_tried_regions_hang.py
> TEST_PROGS += sysfs_memcg_path_leak.sh
> TEST_PROGS += sysfs_no_op_commit_break.py
> +TEST_PROGS += sysfs_damos_walk_kdamond_exit_race.py
>
> EXTRA_CLEAN = __pycache__
>
> diff --git a/tools/testing/selftests/damon/sysfs_damos_walk_kdamond_exit_race.py b/tools/testing/selftests/damon/sysfs_damos_walk_kdamond_exit_race.py
> new file mode 100755
> index 000000000000..8e8006d63926
> --- /dev/null
> +++ b/tools/testing/selftests/damon/sysfs_damos_walk_kdamond_exit_race.py
> @@ -0,0 +1,82 @@
> +#!/usr/bin/env python3
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Regression test for damos_walk() vs kdamond_fn() exit race.
> +#
> +# When kdamond_fn() finishes its main loop, it cancels remaining damos_walk()
> +# requests and unsets damon_ctx->kdamond. If damos_walk() is called right
> +# after cancellation but before kdamond pointer unset, it could wait forever
> +# for handling that never comes, causing a deadlock.
> +#
> +# This test verifies the fix by rapidly calling update_schemes_tried_regions
> +# while kdamond is naturally terminating (monitored process exits).
> +# Without the fix (commit 33c3f6c2b48c), this would hang indefinitely.
> +
> +import os
> +import subprocess
> +import threading
> +import time
> +import _damon_sysfs
Let's add a blank line before _damon_sysfs import, to be consistent with other
test files, like sysfs_update_schemes_tried_regions_hang.py.
> +
> +def call_update(kdamond, result):
> + err = kdamond.update_schemes_tried_regions()
> + result['err'] = err
> + result['done'] = True
> +
> +def main():
> + proc = subprocess.Popen(['sleep', '0.3'])
> +
> + kdamonds = _damon_sysfs.Kdamonds([_damon_sysfs.Kdamond(
> + contexts=[_damon_sysfs.DamonCtx(
> + ops='vaddr',
> + targets=[_damon_sysfs.DamonTarget(pid=proc.pid)],
> + schemes=[_damon_sysfs.Damos(
> + action='stat',
> + access_pattern=_damon_sysfs.DamosAccessPattern(
> + nr_accesses=[0, 200]))]
> + )]
> + )])
> +
> + err = kdamonds.start()
> + if err is not None:
> + print('kdamond start failed: %s' % err)
> + exit(1)
> +
> + # Wait for monitored process to die naturally
> + proc.wait()
> +
> + # Rapidly call damos_walk() while kdamond is exiting
> + # Use a thread with real timeout to detect kernel-level deadlock
> + deadline = time.time() + 5
> + while time.time() < deadline:
> + result = {'done': False, 'err': None}
> + t = threading.Thread(target=call_update,
> + args=(kdamonds.kdamonds[0], result))
> + t.daemon = True
> + t.start()
> + t.join(timeout=5)
I'm not sure if this is reliable to trigger the exact race. As I mentioned
abovely, I tried this test more than 100 times on a kernel that having the fix
reverted, but I was unable to make the test fail. If it is that unreliable,
I'm not very sure if having this test is beneficial or just make people
confused.
If the test has no false positive, maybe having this make sense to
opportunistically finding the bug. But I think the 5 seconds timeout is still
not very reliable on some case, and therefore it seems false positive test
failure is available. If that is correct, I think having this test might only
confuse people.
I think having damos_walk() kunit test for its functionalities including the
walk_control_obsolete might make more sense.
> +
> + if not result['done']:
> + print('FAIL: update_schemes_tried_regions hung - '
> + 'possible damos_walk/kdamond exit race deadlock')
> + exit(1)
> +
> + if result['err'] is not None:
> + # kdamond stopped cleanly - expected
> + break
Is the above if condition correct? Could you please explain why having an
error here is expected?
> +
> + # Check kdamond state via sysfs using dynamic path
> + state_path = os.path.join(
> + kdamonds.kdamonds[0].sysfs_dir(), 'state')
> + try:
> + with open(state_path) as f:
> + if f.read().strip() == 'off':
> + break
> + except OSError as e:
> + print('failed to read kdamond state: %s' % e)
> + exit(1)
> +
> + print('PASS: damos_walk() vs kdamond exit race not triggered')
> +
> +if __name__ == '__main__':
> + main()
> --
> 2.34.1
[1] https://docs.kernel.org/process/submitting-patches.html#commentary
[2] https://lore.kernel.org/20260524091812.35283-1-saileshnandanavanam@gmail.com
Thanks,
SJ
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] selftests/damon: add regression test for damos_walk() vs kdamond exit race
2026-05-24 18:37 ` SeongJae Park
@ 2026-06-05 7:14 ` Sailesh Nandanavanam
2026-06-05 9:55 ` Sailesh Nandanavanam
2026-06-06 0:36 ` SeongJae Park
0 siblings, 2 replies; 10+ messages in thread
From: Sailesh Nandanavanam @ 2026-06-05 7:14 UTC (permalink / raw)
To: SeongJae Park; +Cc: shuah, damon, linux-mm, linux-kselftest, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 9852 bytes --]
Hi SeongJae,
Apologies for the delayed response.
Thank you for the detailed review and for actually testing the patch on a
kernel with the fix reverted.
You are right that the userspace selftest cannot reliably trigger this
race. The race window between kdamond cancelling requests and unsetting the
pointer is microseconds wide. Since userspace calls have system call
overhead, they cannot hit that window reliably — which is why the test
always passed even on a broken kernel.
Regarding the error condition question:
When kdamond stops naturally, any subsequent write to the state file (which
update_schemes_tried_regions() does internally) fails because kdamond is no
longer running. So receiving an error indicates kdamond exited
without deadlock — the expected success condition. I should have explained
this more clearly in the comment.
I agree that a KUnit test is the right approach since it runs inside the
kernel and can directly set walk_control_obsolete to simulate the race
condition reliably without timing dependencies.
I will drop the userspace selftest approach and write a KUnit test for
damos_walk() functionality including walk_control_obsolete. I will verify
it fails on a kernel with commit 33c3f6c2b48c ("mm/damon/core: fix
damos_walk() vs kdamond_fn() exit race") reverted and passes with the fix
present before sending v3.
Regarding other feedback:
- Will use correct commit description format with subject in brackets
- Will remove the Fixes: tag
- Will add blank line before _damon_sysfs import
- Will add changelog and link previous versions
- Will share revision plan and wait at least one day
before sending new version
Please give me some time to study mm/damon/tests/core-kunit.h and implement
this properly.
Thanks,
Sailesh Nandanavanam
On Mon, May 25, 2026 at 12:08 AM SeongJae Park <sj@kernel.org> wrote:
> Hello Sailesh,
>
> On Sun, 24 May 2026 15:32:58 +0530 Sailesh Nandanavanam <
> saileshnandanavanam@gmail.com> wrote:
>
> > Add a regression test that verifies damos_walk() does not deadlock
> > when racing with kdamond_fn() exit.
> >
> > When kdamond_fn() finishes its main loop, it cancels remaining
> > damos_walk() requests and unsets damon_ctx->kdamond. Without the fix
> > in commit 33c3f6c2b48c, damos_walk() could be called right after
>
> Please use more common commit description format that has the commit
> subject.
> E.g., commit 33c3f6c2b48c ("mm/damon/core: fix damos_walk() vs kdamond_fn()
> exit race").
>
> > cancellation but before kdamond pointer unset, causing it to wait
> > forever for handling that never comes.
> >
> > The test starts kdamond monitoring a short-lived process, waits for
> > the process to exit naturally triggering kdamond termination, then
> > rapidly calls update_schemes_tried_regions in a separate thread to
> > hit the race window. Using a thread with join timeout ensures the
> > test can detect kernel-level deadlocks where the system call blocks
> > in uninterruptible state.
> >
> > The sysfs state path is resolved dynamically via the kdamonds object
> > instead of being hardcoded, and exceptions are handled specifically
> > as OSError rather than using a bare except block.
>
> Thank you for this patch. But, is this test reliable? I ran the test more
> than 100 times on my system running a kernel that has commit 33c3f6c2b48c
> is
> reverted. But test always passed.
>
> >
> > Fixes: 33c3f6c2b48c ("mm/damon/core: fix damos_walk() vs kdamond_fn()
> exit race")
>
> This is not fixing a bug in the commit, so the above 'Fixes:' tag is
> inappropriate and may only confuse people.
>
> > Signed-off-by: Sailesh Nandanavanam <saileshnandanavanam@gmail.com>
> > ---
>
> From next time, please add changelog on the commentary area [1], with
> links to
> the previous revisions.
>
> I was able to find the previous version on the mailing list [2], so
> putting the
> link here for others.
>
> Also, before sending a new version, please share your revisioning plan and
> give
> time (about a daay) for others to comment about.
>
> > tools/testing/selftests/damon/Makefile | 1 +
> > .../sysfs_damos_walk_kdamond_exit_race.py | 82 +++++++++++++++++++
> > 2 files changed, 83 insertions(+)
> > create mode 100755
> tools/testing/selftests/damon/sysfs_damos_walk_kdamond_exit_race.py
> >
> > diff --git a/tools/testing/selftests/damon/Makefile
> b/tools/testing/selftests/damon/Makefile
> > index 2180c328a825..60c83d6c318e 100644
> > --- a/tools/testing/selftests/damon/Makefile
> > +++ b/tools/testing/selftests/damon/Makefile
> > @@ -20,6 +20,7 @@ TEST_PROGS += sysfs_update_removed_scheme_dir.sh
> > TEST_PROGS += sysfs_update_schemes_tried_regions_hang.py
> > TEST_PROGS += sysfs_memcg_path_leak.sh
> > TEST_PROGS += sysfs_no_op_commit_break.py
> > +TEST_PROGS += sysfs_damos_walk_kdamond_exit_race.py
> >
> > EXTRA_CLEAN = __pycache__
> >
> > diff --git
> a/tools/testing/selftests/damon/sysfs_damos_walk_kdamond_exit_race.py
> b/tools/testing/selftests/damon/sysfs_damos_walk_kdamond_exit_race.py
> > new file mode 100755
> > index 000000000000..8e8006d63926
> > --- /dev/null
> > +++ b/tools/testing/selftests/damon/sysfs_damos_walk_kdamond_exit_race.py
> > @@ -0,0 +1,82 @@
> > +#!/usr/bin/env python3
> > +# SPDX-License-Identifier: GPL-2.0
> > +#
> > +# Regression test for damos_walk() vs kdamond_fn() exit race.
> > +#
> > +# When kdamond_fn() finishes its main loop, it cancels remaining
> damos_walk()
> > +# requests and unsets damon_ctx->kdamond. If damos_walk() is called
> right
> > +# after cancellation but before kdamond pointer unset, it could wait
> forever
> > +# for handling that never comes, causing a deadlock.
> > +#
> > +# This test verifies the fix by rapidly calling
> update_schemes_tried_regions
> > +# while kdamond is naturally terminating (monitored process exits).
> > +# Without the fix (commit 33c3f6c2b48c), this would hang indefinitely.
> > +
> > +import os
> > +import subprocess
> > +import threading
> > +import time
> > +import _damon_sysfs
>
> Let's add a blank line before _damon_sysfs import, to be consistent with
> other
> test files, like sysfs_update_schemes_tried_regions_hang.py.
>
> > +
> > +def call_update(kdamond, result):
> > + err = kdamond.update_schemes_tried_regions()
> > + result['err'] = err
> > + result['done'] = True
> > +
> > +def main():
> > + proc = subprocess.Popen(['sleep', '0.3'])
> > +
> > + kdamonds = _damon_sysfs.Kdamonds([_damon_sysfs.Kdamond(
> > + contexts=[_damon_sysfs.DamonCtx(
> > + ops='vaddr',
> > + targets=[_damon_sysfs.DamonTarget(pid=proc.pid)],
> > + schemes=[_damon_sysfs.Damos(
> > + action='stat',
> > + access_pattern=_damon_sysfs.DamosAccessPattern(
> > + nr_accesses=[0, 200]))]
> > + )]
> > + )])
> > +
> > + err = kdamonds.start()
> > + if err is not None:
> > + print('kdamond start failed: %s' % err)
> > + exit(1)
> > +
> > + # Wait for monitored process to die naturally
> > + proc.wait()
> > +
> > + # Rapidly call damos_walk() while kdamond is exiting
> > + # Use a thread with real timeout to detect kernel-level deadlock
> > + deadline = time.time() + 5
> > + while time.time() < deadline:
> > + result = {'done': False, 'err': None}
> > + t = threading.Thread(target=call_update,
> > + args=(kdamonds.kdamonds[0], result))
> > + t.daemon = True
> > + t.start()
> > + t.join(timeout=5)
>
> I'm not sure if this is reliable to trigger the exact race. As I mentioned
> abovely, I tried this test more than 100 times on a kernel that having the
> fix
> reverted, but I was unable to make the test fail. If it is that
> unreliable,
> I'm not very sure if having this test is beneficial or just make people
> confused.
>
> If the test has no false positive, maybe having this make sense to
> opportunistically finding the bug. But I think the 5 seconds timeout is
> still
> not very reliable on some case, and therefore it seems false positive test
> failure is available. If that is correct, I think having this test might
> only
> confuse people.
>
> I think having damos_walk() kunit test for its functionalities including
> the
> walk_control_obsolete might make more sense.
>
> > +
> > + if not result['done']:
> > + print('FAIL: update_schemes_tried_regions hung - '
> > + 'possible damos_walk/kdamond exit race deadlock')
> > + exit(1)
> > +
> > + if result['err'] is not None:
> > + # kdamond stopped cleanly - expected
> > + break
>
> Is the above if condition correct? Could you please explain why having an
> error here is expected?
>
> > +
> > + # Check kdamond state via sysfs using dynamic path
> > + state_path = os.path.join(
> > + kdamonds.kdamonds[0].sysfs_dir(), 'state')
> > + try:
> > + with open(state_path) as f:
> > + if f.read().strip() == 'off':
> > + break
> > + except OSError as e:
> > + print('failed to read kdamond state: %s' % e)
> > + exit(1)
> > +
> > + print('PASS: damos_walk() vs kdamond exit race not triggered')
> > +
> > +if __name__ == '__main__':
> > + main()
> > --
> > 2.34.1
>
> [1] https://docs.kernel.org/process/submitting-patches.html#commentary
> [2]
> https://lore.kernel.org/20260524091812.35283-1-saileshnandanavanam@gmail.com
>
>
> Thanks,
> SJ
>
[-- Attachment #2: Type: text/html, Size: 11785 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] selftests/damon: add regression test for damos_walk() vs kdamond exit race
2026-06-05 7:14 ` Sailesh Nandanavanam
@ 2026-06-05 9:55 ` Sailesh Nandanavanam
2026-06-06 0:36 ` SeongJae Park
1 sibling, 0 replies; 10+ messages in thread
From: Sailesh Nandanavanam @ 2026-06-05 9:55 UTC (permalink / raw)
To: SeongJae Park; +Cc: shuah, damon, linux-mm, linux-kselftest, linux-kernel
Hi SeongJae,
Apologies for the delayed response.
Thank you for the detailed review and for actually testing the patch
on a kernel with the fix reverted.
You are right that the userspace selftest cannot reliably trigger this
race. The race window between kdamond cancelling requests and
unsetting the pointer is microseconds wide. Since userspace calls have
system call overhead, they cannot hit that window reliably — which is
why the test always passed even on a broken kernel.
Regarding the error condition question:
When kdamond stops naturally, any subsequent write to the state file
(which update_schemes_tried_regions() does internally) fails because
kdamond is no longer running. So receiving an error indicates kdamond
exited without deadlock — the expected success condition. I should
have explained this more clearly in the comment.
I agree that a KUnit test is the right approach since it runs inside
the kernel and can directly set walk_control_obsolete to simulate the
race condition reliably without timing dependencies.
I will drop the userspace selftest approach and write a KUnit test for
damos_walk() functionality including walk_control_obsolete. I will
verify it fails on a kernel with commit 33c3f6c2b48c ("mm/damon/core:
fix damos_walk() vs kdamond_fn() exit race") reverted and passes with
the fix present before sending v3.
Regarding other feedback:
- Will use correct commit description format with subject in brackets
- Will remove the Fixes: tag
- Will add blank line before _damon_sysfs import
- Will add changelog and link previous versions
- Will share revision plan and wait at least one day
before sending new version
Please give me some time to study mm/damon/tests/core-kunit.h and
implement this properly.
Thanks,
Sailesh Nandanavanam
On Fri, Jun 5, 2026 at 12:44 PM Sailesh Nandanavanam
<saileshnandanavanam@gmail.com> wrote:
>
> Hi SeongJae,
>
> Apologies for the delayed response.
>
> Thank you for the detailed review and for actually testing the patch on a kernel with the fix reverted.
>
> You are right that the userspace selftest cannot reliably trigger this race. The race window between kdamond cancelling requests and unsetting the pointer is microseconds wide. Since userspace calls have system call overhead, they cannot hit that window reliably — which is why the test always passed even on a broken kernel.
>
> Regarding the error condition question:
>
> When kdamond stops naturally, any subsequent write to the state file (which update_schemes_tried_regions() does internally) fails because kdamond is no longer running. So receiving an error indicates kdamond exited without deadlock — the expected success condition. I should have explained this more clearly in the comment.
>
> I agree that a KUnit test is the right approach since it runs inside the kernel and can directly set walk_control_obsolete to simulate the race condition reliably without timing dependencies.
>
> I will drop the userspace selftest approach and write a KUnit test for damos_walk() functionality including walk_control_obsolete. I will verify it fails on a kernel with commit 33c3f6c2b48c ("mm/damon/core: fix damos_walk() vs kdamond_fn() exit race") reverted and passes with the fix present before sending v3.
>
> Regarding other feedback:
> - Will use correct commit description format with subject in brackets
> - Will remove the Fixes: tag
> - Will add blank line before _damon_sysfs import
> - Will add changelog and link previous versions
> - Will share revision plan and wait at least one day
> before sending new version
>
> Please give me some time to study mm/damon/tests/core-kunit.h and implement this properly.
>
> Thanks,
> Sailesh Nandanavanam
>
> On Mon, May 25, 2026 at 12:08 AM SeongJae Park <sj@kernel.org> wrote:
>>
>> Hello Sailesh,
>>
>> On Sun, 24 May 2026 15:32:58 +0530 Sailesh Nandanavanam <saileshnandanavanam@gmail.com> wrote:
>>
>> > Add a regression test that verifies damos_walk() does not deadlock
>> > when racing with kdamond_fn() exit.
>> >
>> > When kdamond_fn() finishes its main loop, it cancels remaining
>> > damos_walk() requests and unsets damon_ctx->kdamond. Without the fix
>> > in commit 33c3f6c2b48c, damos_walk() could be called right after
>>
>> Please use more common commit description format that has the commit subject.
>> E.g., commit 33c3f6c2b48c ("mm/damon/core: fix damos_walk() vs kdamond_fn()
>> exit race").
>>
>> > cancellation but before kdamond pointer unset, causing it to wait
>> > forever for handling that never comes.
>> >
>> > The test starts kdamond monitoring a short-lived process, waits for
>> > the process to exit naturally triggering kdamond termination, then
>> > rapidly calls update_schemes_tried_regions in a separate thread to
>> > hit the race window. Using a thread with join timeout ensures the
>> > test can detect kernel-level deadlocks where the system call blocks
>> > in uninterruptible state.
>> >
>> > The sysfs state path is resolved dynamically via the kdamonds object
>> > instead of being hardcoded, and exceptions are handled specifically
>> > as OSError rather than using a bare except block.
>>
>> Thank you for this patch. But, is this test reliable? I ran the test more
>> than 100 times on my system running a kernel that has commit 33c3f6c2b48c is
>> reverted. But test always passed.
>>
>> >
>> > Fixes: 33c3f6c2b48c ("mm/damon/core: fix damos_walk() vs kdamond_fn() exit race")
>>
>> This is not fixing a bug in the commit, so the above 'Fixes:' tag is
>> inappropriate and may only confuse people.
>>
>> > Signed-off-by: Sailesh Nandanavanam <saileshnandanavanam@gmail.com>
>> > ---
>>
>> From next time, please add changelog on the commentary area [1], with links to
>> the previous revisions.
>>
>> I was able to find the previous version on the mailing list [2], so putting the
>> link here for others.
>>
>> Also, before sending a new version, please share your revisioning plan and give
>> time (about a daay) for others to comment about.
>>
>> > tools/testing/selftests/damon/Makefile | 1 +
>> > .../sysfs_damos_walk_kdamond_exit_race.py | 82 +++++++++++++++++++
>> > 2 files changed, 83 insertions(+)
>> > create mode 100755 tools/testing/selftests/damon/sysfs_damos_walk_kdamond_exit_race.py
>> >
>> > diff --git a/tools/testing/selftests/damon/Makefile b/tools/testing/selftests/damon/Makefile
>> > index 2180c328a825..60c83d6c318e 100644
>> > --- a/tools/testing/selftests/damon/Makefile
>> > +++ b/tools/testing/selftests/damon/Makefile
>> > @@ -20,6 +20,7 @@ TEST_PROGS += sysfs_update_removed_scheme_dir.sh
>> > TEST_PROGS += sysfs_update_schemes_tried_regions_hang.py
>> > TEST_PROGS += sysfs_memcg_path_leak.sh
>> > TEST_PROGS += sysfs_no_op_commit_break.py
>> > +TEST_PROGS += sysfs_damos_walk_kdamond_exit_race.py
>> >
>> > EXTRA_CLEAN = __pycache__
>> >
>> > diff --git a/tools/testing/selftests/damon/sysfs_damos_walk_kdamond_exit_race.py b/tools/testing/selftests/damon/sysfs_damos_walk_kdamond_exit_race.py
>> > new file mode 100755
>> > index 000000000000..8e8006d63926
>> > --- /dev/null
>> > +++ b/tools/testing/selftests/damon/sysfs_damos_walk_kdamond_exit_race.py
>> > @@ -0,0 +1,82 @@
>> > +#!/usr/bin/env python3
>> > +# SPDX-License-Identifier: GPL-2.0
>> > +#
>> > +# Regression test for damos_walk() vs kdamond_fn() exit race.
>> > +#
>> > +# When kdamond_fn() finishes its main loop, it cancels remaining damos_walk()
>> > +# requests and unsets damon_ctx->kdamond. If damos_walk() is called right
>> > +# after cancellation but before kdamond pointer unset, it could wait forever
>> > +# for handling that never comes, causing a deadlock.
>> > +#
>> > +# This test verifies the fix by rapidly calling update_schemes_tried_regions
>> > +# while kdamond is naturally terminating (monitored process exits).
>> > +# Without the fix (commit 33c3f6c2b48c), this would hang indefinitely.
>> > +
>> > +import os
>> > +import subprocess
>> > +import threading
>> > +import time
>> > +import _damon_sysfs
>>
>> Let's add a blank line before _damon_sysfs import, to be consistent with other
>> test files, like sysfs_update_schemes_tried_regions_hang.py.
>>
>> > +
>> > +def call_update(kdamond, result):
>> > + err = kdamond.update_schemes_tried_regions()
>> > + result['err'] = err
>> > + result['done'] = True
>> > +
>> > +def main():
>> > + proc = subprocess.Popen(['sleep', '0.3'])
>> > +
>> > + kdamonds = _damon_sysfs.Kdamonds([_damon_sysfs.Kdamond(
>> > + contexts=[_damon_sysfs.DamonCtx(
>> > + ops='vaddr',
>> > + targets=[_damon_sysfs.DamonTarget(pid=proc.pid)],
>> > + schemes=[_damon_sysfs.Damos(
>> > + action='stat',
>> > + access_pattern=_damon_sysfs.DamosAccessPattern(
>> > + nr_accesses=[0, 200]))]
>> > + )]
>> > + )])
>> > +
>> > + err = kdamonds.start()
>> > + if err is not None:
>> > + print('kdamond start failed: %s' % err)
>> > + exit(1)
>> > +
>> > + # Wait for monitored process to die naturally
>> > + proc.wait()
>> > +
>> > + # Rapidly call damos_walk() while kdamond is exiting
>> > + # Use a thread with real timeout to detect kernel-level deadlock
>> > + deadline = time.time() + 5
>> > + while time.time() < deadline:
>> > + result = {'done': False, 'err': None}
>> > + t = threading.Thread(target=call_update,
>> > + args=(kdamonds.kdamonds[0], result))
>> > + t.daemon = True
>> > + t.start()
>> > + t.join(timeout=5)
>>
>> I'm not sure if this is reliable to trigger the exact race. As I mentioned
>> abovely, I tried this test more than 100 times on a kernel that having the fix
>> reverted, but I was unable to make the test fail. If it is that unreliable,
>> I'm not very sure if having this test is beneficial or just make people
>> confused.
>>
>> If the test has no false positive, maybe having this make sense to
>> opportunistically finding the bug. But I think the 5 seconds timeout is still
>> not very reliable on some case, and therefore it seems false positive test
>> failure is available. If that is correct, I think having this test might only
>> confuse people.
>>
>> I think having damos_walk() kunit test for its functionalities including the
>> walk_control_obsolete might make more sense.
>>
>> > +
>> > + if not result['done']:
>> > + print('FAIL: update_schemes_tried_regions hung - '
>> > + 'possible damos_walk/kdamond exit race deadlock')
>> > + exit(1)
>> > +
>> > + if result['err'] is not None:
>> > + # kdamond stopped cleanly - expected
>> > + break
>>
>> Is the above if condition correct? Could you please explain why having an
>> error here is expected?
>>
>> > +
>> > + # Check kdamond state via sysfs using dynamic path
>> > + state_path = os.path.join(
>> > + kdamonds.kdamonds[0].sysfs_dir(), 'state')
>> > + try:
>> > + with open(state_path) as f:
>> > + if f.read().strip() == 'off':
>> > + break
>> > + except OSError as e:
>> > + print('failed to read kdamond state: %s' % e)
>> > + exit(1)
>> > +
>> > + print('PASS: damos_walk() vs kdamond exit race not triggered')
>> > +
>> > +if __name__ == '__main__':
>> > + main()
>> > --
>> > 2.34.1
>>
>> [1] https://docs.kernel.org/process/submitting-patches.html#commentary
>> [2] https://lore.kernel.org/20260524091812.35283-1-saileshnandanavanam@gmail.com
>>
>>
>> Thanks,
>> SJ
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] selftests/damon: add regression test for damos_walk() vs kdamond exit race
2026-06-05 7:14 ` Sailesh Nandanavanam
2026-06-05 9:55 ` Sailesh Nandanavanam
@ 2026-06-06 0:36 ` SeongJae Park
2026-06-11 20:20 ` Sailesh Nandanavanam
1 sibling, 1 reply; 10+ messages in thread
From: SeongJae Park @ 2026-06-06 0:36 UTC (permalink / raw)
To: Sailesh Nandanavanam
Cc: SeongJae Park, shuah, damon, linux-mm, linux-kselftest,
linux-kernel
On Fri, 5 Jun 2026 12:44:50 +0530 Sailesh Nandanavanam <saileshnandanavanam@gmail.com> wrote:
> Hi SeongJae,
>
> Apologies for the delayed response.
No worry!
[...]
> I agree that a KUnit test is the right approach since it runs inside the
> kernel and can directly set walk_control_obsolete to simulate the race
> condition reliably without timing dependencies.
>
> I will drop the userspace selftest approach and write a KUnit test for
> damos_walk() functionality including walk_control_obsolete. I will verify
> it fails on a kernel with commit 33c3f6c2b48c ("mm/damon/core: fix
> damos_walk() vs kdamond_fn() exit race") reverted and passes with the fix
> present before sending v3.
>
> Regarding other feedback:
> - Will use correct commit description format with subject in brackets
> - Will remove the Fixes: tag
> - Will add blank line before _damon_sysfs import
> - Will add changelog and link previous versions
> - Will share revision plan and wait at least one day
> before sending new version
Sounds good. Please consider using in-line reply [1] from the next time,
though!
>
> Please give me some time to study mm/damon/tests/core-kunit.h and implement
> this properly.
No rush, take your time.
[1] https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] selftests/damon: add regression test for damos_walk() vs kdamond exit race
2026-06-06 0:36 ` SeongJae Park
@ 2026-06-11 20:20 ` Sailesh Nandanavanam
2026-06-11 20:36 ` Sailesh Nandanavanam
0 siblings, 1 reply; 10+ messages in thread
From: Sailesh Nandanavanam @ 2026-06-11 20:20 UTC (permalink / raw)
To: SeongJae Park; +Cc: shuah, damon, linux-mm, linux-kselftest, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2995 bytes --]
Hello SeongJae,
Noted on inline replies - I'll follow that format from the next response.
Following up on this thread with progress.
As discussed, I dropped the userspace selftest approach since it cannot
reliably trigger the microsecond-wide race window.
I wrote a KUnit test instead, added to mm/damon/tests/core-kunit.h:
- Creates a damon_ctx
- Sets ctx->walk_control_obsolete = true under walk_control_lock,
simulating the state just before kdamond_fn() exits
- Calls damos_walk() and verifies it returns -ECANCELED immediately
I built and ran this on a kernel with commit 33c3f6c2b48c ("mm/damon/core:
fix damos_walk() vs kdamond_fn() exit race") present, and the test passes
along with all other 28 existing damon KUnit tests:
ok 29 damon_test_walk_control_obsolete
# damon: pass:29 fail:0 skip:0 total:29
Since walk_control_obsolete is a field introduced by this fix, a kernel
with the fix reverted would not compile this test (the field would not
exist). So the usual "fails before fix, passes after fix" verification does
not apply the same way here - the test directly validates the new
field/behavior introduced by the fix.
If this approach looks reasonable, I plan to send v3 with this KUnit test
(replacing the userspace selftest from v1/v2), along with the commit
message format and other minor fixes from your earlier review.
Please let me know if you'd like any changes before I send v3.
Thanks,
Sailesh
On Sat, Jun 6, 2026 at 6:06 AM SeongJae Park <sj@kernel.org> wrote:
> On Fri, 5 Jun 2026 12:44:50 +0530 Sailesh Nandanavanam <
> saileshnandanavanam@gmail.com> wrote:
>
> > Hi SeongJae,
> >
> > Apologies for the delayed response.
>
> No worry!
>
> [...]
> > I agree that a KUnit test is the right approach since it runs inside the
> > kernel and can directly set walk_control_obsolete to simulate the race
> > condition reliably without timing dependencies.
> >
> > I will drop the userspace selftest approach and write a KUnit test for
> > damos_walk() functionality including walk_control_obsolete. I will verify
> > it fails on a kernel with commit 33c3f6c2b48c ("mm/damon/core: fix
> > damos_walk() vs kdamond_fn() exit race") reverted and passes with the fix
> > present before sending v3.
> >
> > Regarding other feedback:
> > - Will use correct commit description format with subject in brackets
> > - Will remove the Fixes: tag
> > - Will add blank line before _damon_sysfs import
> > - Will add changelog and link previous versions
> > - Will share revision plan and wait at least one day
> > before sending new version
>
> Sounds good. Please consider using in-line reply [1] from the next time,
> though!
>
> >
> > Please give me some time to study mm/damon/tests/core-kunit.h and
> implement
> > this properly.
>
> No rush, take your time.
>
> [1]
> https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying
>
>
> Thanks,
> SJ
>
> [...]
>
[-- Attachment #2: Type: text/html, Size: 3734 bytes --]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] selftests/damon: add regression test for damos_walk() vs kdamond exit race
2026-06-11 20:20 ` Sailesh Nandanavanam
@ 2026-06-11 20:36 ` Sailesh Nandanavanam
2026-06-12 0:11 ` SeongJae Park
0 siblings, 1 reply; 10+ messages in thread
From: Sailesh Nandanavanam @ 2026-06-11 20:36 UTC (permalink / raw)
To: SeongJae Park; +Cc: shuah, damon, linux-mm, linux-kselftest, linux-kernel
Hello SeongJae,
Noted on inline replies - I'll follow that format from the next response.
Following up on this thread with progress.
As discussed, I dropped the userspace selftest approach since it
cannot reliably trigger the microsecond-wide race window.
I wrote a KUnit test instead, added to mm/damon/tests/core-kunit.h:
- Creates a damon_ctx
- Sets ctx->walk_control_obsolete = true under walk_control_lock,
simulating the state just before kdamond_fn() exits
- Calls damos_walk() and verifies it returns -ECANCELED immediately
I built and ran this on a kernel with commit 33c3f6c2b48c
("mm/damon/core: fix damos_walk() vs kdamond_fn() exit race") present,
and the test passes along with all other 28 existing damon KUnit
tests:
ok 29 damon_test_walk_control_obsolete
# damon: pass:29 fail:0 skip:0 total:29
Since walk_control_obsolete is a field introduced by this fix, a
kernel with the fix reverted would not compile this test (the field
would not exist). So the usual "fails before fix, passes after fix"
verification does not apply the same way here - the test directly
validates the new field/behavior introduced by the fix.
If this approach looks reasonable, I plan to send v3 with this KUnit
test (replacing the userspace selftest from v1/v2), along with the
commit message format and other minor fixes from your earlier review.
Please let me know if you'd like any changes before I send v3.
Thanks,
Sailesh Nandanavanam
On Fri, 12 Jun, 2026, 01:50 Sailesh Nandanavanam,
<saileshnandanavanam@gmail.com> wrote:
>
> Hello SeongJae,
>
> Noted on inline replies - I'll follow that format from the next response.
>
> Following up on this thread with progress.
>
> As discussed, I dropped the userspace selftest approach since it cannot reliably trigger the microsecond-wide race window.
>
> I wrote a KUnit test instead, added to mm/damon/tests/core-kunit.h:
>
> - Creates a damon_ctx
> - Sets ctx->walk_control_obsolete = true under walk_control_lock, simulating the state just before kdamond_fn() exits
> - Calls damos_walk() and verifies it returns -ECANCELED immediately
>
> I built and ran this on a kernel with commit 33c3f6c2b48c ("mm/damon/core: fix damos_walk() vs kdamond_fn() exit race") present, and the test passes along with all other 28 existing damon KUnit tests:
>
> ok 29 damon_test_walk_control_obsolete
> # damon: pass:29 fail:0 skip:0 total:29
>
> Since walk_control_obsolete is a field introduced by this fix, a kernel with the fix reverted would not compile this test (the field would not exist). So the usual "fails before fix, passes after fix" verification does not apply the same way here - the test directly validates the new field/behavior introduced by the fix.
>
> If this approach looks reasonable, I plan to send v3 with this KUnit test (replacing the userspace selftest from v1/v2), along with the commit message format and other minor fixes from your earlier review.
>
> Please let me know if you'd like any changes before I send v3.
>
> Thanks,
> Sailesh
>
> On Sat, Jun 6, 2026 at 6:06 AM SeongJae Park <sj@kernel.org> wrote:
>>
>> On Fri, 5 Jun 2026 12:44:50 +0530 Sailesh Nandanavanam <saileshnandanavanam@gmail.com> wrote:
>>
>> > Hi SeongJae,
>> >
>> > Apologies for the delayed response.
>>
>> No worry!
>>
>> [...]
>> > I agree that a KUnit test is the right approach since it runs inside the
>> > kernel and can directly set walk_control_obsolete to simulate the race
>> > condition reliably without timing dependencies.
>> >
>> > I will drop the userspace selftest approach and write a KUnit test for
>> > damos_walk() functionality including walk_control_obsolete. I will verify
>> > it fails on a kernel with commit 33c3f6c2b48c ("mm/damon/core: fix
>> > damos_walk() vs kdamond_fn() exit race") reverted and passes with the fix
>> > present before sending v3.
>> >
>> > Regarding other feedback:
>> > - Will use correct commit description format with subject in brackets
>> > - Will remove the Fixes: tag
>> > - Will add blank line before _damon_sysfs import
>> > - Will add changelog and link previous versions
>> > - Will share revision plan and wait at least one day
>> > before sending new version
>>
>> Sounds good. Please consider using in-line reply [1] from the next time,
>> though!
>>
>> >
>> > Please give me some time to study mm/damon/tests/core-kunit.h and implement
>> > this properly.
>>
>> No rush, take your time.
>>
>> [1] https://subspace.kernel.org/etiquette.html#do-not-top-post-when-replying
>>
>>
>> Thanks,
>> SJ
>>
>> [...]
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] selftests/damon: add regression test for damos_walk() vs kdamond exit race
2026-06-11 20:36 ` Sailesh Nandanavanam
@ 2026-06-12 0:11 ` SeongJae Park
0 siblings, 0 replies; 10+ messages in thread
From: SeongJae Park @ 2026-06-12 0:11 UTC (permalink / raw)
To: Sailesh Nandanavanam
Cc: SeongJae Park, shuah, damon, linux-mm, linux-kselftest,
linux-kernel
On Fri, 12 Jun 2026 02:06:51 +0530 Sailesh Nandanavanam <saileshnandanavanam@gmail.com> wrote:
> Hello SeongJae,
>
> Noted on inline replies - I'll follow that format from the next response.
>
> Following up on this thread with progress.
>
> As discussed, I dropped the userspace selftest approach since it
> cannot reliably trigger the microsecond-wide race window.
>
> I wrote a KUnit test instead, added to mm/damon/tests/core-kunit.h:
>
> - Creates a damon_ctx
> - Sets ctx->walk_control_obsolete = true under walk_control_lock,
> simulating the state just before kdamond_fn() exits
I don't think you don't need to hold the lock, since this is just a unit test
and there should be no race for the test.
> - Calls damos_walk() and verifies it returns -ECANCELED immediately
>
> I built and ran this on a kernel with commit 33c3f6c2b48c
> ("mm/damon/core: fix damos_walk() vs kdamond_fn() exit race") present,
> and the test passes along with all other 28 existing damon KUnit
> tests:
>
> ok 29 damon_test_walk_control_obsolete
> # damon: pass:29 fail:0 skip:0 total:29
>
> Since walk_control_obsolete is a field introduced by this fix, a
> kernel with the fix reverted would not compile this test (the field
> would not exist). So the usual "fails before fix, passes after fix"
> verification does not apply the same way here - the test directly
> validates the new field/behavior introduced by the fix.
>
> If this approach looks reasonable, I plan to send v3 with this KUnit
> test (replacing the userspace selftest from v1/v2), along with the
> commit message format and other minor fixes from your earlier review.
Other than the trivial comment about locking, your plan soudns reasonable to
me. Looking forward to your next patch!
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3] mm/damon: add KUnit test for walk_control_obsolete behavior
2026-05-24 10:02 [PATCH v2] selftests/damon: add regression test for damos_walk() vs kdamond exit race Sailesh Nandanavanam
2026-05-24 18:37 ` SeongJae Park
@ 2026-06-12 6:23 ` Sailesh Nandanavanam
2026-06-12 15:05 ` SeongJae Park
1 sibling, 1 reply; 10+ messages in thread
From: Sailesh Nandanavanam @ 2026-06-12 6:23 UTC (permalink / raw)
To: sj
Cc: shuah, damon, linux-mm, linux-kselftest, linux-kernel,
Sailesh Nandanavanam
Add a KUnit test to verify that damos_walk() rejects
new requests when walk_control_obsolete is set.
Commit 33c3f6c2b48c ("mm/damon/core: fix damos_walk() vs
kdamond_fn() exit race") introduced walk_control_obsolete
to prevent a race condition where new requests could be
registered during kdamond shutdown and never handled.
This test simulates the shutdown condition by setting
walk_control_obsolete and verifies that damos_walk()
returns -ECANCELED immediately.
This validates the invariant introduced by the fix and
helps prevent regressions.
Suggested-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Sailesh Nandanavanam <saileshnandanavanam@gmail.com>
---
Changes since v2 (https://lore.kernel.org/20260524100258.36819-1-saileshnandanavanam@gmail.com):
- Dropped the userspace selftest approach entirely. SeongJae tested
v2 100 times on a kernel with the fix reverted and it always
passed, confirming the microsecond-wide race window cannot be
reliably hit from userspace due to syscall overhead.
- Added a KUnit test for damos_walk() + walk_control_obsolete
instead, as suggested by SeongJae. This directly sets the
obsolete flag and verifies damos_walk() returns -ECANCELED
immediately, without timing dependency.
Changes since v1 (https://lore.kernel.org/20260524091812.35283-1-saileshnandanavanam@gmail.com):
- Addressed sashiko bot review comments (execute bit, threading,
dynamic sysfs path, OSError handling) - superseded by the v3
approach above.
mm/damon/tests/core-kunit.h | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/mm/damon/tests/core-kunit.h b/mm/damon/tests/core-kunit.h
index 9e5904c2beeb..599aac056b73 100644
--- a/mm/damon/tests/core-kunit.h
+++ b/mm/damon/tests/core-kunit.h
@@ -1336,6 +1336,33 @@ static void damon_test_is_last_region(struct kunit *test)
damon_free_target(t);
}
+/*
+ * Verify that damos_walk() rejects new requests when
+ * walk_control_obsolete is set.
+ *
+ * This tests the invariant introduced by:
+ * commit 33c3f6c2b48c ("mm/damon/core: fix damos_walk() vs kdamond_fn() exit race")
+ */
+static void damon_test_walk_control_obsolete(struct kunit *test)
+{
+ struct damon_ctx *ctx;
+ struct damos_walk_control control = {};
+ int ret;
+
+ ctx = damon_new_ctx();
+ if (!ctx)
+ kunit_skip(test, "ctx alloc fail");
+
+ /* Simulate shutdown phase */
+ ctx->walk_control_obsolete = true;
+
+ ret = damos_walk(ctx, &control);
+
+ KUNIT_EXPECT_EQ(test, ret, -ECANCELED);
+
+ damon_destroy_ctx(ctx);
+}
+
static struct kunit_case damon_test_cases[] = {
KUNIT_CASE(damon_test_target),
KUNIT_CASE(damon_test_regions),
@@ -1365,6 +1392,7 @@ static struct kunit_case damon_test_cases[] = {
KUNIT_CASE(damon_test_set_filters_default_reject),
KUNIT_CASE(damon_test_apply_min_nr_regions),
KUNIT_CASE(damon_test_is_last_region),
+ KUNIT_CASE(damon_test_walk_control_obsolete),
{},
};
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3] mm/damon: add KUnit test for walk_control_obsolete behavior
2026-06-12 6:23 ` [PATCH v3] mm/damon: add KUnit test for walk_control_obsolete behavior Sailesh Nandanavanam
@ 2026-06-12 15:05 ` SeongJae Park
0 siblings, 0 replies; 10+ messages in thread
From: SeongJae Park @ 2026-06-12 15:05 UTC (permalink / raw)
To: Sailesh Nandanavanam
Cc: SeongJae Park, shuah, damon, linux-mm, linux-kselftest,
linux-kernel
Hello Sailesh,
Thank you for this patch!
From next time, please send a new version of a patch as a new thread, not as a
reply to the previous version.
For the subject, please use mm/damon/tests/core-kunit: as the prefix.
On Fri, 12 Jun 2026 11:53:37 +0530 Sailesh Nandanavanam <saileshnandanavanam@gmail.com> wrote:
> Add a KUnit test to verify that damos_walk() rejects
> new requests when walk_control_obsolete is set.
>
> Commit 33c3f6c2b48c ("mm/damon/core: fix damos_walk() vs
> kdamond_fn() exit race") introduced walk_control_obsolete
> to prevent a race condition where new requests could be
> registered during kdamond shutdown and never handled.
>
> This test simulates the shutdown condition by setting
> walk_control_obsolete and verifies that damos_walk()
> returns -ECANCELED immediately.
>
> This validates the invariant introduced by the fix and
> helps prevent regressions.
>
> Suggested-by: SeongJae Park <sj@kernel.org>
> Signed-off-by: Sailesh Nandanavanam <saileshnandanavanam@gmail.com>
Other than the subject prefix, all looks good.
Reviewed-by: SeongJae Park <sj@kernel.org>
I applied this patch to damon/next [1] tree after modifying the subject prefix
as I suggested, assuming you don't mind. If you do mind that change, please
let me know.
We are now quite close to next merge window. We (mm community) want
to focus on making mm.git more stabilized and therefore ready for the next
merge window, rather than adding more changes that are not really urgent. I
understand this patch is not really urgent. Hence, Andrew might not add this
patch until next -rc1 release. In the case, I will request adding this to
mm.git after next -rc1 release. So, no action from your side is needed for
now.
[1] https://origin.kernel.org/doc/html/latest/mm/damon/maintainer-profile.html#scm-trees
Thanks,
SJ
[...]
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-06-12 15:05 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-24 10:02 [PATCH v2] selftests/damon: add regression test for damos_walk() vs kdamond exit race Sailesh Nandanavanam
2026-05-24 18:37 ` SeongJae Park
2026-06-05 7:14 ` Sailesh Nandanavanam
2026-06-05 9:55 ` Sailesh Nandanavanam
2026-06-06 0:36 ` SeongJae Park
2026-06-11 20:20 ` Sailesh Nandanavanam
2026-06-11 20:36 ` Sailesh Nandanavanam
2026-06-12 0:11 ` SeongJae Park
2026-06-12 6:23 ` [PATCH v3] mm/damon: add KUnit test for walk_control_obsolete behavior Sailesh Nandanavanam
2026-06-12 15:05 ` SeongJae Park
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.