From: SeongJae Park <sj@kernel.org>
To: Yunjeong Mun <yunjeong.mun@sk.com>
Cc: SeongJae Park <sj@kernel.org>,
	damon@lists.linux.dev, honggyu.kim@sk.com,
	kernel_team@skhynix.com
Subject: Re: [BUG] 'damo stop' causes kernel crash in v6.17-rc3
Date: Wed,  3 Sep 2025 21:02:03 -0700
Message-ID: <20250904040203.1306-1-sj@kernel.org>
In-Reply-To: <20250904011738.930-1-yunjeong.mun@sk.com>

Hi Yunjeong,

On Thu,  4 Sep 2025 10:17:38 +0900 Yunjeong Mun <yunjeong.mun@sk.com> wrote:

> Hi!
> 
> I encountered a kernel crash when running 'damo stop' in kernel v6.17-rc3.
> I tested and confirmed that this issue also occurs in v6.17-rc1.

Thank you for finding and sharing this issue!

> 
> 'damo' version that I tested is v2.9.3 and v2.4.7.
> 
> The crash happens when DAMON is configured to use both 'migrate_hot'
> and 'migrate_cold' actions. I tested that if DAMON is started with only
> one of the two actions, it works fine.

I understand you mean the problem is reproducible when you use two kdamond
threads, and you confirmed it doesn't happen when you use a single kdamond
thread.  Please let me know if I'm misunderstanding.

> Below is the command I used:
> 
> ```shell
> $ ./damo start \
>  --ops paddr --numa_node 0 --monitoring_intervals 100ms 2s 20s --damos_action migrate_cold 1 \
>  --ops paddr --numa_node 1 --monitoring_intervals 100ms 2s 20s --damos_action migrate_hot 0 \
>  --nr_targets 1 1 --nr_schemes 1 1 --nr_ctxs 1 1
> 
> $ ps aux | grep kdamond
> root      1193 98.2  0.0      0     0 ?        R    07:58   0:18 [kdamond.0]
> root      1194 11.2  0.0      0     0 ?        R    07:58   0:02 [kdamond.1]
> 
> # Error occurs
> $ ./damo stop 
> ```

Thank you for sharing these detailed steps.  On my setup, this doesn't cause a
crash but makes 'damo' hang.

According to 'git bisect', commit d809a7c64ba8 ("mm/damon/sysfs: implement
refresh_ms file internal work") is the first bad commit.  And the code is
indeed broken for the multiple kdamonds case, since it shares one
damon_call_control object among multiple kdamonds while overwriting its data
field with that of the later-called one.

I haven't yet dug into which code path triggers the issue, but I'm sharing this
first, since I have to go out soon.  I'll take a further look later.
Meanwhile, could you please also confirm whether it is the first bad commit for
your issue, too?
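
In case it is useful, below is a rough sketch of how such a bisect could be
automated.  'bisect_first_bad' is a hypothetical helper name, not something in
damo or the kernel tree, and it assumes the reproducer can be scripted (for
example, by booting each build in a VM), which may not fit your setup.

```shell
# Sketch only: 'bisect_first_bad' is a hypothetical helper.
# Run it from the top of the git tree to bisect.
#   $1:   a revision known to be good
#   $2:   a revision known to be bad
#   rest: a command that exits 0 when the issue is absent and 1 when it
#         reproduces (build, boot and test steps are up to that command)
bisect_first_bad() {
    good="$1"
    bad="$2"
    shift 2
    git bisect start "$bad" "$good" || return 1
    # 'git bisect run' prints "<commit> is the first bad commit" when done.
    git bisect run "$@"
    ret=$?
    # Return the tree to where it was before the bisect.
    git bisect reset
    return $ret
}

# Example.  v6.16 being good is only my assumption, and ./repro.sh is a
# placeholder for your own reproducer:
#   bisect_first_bad v6.16 v6.17-rc1 ./repro.sh
```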

> 
> This issue also occurs when starting DAMON using yaml configuration file 
> that includes both the 'migrate_hot' and 'migrate_cold' actions.
> Below is the dmesg log at the time of the issue:
> 
> ```
> [157729.130361] Call Trace:
> [157729.130540]  <TASK>
> [157729.130718]  kthread_stop+0x158/0x190
> [157729.130904]  kthread_stop_put+0x18/0x80
> [157729.131084]  damon_stop+0x4c/0xd0
> [157729.131264]  state_store+0x190/0x380
> [157729.131445]  ? __x64_sys_ioctl+0x7e/0xf0
> [157729.131625]  kobj_attr_store+0x13/0x30
> [157729.131805]  sysfs_kf_write+0x73/0x90
> [157729.131986]  kernfs_fop_write_iter+0x13a/0x1c0
> [157729.132167]  vfs_write+0x304/0x420
> [157729.132349]  ksys_write+0x6d/0xe0
> [157729.132525]  __x64_sys_write+0x1d/0x30
> [157729.132696]  x64_sys_call+0x16ec/0x2180
> [157729.132865]  do_syscall_64+0x74/0x1d0
> [157729.133030]  ? __x64_sys_ioctl+0x7e/0xf0
> [157729.133195]  ? x64_sys_call+0x1268/0x2180
> [157729.133364]  ? do_syscall_64+0xa3/0x1d0
> [157729.133534]  ? do_syscall_64+0xa3/0x1d0
> [157729.133698]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

'scripts/decode_stacktrace.sh' can show which line of which source file each of
the above lines points to.  So if you could share the output of the script in
your next bug reports, it would be pretty helpful.
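
For example, the script reads the log from standard input and takes the path
to a vmlinux built with debug info.  Below is a minimal sketch of that usage.
'decode_dmesg' is a hypothetical wrapper name, and the paths in the example are
placeholders.

```shell
# Sketch only: 'decode_dmesg' is a hypothetical wrapper.
#   $1: top of the kernel tree that the crashing kernel was built from
#       (it should contain a vmlinux built with debug info)
#   $2: a file holding the saved dmesg output
decode_dmesg() {
    tree="$1"
    log="$2"
    # The script reads the raw trace from stdin and writes the decoded trace,
    # annotated with source file and line, to stdout.
    "$tree/scripts/decode_stacktrace.sh" "$tree/vmlinux" < "$log"
}

# Example (paths are placeholders):
#   dmesg > crash.log
#   decode_dmesg ~/linux crash.log > crash.decoded.log
```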

So, I'll take a further look, but please let me know if the first bad commit I
found is also the first bad commit for your issue.


Thanks,
SJ

[...]
