* [PATCH v2] sched: fix migration to invalid cpu in __set_cpus_allowed_ptr
@ 2019-09-16 6:53 KeMeng Shi
2019-09-17 5:11 ` Pavan Kondeti
2019-09-27 8:10 ` [tip: sched/urgent] sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr() tip-bot2 for KeMeng Shi
0 siblings, 2 replies; 3+ messages in thread
From: KeMeng Shi @ 2019-09-16 6:53 UTC (permalink / raw)
To: mingo, peterz, valentin.schneider; +Cc: linux-kernel
An oops occurs when running qemu on arm64:
Unable to handle kernel paging request at virtual address ffff000008effe40
Internal error: Oops: 96000007 [#1] SMP
Process migration/0 (pid: 12, stack limit = 0x00000000084e3736)
pstate: 20000085 (nzCv daIf -PAN -UAO)
pc : __ll_sc___cmpxchg_case_acq_4+0x4/0x20
lr : move_queued_task.isra.21+0x124/0x298
...
Call trace:
__ll_sc___cmpxchg_case_acq_4+0x4/0x20
__migrate_task+0xc8/0xe0
migration_cpu_stop+0x170/0x180
cpu_stopper_thread+0xec/0x178
smpboot_thread_fn+0x1ac/0x1e8
kthread+0x134/0x138
ret_from_fork+0x10/0x18
__set_cpus_allowed_ptr() chooses an active dest_cpu from the affinity mask
to migrate the process to if the process is not currently running on any of
the CPUs specified in the affinity mask. It will choose an invalid dest_cpu
(dest_cpu >= nr_cpu_ids, 1024 in my virtual machine) if the CPUs in the
affinity mask are deactivated by cpu_down() after the cpumask_intersects()
check. The subsequent cpumask_test_cpu() on dest_cpu then reads past the end
of the mask and may pass if the corresponding bit happens to be set. As a
consequence, the kernel accesses an invalid rq address associated with the
invalid CPU in migration_cpu_stop->__migrate_task->move_queued_task and the
oops occurs.
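The check-then-pick window described above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not kernel
code: mask_t, NR_CPU_IDS, mask_intersects(), mask_any_and(), pick_dest_old()
and pick_dest_new() are hypothetical names, and the two "valid" arguments of
pick_dest_old() stand in for the active mask changing between the
intersection check and the later CPU selection.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: 4 possible CPUs, one bit per CPU in a 32-bit word. */
#define NR_CPU_IDS 4u
typedef uint32_t mask_t;

/* Modelled on cpumask_intersects(): true if both masks share a CPU. */
static bool mask_intersects(mask_t a, mask_t b)
{
	return (a & b) != 0;
}

/*
 * Modelled on cpumask_any_and(): first CPU set in both masks,
 * or NR_CPU_IDS when the intersection is empty.
 */
static unsigned int mask_any_and(mask_t a, mask_t b)
{
	mask_t both = a & b;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (both & (1u << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/*
 * Old flow: validate against one snapshot of the valid mask, then pick
 * from a later snapshot without re-checking. Returns -1 for "-EINVAL",
 * otherwise the chosen CPU, which may be out of range.
 */
static int pick_dest_old(mask_t new_mask, mask_t valid_at_check,
			 mask_t valid_at_pick)
{
	if (!mask_intersects(new_mask, valid_at_check))
		return -1;
	return (int)mask_any_and(valid_at_pick, new_mask);
}

/*
 * Patched flow: pick once and validate the picked value itself, so an
 * out-of-range CPU can never be returned to the caller.
 */
static int pick_dest_new(mask_t new_mask, mask_t valid_now)
{
	unsigned int dest_cpu = mask_any_and(valid_now, new_mask);

	if (dest_cpu >= NR_CPU_IDS)
		return -1;
	return (int)dest_cpu;
}
```

In this model, binding to cpu1 only (new_mask = 0x2) while cpu1 goes offline
between the two snapshots (valid mask 0x3, then 0x1) makes pick_dest_old()
return NR_CPU_IDS, whereas pick_dest_new() reports the error instead.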
The following steps may trigger the oops:
1) A process repeatedly binds itself to cpu0 and cpu1 in turn by calling
sched_setaffinity().
2) A shell script repeatedly runs "echo 0 > /sys/devices/system/cpu/cpu1/online"
and "echo 1 > /sys/devices/system/cpu/cpu1/online" in turn.
3) The oops appears if the bit for the invalid CPU happens to be set in the
memory that follows the tested cpumask.
Signed-off-by: KeMeng Shi <shikemeng@huawei.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
---
Changes in v2:
- fix formatting problems in the commit log
kernel/sched/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3c7b90bcbe4e..087f4ac30b60 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1656,7 +1656,8 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
if (cpumask_equal(p->cpus_ptr, new_mask))
goto out;
- if (!cpumask_intersects(new_mask, cpu_valid_mask)) {
+ dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
+ if (dest_cpu >= nr_cpu_ids) {
ret = -EINVAL;
goto out;
}
@@ -1677,7 +1678,6 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
if (cpumask_test_cpu(task_cpu(p), new_mask))
goto out;
- dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
if (task_running(rq, p) || p->state == TASK_WAKING) {
struct migration_arg arg = { p, dest_cpu };
/* Need help from migration thread: drop lock and wait. */
--
2.19.1
* Re: [PATCH v2] sched: fix migration to invalid cpu in __set_cpus_allowed_ptr
2019-09-16 6:53 [PATCH v2] sched: fix migration to invalid cpu in __set_cpus_allowed_ptr KeMeng Shi
@ 2019-09-17 5:11 ` Pavan Kondeti
2019-09-27 8:10 ` [tip: sched/urgent] sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr() tip-bot2 for KeMeng Shi
1 sibling, 0 replies; 3+ messages in thread
From: Pavan Kondeti @ 2019-09-17 5:11 UTC (permalink / raw)
To: KeMeng Shi; +Cc: mingo, peterz, valentin.schneider, linux-kernel
On Mon, Sep 16, 2019 at 06:53:28AM +0000, KeMeng Shi wrote:
> An oops occurs when running qemu on arm64:
> Unable to handle kernel paging request at virtual address ffff000008effe40
> Internal error: Oops: 96000007 [#1] SMP
> Process migration/0 (pid: 12, stack limit = 0x00000000084e3736)
> pstate: 20000085 (nzCv daIf -PAN -UAO)
> pc : __ll_sc___cmpxchg_case_acq_4+0x4/0x20
> lr : move_queued_task.isra.21+0x124/0x298
> ...
> Call trace:
> __ll_sc___cmpxchg_case_acq_4+0x4/0x20
> __migrate_task+0xc8/0xe0
> migration_cpu_stop+0x170/0x180
> cpu_stopper_thread+0xec/0x178
> smpboot_thread_fn+0x1ac/0x1e8
> kthread+0x134/0x138
> ret_from_fork+0x10/0x18
>
> __set_cpus_allowed_ptr() chooses an active dest_cpu from the affinity mask
> to migrate the process to if the process is not currently running on any of
> the CPUs specified in the affinity mask. It will choose an invalid dest_cpu
> (dest_cpu >= nr_cpu_ids, 1024 in my virtual machine) if the CPUs in the
> affinity mask are deactivated by cpu_down() after the cpumask_intersects()
> check. The subsequent cpumask_test_cpu() on dest_cpu then reads past the end
> of the mask and may pass if the corresponding bit happens to be set. As a
> consequence, the kernel accesses an invalid rq address associated with the
> invalid CPU in migration_cpu_stop->__migrate_task->move_queued_task and the
> oops occurs.
>
> The following steps may trigger the oops:
> 1) A process repeatedly binds itself to cpu0 and cpu1 in turn by calling
> sched_setaffinity().
> 2) A shell script repeatedly runs "echo 0 > /sys/devices/system/cpu/cpu1/online"
> and "echo 1 > /sys/devices/system/cpu/cpu1/online" in turn.
> 3) The oops appears if the bit for the invalid CPU happens to be set in the
> memory that follows the tested cpumask.
>
> Signed-off-by: KeMeng Shi <shikemeng@huawei.com>
> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
> Changes in v2:
> - fix formatting problems in the commit log
>
> kernel/sched/core.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3c7b90bcbe4e..087f4ac30b60 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1656,7 +1656,8 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
> if (cpumask_equal(p->cpus_ptr, new_mask))
> goto out;
>
> - if (!cpumask_intersects(new_mask, cpu_valid_mask)) {
> + dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
> + if (dest_cpu >= nr_cpu_ids) {
> ret = -EINVAL;
> goto out;
> }
> @@ -1677,7 +1678,6 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
> if (cpumask_test_cpu(task_cpu(p), new_mask))
> goto out;
>
> - dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
> if (task_running(rq, p) || p->state == TASK_WAKING) {
> struct migration_arg arg = { p, dest_cpu };
> /* Need help from migration thread: drop lock and wait. */
> --
> 2.19.1
>
>
The cpu_active_mask might have changed in between. Your fix looks good to me.
Thanks,
Pavan
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
* [tip: sched/urgent] sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr()
2019-09-16 6:53 [PATCH v2] sched: fix migration to invalid cpu in __set_cpus_allowed_ptr KeMeng Shi
2019-09-17 5:11 ` Pavan Kondeti
@ 2019-09-27 8:10 ` tip-bot2 for KeMeng Shi
1 sibling, 0 replies; 3+ messages in thread
From: tip-bot2 for KeMeng Shi @ 2019-09-27 8:10 UTC (permalink / raw)
To: linux-tip-commits
Cc: KeMeng Shi, Peter Zijlstra (Intel), Valentin Schneider,
Linus Torvalds, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
linux-kernel
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 714e501e16cd473538b609b3e351b2cc9f7f09ed
Gitweb: https://git.kernel.org/tip/714e501e16cd473538b609b3e351b2cc9f7f09ed
Author: KeMeng Shi <shikemeng@huawei.com>
AuthorDate: Mon, 16 Sep 2019 06:53:28
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 25 Sep 2019 17:42:31 +02:00
sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr()
An oops can be triggered in the scheduler when running qemu on arm64:
Unable to handle kernel paging request at virtual address ffff000008effe40
Internal error: Oops: 96000007 [#1] SMP
Process migration/0 (pid: 12, stack limit = 0x00000000084e3736)
pstate: 20000085 (nzCv daIf -PAN -UAO)
pc : __ll_sc___cmpxchg_case_acq_4+0x4/0x20
lr : move_queued_task.isra.21+0x124/0x298
...
Call trace:
__ll_sc___cmpxchg_case_acq_4+0x4/0x20
__migrate_task+0xc8/0xe0
migration_cpu_stop+0x170/0x180
cpu_stopper_thread+0xec/0x178
smpboot_thread_fn+0x1ac/0x1e8
kthread+0x134/0x138
ret_from_fork+0x10/0x18
__set_cpus_allowed_ptr() chooses an active dest_cpu from the affinity mask
to migrate the process to if the process is not currently running on any of
the CPUs specified in the affinity mask. It will choose an invalid dest_cpu
(dest_cpu >= nr_cpu_ids, 1024 in my virtual machine) if the CPUs in the
affinity mask are deactivated by cpu_down() after the cpumask_intersects()
check. The subsequent cpumask_test_cpu() on dest_cpu then reads past the end
of the mask and may pass if the corresponding bit happens to be set. As a
consequence, the kernel accesses an invalid rq address associated with the
invalid CPU in migration_cpu_stop->__migrate_task->move_queued_task and the
oops occurs.
To reproduce the crash:
1) A process repeatedly binds itself to cpu0 and cpu1 in turn by calling
sched_setaffinity().
2) A shell script repeatedly does "echo 0 > /sys/devices/system/cpu/cpu1/online"
and "echo 1 > /sys/devices/system/cpu/cpu1/online" in turn.
3) The oops appears if the bit for the invalid CPU happens to be set in the
memory that follows the tested cpumask.
Signed-off-by: KeMeng Shi <shikemeng@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1568616808-16808-1-git-send-email-shikemeng@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d9a394..83ea23e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1656,7 +1656,8 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
if (cpumask_equal(p->cpus_ptr, new_mask))
goto out;
- if (!cpumask_intersects(new_mask, cpu_valid_mask)) {
+ dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
+ if (dest_cpu >= nr_cpu_ids) {
ret = -EINVAL;
goto out;
}
@@ -1677,7 +1678,6 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
if (cpumask_test_cpu(task_cpu(p), new_mask))
goto out;
- dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
if (task_running(rq, p) || p->state == TASK_WAKING) {
struct migration_arg arg = { p, dest_cpu };
/* Need help from migration thread: drop lock and wait. */