From: Schspa Shi <schspa@gmail.com>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, rostedt@goodmis.org,
bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
vschneid@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v6 1/2] sched/rt: fix bad task migration for rt tasks
Date: Tue, 12 Jul 2022 23:35:00 +0800 [thread overview]
Message-ID: <m2h73me3nk.fsf@gmail.com> (raw)
In-Reply-To: <c1db7f31-82e1-eac4-bd49-212859727cb2@arm.com>
Dietmar Eggemann <dietmar.eggemann@arm.com> writes:
> On 12/07/2022 17:05, Schspa Shi wrote:
>> Commit 95158a89dd50 ("sched,rt: Use the full cpumask for balancing")
>> allow find_lock_lowest_rq to pick a task with migration disabled.
>> This commit is intended to push the current running task on this CPU
>> away.
>>
>> There is a race scenarios, which allows a migration disabled task to
>> be migrated to another CPU.
>>
>> When there is a RT task with higher priority, rt sched class was
>> intended to migrate higher priority task to lowest rq via push_rt_tasks,
>> this BUG will happen here.
>>
>> With the system running on PREEMPT_RT, rt_spin_lock will disable
>> migration, this will make the problem easier to reproduce.
>>
>> I have seen this crash on PREEMPT_RT, from the logs, there is a race
>> when trying to migrate higher priority tasks to the lowest rq.
>>
>> Please refer to the following scenarios.
>>
>> CPU0 CPU1
>> ------------------------------------------------------------------
>> push_rt_task
>> check is_migration_disabled(next_task)
>> task not running and
>> migration_disabled == 0
>> find_lock_lowest_rq(next_task, rq);
>> _double_lock_balance(this_rq, busiest);
>> raw_spin_rq_unlock(this_rq);
>> double_rq_lock(this_rq, busiest);
>> <<wait for busiest rq>>
>> <wakeup>
>> task become running
>> migrate_disable();
>> <context out>
>> deactivate_task(rq, next_task, 0);
>> set_task_cpu(next_task, lowest_rq->cpu);
>> WARN_ON_ONCE(is_migration_disabled(p));
>> ---------OOPS-------------
>>
>> Crash logs as fellowing:
>> [123671.996430] WARNING: CPU: 2 PID: 13470 at kernel/sched/core.c:2485
>
> What code-base is this?
This is the logs from 5.10.59-rt
Link: https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git
v5.10.59-rt52 (9007b684f615750b0ee4ec57b5e547a4bf4a223e).
>
> IMHO, currently this `WARN_ON_ONCE(is_migration_disabled(p))` in
> set_task_cpu() is at > line 3000.
>
But the master code have this BUG too.
>> set_task_cpu+0x8c/0x108
>> [123671.996800] pstate: 20400009 (nzCv daif +PAN -UAO -TCO BTYPE=--)
>> [123671.996811] pc : set_task_cpu+0x8c/0x108
>> [123671.996820] lr : set_task_cpu+0x7c/0x108
>> [123671.996828] sp : ffff80001268bd30
>> [123671.996832] pmr_save: 00000060
>> [123671.996835] x29: ffff80001268bd30 x28: ffff0001a3d68e80
>> [123671.996844] x27: ffff80001225f4a8 x26: ffff800010ab62cb
>> [123671.996854] x25: ffff80026d95e000 x24: 0000000000000005
>> [123671.996864] x23: ffff00019746c1b0 x22: 0000000000000000
>> [123671.996873] x21: ffff00027ee33a80 x20: 0000000000000000
>> [123671.996882] x19: ffff00019746ba00 x18: 0000000000000000
>> [123671.996890] x17: 0000000000000000 x16: 0000000000000000
>> [123671.996899] x15: 000000000000000a x14: 000000000000349e
>> [123671.996908] x13: ffff800012f4503d x12: 0000000000000001
>> [123671.996916] x11: 0000000000000000 x10: 0000000000000000
>> [123671.996925] x9 : 00000000000c0000 x8 : ffff00027ee58700
>> [123671.996933] x7 : ffff00027ee8da80 x6 : ffff00027ee8e580
>> [123671.996942] x5 : ffff00027ee8dcc0 x4 : 0000000000000005
>> [123671.996951] x3 : ffff00027ee8e338 x2 : 0000000000000000
>> [123671.996959] x1 : 00000000000000ff x0 : 0000000000000002
>> [123671.996969] Call trace:
>> [123671.996975] set_task_cpu+0x8c/0x108
>> [123671.996984] push_rt_task.part.0+0x144/0x184
>> [123671.996995] push_rt_tasks+0x28/0x3c
>> [123671.997002] task_woken_rt+0x58/0x68
>> [123671.997009] ttwu_do_wakeup+0x5c/0xd0
>> [123671.997019] ttwu_do_activate+0xc0/0xd4
>> [123671.997028] try_to_wake_up+0x244/0x288
>> [123671.997036] wake_up_process+0x18/0x24
>> [123671.997045] __irq_wake_thread+0x64/0x80
>> [123671.997056] __handle_irq_event_percpu+0x110/0x124
>> [123671.997064] handle_irq_event_percpu+0x50/0xac
>> [123671.997072] handle_irq_event+0x84/0xfc
>>
>
> [...]
>
>> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
>> index 8c9ed96648409..7bd3e6ecbe45e 100644
>> --- a/kernel/sched/rt.c
>> +++ b/kernel/sched/rt.c
>> @@ -1998,11 +1998,15 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
>> * the mean time, task could have
>> * migrated already or had its affinity changed.
>> * Also make sure that it wasn't scheduled on its rq.
>> + * It is possible the task was scheduled, set
>> + * "migrate_disabled" and then got preempted, so we must
>> + * check the task migration disable flag here too.
>> */
>> if (unlikely(task_rq(task) != rq ||
>> !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
>> task_running(rq, task) ||
>> !rt_task(task) ||
>> + is_migration_disabled(task) ||
>
> I wonder why this isn't covered by `task_rq(task) != rq` in this condition?
>
It's because thie task is not migrated, it just get scheduled and
calling migrate_disable(); and then got preempted by it's CPU core
before enable migrate_enable(). the task_rq not changed in this
scenarios.
> [...]
--
BRs
Schspa Shi
next prev parent reply other threads:[~2022-07-12 15:45 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-12 15:05 [PATCH v6 1/2] sched/rt: fix bad task migration for rt tasks Schspa Shi
2022-07-12 15:05 ` [PATCH v6 2/2] sched/rt: Trying to push current task when target disable migrating Schspa Shi
2022-07-12 15:17 ` Steven Rostedt
2022-07-13 10:02 ` Dietmar Eggemann
2022-07-13 12:24 ` Schspa Shi
2022-07-12 15:16 ` [PATCH v6 1/2] sched/rt: fix bad task migration for rt tasks Steven Rostedt
2022-07-12 15:26 ` Dietmar Eggemann
2022-07-12 15:35 ` Schspa Shi [this message]
2022-07-13 9:43 ` Dietmar Eggemann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=m2h73me3nk.fsf@gmail.com \
--to=schspa@gmail.com \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.