Re: [Question] report a race condition between CPU hotplug state machine and hrtimer 'sched_cfs_period_timer' for cfs bandwidth throttling


From: Thomas Gleixner <tglx@linutronix.de>
To: Xiongfeng Wang <wangxiongfeng2@huawei.com>,
	vschneid@redhat.com, Phil Auld <pauld@redhat.com>,
	vdonnefort@google.com
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	wangxiongfeng2@huawei.com, Wei Li <liwei391@huawei.com>,
	"liaoyu (E)" <liaoyu15@huawei.com>,
	zhangqiao22@huawei.com, Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [Question] report a race condition between CPU hotplug state machine and hrtimer 'sched_cfs_period_timer' for cfs bandwidth throttling
Date: Fri, 09 Jun 2023 16:55:37 +0200
Message-ID: <87mt18it1y.ffs@tglx>
In-Reply-To: <8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com>

On Fri, Jun 09 2023 at 19:24, Xiongfeng Wang wrote:

Cc+ scheduler people, leave context intact

> Hello,
>  While running some low-power tests, I saw the following hung task report:
>
>   Call trace:
>    __switch_to+0xd4/0x160
>    __schedule+0x38c/0x8c4
>    __cond_resched+0x24/0x50
>    unmap_kernel_range_noflush+0x210/0x240
>    kretprobe_trampoline+0x0/0xc8
>    __vunmap+0x70/0x31c
>    __vfree+0x34/0x8c
>    vfree+0x40/0x58
>    free_vm_stack_cache+0x44/0x74
>    cpuhp_invoke_callback+0xc4/0x71c
>    _cpu_down+0x108/0x284
>    kretprobe_trampoline+0x0/0xc8
>    suspend_enter+0xd8/0x8ec
>    suspend_devices_and_enter+0x1f0/0x360
>    pm_suspend.part.1+0x428/0x53c
>    pm_suspend+0x3c/0xa0
>    devdrv_suspend_proc+0x148/0x248 [drv_devmng]
>    devdrv_manager_set_power_state+0x140/0x680 [drv_devmng]
>    devdrv_manager_ioctl+0xcc/0x210 [drv_devmng]
>    drv_ascend_intf_ioctl+0x84/0x248 [drv_davinci_intf]
>    __arm64_sys_ioctl+0xb4/0xf0
>    el0_svc_common.constprop.0+0x140/0x374
>    do_el0_svc+0x80/0xa0
>    el0_svc+0x1c/0x28
>    el0_sync_handler+0x90/0xf0
>    el0_sync+0x168/0x180
>
> After some analysis, I found that it is caused by the following race condition:
>
> 1. A task running on CPU1 is throttled by CFS bandwidth control. CPU1 starts
> the cfs_bandwidth hrtimer 'period_timer' and enqueues it on CPU1's rbtree.
> 2. The task is then migrated to CPU2 and starts to offline CPU1. CPU1 runs the
> CPUHP AP steps, and then 'period_timer' expires and is re-enqueued on CPU1.
> 3. CPU1 reaches take_cpu_down() and disables interrupts. After CPU1 finishes
> the CPUHP AP steps, CPU2 runs the remaining CPUHP steps.
> 4. When CPU2 reaches free_vm_stack_cache(), the task is scheduled out in
> __vunmap() because it has run out of CPU quota. start_cfs_bandwidth() does not
> restart the hrtimer because 'cfs_b->period_active' is already set.
> 5. The task waits for 'period_timer' to expire and wake it up, but CPU1 has
> interrupts disabled, so the hrtimer cannot expire until it is migrated to CPU2
> in hrtimers_dead_cpu(). The task, however, is blocked and cannot reach the
> hrtimers_dead_cpu() step, so it hangs.
>
> CPU1                                              CPU2
> Task sets cfs_quota
> start hrtimer cfs_bandwidth 'period_timer'
>                                                   start to offline CPU1
> CPU1 starts CPUHP AP steps
> ...
> 'period_timer' expires and is re-enqueued on CPU1
> ...
> disable irq in take_cpu_down()
> ...
>                                                   CPU2 starts the remaining CPUHP steps
>                                                   ...
>                                                   sched out in free_vm_stack_cache()
>                                                   wait for 'period_timer' to expire
>
>
> I would really appreciate any suggestions on how to fix this problem!
>
> Thanks,
> Xiongfeng

Thread overview: 19+ messages
2023-06-09 11:24 [Question] report a race condition between CPU hotplug state machine and hrtimer 'sched_cfs_period_timer' for cfs bandwidth throttling Xiongfeng Wang
2023-06-09 14:55 ` Thomas Gleixner [this message]
2023-06-12 12:49   ` Xiongfeng Wang
2023-06-26  8:23     ` Xiongfeng Wang
2023-06-27 16:46       ` Vincent Guittot
2023-06-28 12:03         ` Thomas Gleixner
2023-06-28 12:35           ` Vincent Guittot
2023-06-28 22:01             ` Thomas Gleixner
2023-06-29  1:41               ` Xiongfeng Wang
2023-06-29  8:30               ` Vincent Guittot
2023-08-22  8:58                 ` Xiongfeng Wang
2023-08-23 10:14                 ` Thomas Gleixner
2023-08-24  7:25                   ` Yu Liao
2023-08-29  7:18                   ` Vincent Guittot
2023-06-28 13:30         ` Vincent Guittot
2023-06-28 21:09           ` Thomas Gleixner
2023-06-29  1:26         ` Xiongfeng Wang
2023-06-29  8:33           ` Vincent Guittot
2023-08-30 10:29 ` [tip: smp/urgent] cpu/hotplug: Prevent self deadlock on CPU hot-unplug tip-bot2 for Thomas Gleixner
