[Question] The system may get stuck if a cpu cgroup's cpu.cfs_quota_us is very low
From: Zhang Qiao @ 2022-06-27  6:50 UTC
  To: Tejun Heo, mingo-H+wXaHxf7aLQT0dZR+AlfA,
	peterz-wEGCiKHe2LqWVfeAwA7xHQ, Juri Lelli, Vincent Guittot
  Cc: lizefan.x-EC8Uxl6Npydl57MIdRCFDg, hannes-druUgvl0LCNAfugRpC6u6w,
	cgroups-u79uwXL29TY76Z2rM5mHXA, lkml,
	vschneid-H+wXaHxf7aLQT0dZR+AlfA, dietmar.eggemann-5wv7dgnIgG8,
	bristot-H+wXaHxf7aLQT0dZR+AlfA, bsegall-hpIqsD4AKlfQT0dZR+AlfA,
	Steven Rostedt, mgorman-l3A5Bk7waGM

Hi all,

I'm debugging a problem.
The test case performs the following operations:
1) Create a test task cgroup and set cpu.cfs_quota_us=2000, cpu.cfs_period_us=100000.
2) Run 20 test_fork[1] processes in the test task cgroup.
3) Create 100 new containers:
   for i in {1..100}; do docker run -itd  --health-cmd="ls" --health-interval=1s ubuntu:latest  bash; done
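Step 1 above can be sketched in C as follows. This is only a minimal illustration: the helper names cgroup_write_long() and setup_test_cgroup() are hypothetical, and the cgroup directory is passed in by the caller because the original setup does not say where the cpu controller is mounted (on a cgroup v1 system it might be something like /sys/fs/cgroup/cpu/test, which is an assumption).

```c
#include <stdio.h>

/*
 * Hypothetical helper: write a decimal value to <cgroup_dir>/<file>.
 * Returns 0 on success, -1 on any error.
 */
int cgroup_write_long(const char *cgroup_dir, const char *file, long value)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, file);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fprintf(f, "%ld\n", value) > 0;

    /* With buffered I/O a rejected write may only be reported at close time. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}

/*
 * Hypothetical helper: apply the values from step 1
 * (cpu.cfs_period_us=100000, cpu.cfs_quota_us=2000) to an existing
 * cgroup directory supplied by the caller.
 */
int setup_test_cgroup(const char *cgroup_dir)
{
    if (cgroup_write_long(cgroup_dir, "cpu.cfs_period_us", 100000) != 0)
        return -1;
    return cgroup_write_long(cgroup_dir, "cpu.cfs_quota_us", 2000);
}
```

Calling setup_test_cgroup() on a real cgroup directory normally requires root, and the directory must already exist (created with mkdir in the cpu hierarchy).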

These operations are expected to succeed, with all 100 containers created. However, while the containers
are being created, the system gets stuck and container creation fails.

After debugging this, I found that the test_fork processes frequently sleep in
freezer_fork()->mutex_lock()->might_sleep() while holding the cgroup_threadgroup_rw_sem lock, as follows:

copy_process():
	cgroup_can_fork()			---> lock cgroup_threadgroup_rw_sem
	sched_cgroup_fork()
	  ->task_fork_fair()
	      ->update_curr()
		  ->__account_cfs_rq_runtime()
		      ->resched_curr()	---> the quota is used up, so TIF_NEED_RESCHED is set on current
	cgroup_post_fork()
	  ->freezer_fork()
	      ->mutex_lock()
		  ->might_sleep()		---> schedule(); the current task is throttled for a long time
	  ->cgroup_css_set_put_fork()		---> unlock cgroup_threadgroup_rw_sem


Because the task cgroup's cpu.cfs_quota_us is very small and the test_fork load is very heavy, test_fork
may be throttled for a long time. As a result, the cgroup_threadgroup_rw_sem read lock is held for a long
time, and other processes get stuck waiting for the lock:

1) a task forking a child waits at copy_process()->cgroup_can_fork();

2) an exiting task waits at exit_signals();

3) a task writing to the cgroup.procs file waits at cgroup_file_write()->__cgroup1_procs_write();
...

Eventually the whole system can get stuck.
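To give a sense of scale for "throttled for a long time", here is a rough calculation with the values from the test case. It is a simplified single-CPU view that assumes a throttled group stays throttled until the next period starts; the function names are hypothetical and are not kernel APIs.

```c
/*
 * Percentage of one CPU that the group may use per period.
 * For quota_us=2000 and period_us=100000 this is 2%.
 */
double cfs_cpu_share_percent(long quota_us, long period_us)
{
    return 100.0 * (double)quota_us / (double)period_us;
}

/*
 * Simplified estimate of how long the group can stay throttled within
 * one period once its quota has been consumed. Assumes the quota is
 * used up right at the start of the period (single-CPU view).
 */
long cfs_max_throttled_us(long quota_us, long period_us)
{
    return quota_us >= period_us ? 0 : period_us - quota_us;
}
```

With cpu.cfs_quota_us=2000 and cpu.cfs_period_us=100000, the whole group gets about 2% of one CPU, shared by 20 fork-heavy processes, and under this simplified model a task that is throttled while holding the lock may not run again for up to 98000 us in a period.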

Does anyone know how to solve this, other than by changing cpu.cfs_quota_us?


[1] test_fork.c

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid;
    int count = 20;

    while (1) {
        /* Fork a burst of children; each child exits immediately. */
        for (int i = 0; i < count; i++) {
            if ((pid = fork()) < 0) {
                perror("fork");
                return 1;
            } else if (pid == 0) {
                exit(0);
            }
        }

        /* Reap every child before starting the next burst. */
        for (int i = 0; i < count; i++) {
            wait(NULL);
        }
        sleep(1);
    }
    return 0;
}

Thanks a lot.
-Qiao