From: Michal Hocko <mhocko@suse.com>
To: Chen Ridong <chenridong@huaweicloud.com>
Cc: Tejun Heo <tj@kernel.org>,
	akpm@linux-foundation.org, hannes@cmpxchg.org,
	yosryahmed@google.com, roman.gushchin@linux.dev,
	shakeel.butt@linux.dev, muchun.song@linux.dev, davidf@vimeo.com,
	vbabka@suse.cz, handai.szj@taobao.com, rientjes@google.com,
	kamezawa.hiroyu@jp.fujitsu.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	chenridong@huawei.com, wangweiyang2@huawei.com
Subject: Re: [PATCH v1] memcg: fix soft lockup in the OOM process
Date: Wed, 18 Dec 2024 11:22:42 +0100
Message-ID: <Z2KichB-NayQbzmd@tiehlicka>
In-Reply-To: <02f7d744-f123-4523-b170-c2062b5746c8@huaweicloud.com>

On Wed 18-12-24 17:00:38, Chen Ridong wrote:
> 
> 
> On 2024/12/18 15:56, Michal Hocko wrote:
> > On Wed 18-12-24 15:44:34, Chen Ridong wrote:
> >>
> >>
> >> On 2024/12/17 20:54, Michal Hocko wrote:
> >>> On Tue 17-12-24 12:18:28, Chen Ridong wrote:
> >>> [...]
> >>>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> >>>> index 1c485beb0b93..14260381cccc 100644
> >>>> --- a/mm/oom_kill.c
> >>>> +++ b/mm/oom_kill.c
> >>>> @@ -390,6 +390,7 @@ static int dump_task(struct task_struct *p, void *arg)
> >>>>  	if (!is_memcg_oom(oc) && !oom_cpuset_eligible(p, oc))
> >>>>  		return 0;
> >>>>  
> >>>> +	cond_resched();
> >>>>  	task = find_lock_task_mm(p);
> >>>>  	if (!task) {
> >>>>  		/*
> >>>
> >>> This is called from RCU read lock for the global OOM killer path and I
> >>> do not think you can schedule there. I do not remember specifics of task
> >>> traversal for cgroup path but I guess that you might need to silence the
> >>> soft lockup detector instead or come up with a different iteration
> >>> scheme.
> >>
> >> Thank you, Michal.
> >>
> >> I made a mistake. I added cond_resched in the mem_cgroup_scan_tasks
> >> function below the fn, but after reconsideration, it may cause
> >> unnecessary scheduling for other callers of mem_cgroup_scan_tasks.
> >> Therefore, I moved it into the dump_task function. However, I missed the
> >> RCU lock from the global OOM.
> >>
> >> I think we can use touch_nmi_watchdog in place of cond_resched, which
> >> can silence the soft lockup detector. Do you think that is acceptable?
> > 
> > It is certainly a way to go. Not the best one at that though. Maybe we
> > need a different solution for the global and for the memcg OOMs. During
> > the global OOM we rarely care about latency as the whole system is
> > likely to struggle. Memcg OOMs are much more likely. Having that many
> > tasks in a memcg certainly requires a further partitioning so if
> > configured properly the OOM latency shouldn't be visible much. But I am
> > wondering whether the cgroup task iteration could use cond_resched while
> > the global one would touch_nmi_watchdog for every N iterations. I might
> > be missing something but I do not see any locking required outside of
> > css_task_iter_*.
> 
> Do you mean like that:

I've had something like this (untested) in mind
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7b3503d12aaf..37abc94abd2e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1167,10 +1167,14 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 	for_each_mem_cgroup_tree(iter, memcg) {
 		struct css_task_iter it;
 		struct task_struct *task;
+		unsigned int i = 0;
 
 		css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
-		while (!ret && (task = css_task_iter_next(&it)))
+		while (!ret && (task = css_task_iter_next(&it))) {
 			ret = fn(task, arg);
+			if (!(++i % 1000))
+				cond_resched();
+		}
 		css_task_iter_end(&it);
 		if (ret) {
 			mem_cgroup_iter_break(memcg, iter);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1c485beb0b93..3bf2304ed20c 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -430,10 +430,14 @@ static void dump_tasks(struct oom_control *oc)
 		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
 	else {
 		struct task_struct *p;
+		unsigned int i = 0;
 
 		rcu_read_lock();
-		for_each_process(p)
+		for_each_process(p) {
+			if (!(++i % 1000))
+				touch_softlockup_watchdog();
 			dump_task(p, oc);
+		}
 		rcu_read_unlock();
 	}
 }
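
For illustration only, here is a minimal userspace sketch of the "once every
N iterations" pattern that both hunks aim for. The names yield_hint() and
scan_with_periodic_hint() are hypothetical; yield_hint() merely stands in for
cond_resched() or touch_softlockup_watchdog() and counts its invocations.
None of this is kernel code, and it is as untested against a real workload as
the diff above.

```c
/* Hypothetical stand-in for cond_resched() / touch_softlockup_watchdog();
 * it only records how often it was invoked. */
static unsigned int hint_calls;

static void yield_hint(void)
{
	hint_calls++;
}

/*
 * Walk nr_items pretend tasks and invoke yield_hint() once per `period`
 * items. `period` must be non-zero. Returns how many times the hint fired.
 */
static unsigned int scan_with_periodic_hint(unsigned int nr_items,
					    unsigned int period)
{
	unsigned int i = 0;
	unsigned int n;

	hint_calls = 0;
	for (n = 0; n < nr_items; n++) {
		/* per-item work (fn(task, arg) in the patch) would go here */

		/* true only when the counter reaches a multiple of period */
		if (!(++i % period))
			yield_hint();
	}
	return hint_calls;
}
```

With a period of 1000, a scan of 5000 items invokes the hint 5 times and a
scan of 999 items never invokes it, so the cost of the hint stays negligible
for small groups while long scans still yield regularly.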
-- 
Michal Hocko
SUSE Labs
