From: Oleg Nesterov <oleg@redhat.com>
To: Roman Gushchin <guro@fb.com>
Cc: Roman Gushchin <guroan@gmail.com>, Tejun Heo <tj@kernel.org>,
Kernel Team <Kernel-team@fb.com>,
"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer
Date: Wed, 24 Apr 2019 17:46:19 +0200 [thread overview]
Message-ID: <20190424154619.GG16167@redhat.com> (raw)
In-Reply-To: <20190422221116.GA10341@tower.DHCP.thefacebook.com>
On 04/22, Roman Gushchin wrote:
>
> > > Hm, it might work too, but I'm not sure I like it more. IMO, the best option
> > > is to have a single cgroup_leave_frozen(true) in signal.c, it's just simpler.
> > > If a user changed the desired state of cgroup twice, there is no need to avoid
> > > state transitions. Or maybe I don't see it yet.
> >
> > Then why do we need cgroup_leave_frozen(false) in wait_for_vfork_done() ? How
> > does it differ from get_signal() ?
>
> We need it because sleeping in vfork is a special state which we want to
> account as frozen. And if the parent process wakes up while the cgroup is frozen
> (because of the child death, for example), we want to push it into the "proper"
> frozen state without changing the state of the cgroup.
Again, I do not see how vfork() differs from get_signal() in this respect.
Let me provide another example. A TASK_STOPPED task reacts to SIGCONT and
returns to get_signal(), current->frozen is true.
If this races with CGRP_FREEZE, the task should not return to user-space,
just like vfork(). I see no difference.
They differ in that wait_for_vfork_done() should guarentee TIF_SIGPENDING
in this case, but this is another story...
>
> > If nothing else. Suppose that wait_for_vfork_done() calls leave(false) and this
> > races with freezer, CGRP_FREEZE is already set but JOBCTL_TRAP_FREEZE is not.
> >
> > This sets TIF_SIGPENDING to ensure the task won't return to user mode, thus it
> > calls get_signal().
> >
> > get_signal() doesn't see JOBCTL_TRAP_FREEZE, it notices ->frozen == T and does
> > cgroup_leave_frozen(true) which clears ->frozen.
> >
> > Then the task calls dequeue_signal(), clears TIF_SIGPENDING and returns to user
> > mode?
>
> Got it, a good catch! So if the freezer races with vfork() completion, we might
> have a spurious frozen->unfrozen->frozen transition of the cgroup state.
>
> Switching to cgroup_leave_frozen(false) seems to solve it, but I'm slightly
> concerned that we're basically putting the task in a busy loop between
> the setting CGRP_FREEZE and setting TRAP_FREEZE.
Yes, yes. Didn't I say I dislike the new ->frozen check in recalc() ? ;)
OK, how about the ABSOLUTELY UNTESTED patch below? For the start.
Oleg.
diff --git a/kernel/cgroup/freezer.c b/kernel/cgroup/freezer.c
index 3bfbb3c..b5ccc87 100644
--- a/kernel/cgroup/freezer.c
+++ b/kernel/cgroup/freezer.c
@@ -132,26 +132,21 @@ void cgroup_leave_frozen(bool always_leave)
{
struct cgroup *cgrp;
+ WARN_ON_ONCE(!current->frozen);
+
spin_lock_irq(&css_set_lock);
cgrp = task_dfl_cgroup(current);
if (always_leave || !test_bit(CGRP_FREEZE, &cgrp->flags)) {
cgroup_dec_frozen_cnt(cgrp);
cgroup_update_frozen(cgrp);
- WARN_ON_ONCE(!current->frozen);
current->frozen = false;
+ } else if (!(current->jobctl & JOBCTL_TRAP_FREEZE)) {
+ spin_lock(¤t->sighand->siglock);
+ current->jobctl |= JOBCTL_TRAP_FREEZE;
+ set_thread_flag(TIF_SIGPENDING);
+ spin_unlock(¤t->sighand->siglock);
}
spin_unlock_irq(&css_set_lock);
-
- if (unlikely(current->frozen)) {
- /*
- * If the task remained in the frozen state,
- * make sure it won't reach userspace without
- * entering the signal handling loop.
- */
- spin_lock_irq(¤t->sighand->siglock);
- recalc_sigpending();
- spin_unlock_irq(¤t->sighand->siglock);
- }
}
/*
diff --git a/kernel/signal.c b/kernel/signal.c
index e46d527..155b273 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2515,7 +2515,7 @@ bool get_signal(struct ksignal *ksig)
*/
if (unlikely(cgroup_task_frozen(current))) {
spin_unlock_irq(&sighand->siglock);
- cgroup_leave_frozen(true);
+ cgroup_leave_frozen(false);
goto relock;
}
next prev parent reply other threads:[~2019-04-24 15:46 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-04-05 17:46 [PATCH v10 0/9] freezer for cgroup v2 Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 1/9] cgroup: rename freezer.c into legacy_freezer.c Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 2/9] cgroup: implement __cgroup_task_count() helper Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 3/9] cgroup: protect cgroup->nr_(dying_)descendants by css_set_lock Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 4/9] cgroup: cgroup v2 freezer Roman Gushchin
2019-04-19 15:19 ` Oleg Nesterov
2019-04-19 16:08 ` Oleg Nesterov
2019-04-19 16:36 ` Roman Gushchin
2019-04-19 16:11 ` Roman Gushchin
2019-04-19 16:26 ` Oleg Nesterov
2019-04-19 16:56 ` Roman Gushchin
2019-04-20 10:58 ` Oleg Nesterov
2019-04-22 22:11 ` Roman Gushchin
2019-04-24 15:46 ` Oleg Nesterov [this message]
2019-04-24 22:06 ` Roman Gushchin
2019-04-26 17:40 ` Oleg Nesterov
2019-04-24 16:02 ` Oleg Nesterov
2019-04-24 22:10 ` Roman Gushchin
2019-04-26 17:30 ` Oleg Nesterov
2019-04-05 17:47 ` [PATCH v10 5/9] kselftests: cgroup: don't fail on cg_kill_all() error in cg_destroy() Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 6/9] kselftests: cgroup: add freezer controller self-tests Roman Gushchin
2019-07-16 14:48 ` Naresh Kamboju
2019-07-17 0:49 ` Roman Gushchin
2019-07-17 8:56 ` Naresh Kamboju
2019-07-18 18:19 ` Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 7/9] cgroup: make TRACE_CGROUP_PATH irq-safe Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 8/9] cgroup: add tracing points for cgroup v2 freezer Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 9/9] cgroup: document cgroup v2 freezer interface Roman Gushchin
2019-04-19 18:29 ` [PATCH v10 0/9] freezer for cgroup v2 Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190424154619.GG16167@redhat.com \
--to=oleg@redhat.com \
--cc=Kernel-team@fb.com \
--cc=cgroups@vger.kernel.org \
--cc=guro@fb.com \
--cc=guroan@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).