From: Oleg Nesterov <oleg@redhat.com>
To: Roman Gushchin <guro@fb.com>
Cc: Roman Gushchin <guroan@gmail.com>, Tejun Heo <tj@kernel.org>,
Kernel Team <Kernel-team@fb.com>,
"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v10 4/9] cgroup: cgroup v2 freezer
Date: Wed, 24 Apr 2019 17:46:19 +0200 [thread overview]
Message-ID: <20190424154619.GG16167@redhat.com> (raw)
In-Reply-To: <20190422221116.GA10341@tower.DHCP.thefacebook.com>
On 04/22, Roman Gushchin wrote:
>
> > > Hm, it might work too, but I'm not sure I like it more. IMO, the best option
> > > is to have a single cgroup_leave_frozen(true) in signal.c, it's just simpler.
> > > If a user changed the desired state of cgroup twice, there is no need to avoid
> > > state transitions. Or maybe I don't see it yet.
> >
> > Then why do we need cgroup_leave_frozen(false) in wait_for_vfork_done() ? How
> > does it differ from get_signal() ?
>
> We need it because sleeping in vfork is a special state which we want to
> account as frozen. And if the parent process wakes up while the cgroup is frozen
> (because of the child death, for example), we want to push it into the "proper"
> frozen state without changing the state of the cgroup.
Again, I do not see how vfork() differs from get_signal() in this respect.
Let me provide another example. A TASK_STOPPED task reacts to SIGCONT and
returns to get_signal(), current->frozen is true.
If this races with CGRP_FREEZE, the task should not return to user-space,
just like vfork(). I see no difference.
They differ in that wait_for_vfork_done() should guarentee TIF_SIGPENDING
in this case, but this is another story...
>
> > If nothing else. Suppose that wait_for_vfork_done() calls leave(false) and this
> > races with freezer, CGRP_FREEZE is already set but JOBCTL_TRAP_FREEZE is not.
> >
> > This sets TIF_SIGPENDING to ensure the task won't return to user mode, thus it
> > calls get_signal().
> >
> > get_signal() doesn't see JOBCTL_TRAP_FREEZE, it notices ->frozen == T and does
> > cgroup_leave_frozen(true) which clears ->frozen.
> >
> > Then the task calls dequeue_signal(), clears TIF_SIGPENDING and returns to user
> > mode?
>
> Got it, a good catch! So if the freezer races with vfork() completion, we might
> have a spurious frozen->unfrozen->frozen transition of the cgroup state.
>
> Switching to cgroup_leave_frozen(false) seems to solve it, but I'm slightly
> concerned that we're basically putting the task in a busy loop between
> the setting CGRP_FREEZE and setting TRAP_FREEZE.
Yes, yes. Didn't I say I dislike the new ->frozen check in recalc() ? ;)
OK, how about the ABSOLUTELY UNTESTED patch below? For the start.
Oleg.
diff --git a/kernel/cgroup/freezer.c b/kernel/cgroup/freezer.c
index 3bfbb3c..b5ccc87 100644
--- a/kernel/cgroup/freezer.c
+++ b/kernel/cgroup/freezer.c
@@ -132,26 +132,21 @@ void cgroup_leave_frozen(bool always_leave)
{
struct cgroup *cgrp;
+ WARN_ON_ONCE(!current->frozen);
+
spin_lock_irq(&css_set_lock);
cgrp = task_dfl_cgroup(current);
if (always_leave || !test_bit(CGRP_FREEZE, &cgrp->flags)) {
cgroup_dec_frozen_cnt(cgrp);
cgroup_update_frozen(cgrp);
- WARN_ON_ONCE(!current->frozen);
current->frozen = false;
+ } else if (!(current->jobctl & JOBCTL_TRAP_FREEZE)) {
+ spin_lock(¤t->sighand->siglock);
+ current->jobctl |= JOBCTL_TRAP_FREEZE;
+ set_thread_flag(TIF_SIGPENDING);
+ spin_unlock(¤t->sighand->siglock);
}
spin_unlock_irq(&css_set_lock);
-
- if (unlikely(current->frozen)) {
- /*
- * If the task remained in the frozen state,
- * make sure it won't reach userspace without
- * entering the signal handling loop.
- */
- spin_lock_irq(¤t->sighand->siglock);
- recalc_sigpending();
- spin_unlock_irq(¤t->sighand->siglock);
- }
}
/*
diff --git a/kernel/signal.c b/kernel/signal.c
index e46d527..155b273 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2515,7 +2515,7 @@ bool get_signal(struct ksignal *ksig)
*/
if (unlikely(cgroup_task_frozen(current))) {
spin_unlock_irq(&sighand->siglock);
- cgroup_leave_frozen(true);
+ cgroup_leave_frozen(false);
goto relock;
}
next prev parent reply other threads:[~2019-04-24 15:46 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-04-05 17:46 [PATCH v10 0/9] freezer for cgroup v2 Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 1/9] cgroup: rename freezer.c into legacy_freezer.c Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 2/9] cgroup: implement __cgroup_task_count() helper Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 3/9] cgroup: protect cgroup->nr_(dying_)descendants by css_set_lock Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 4/9] cgroup: cgroup v2 freezer Roman Gushchin
2019-04-19 15:19 ` Oleg Nesterov
2019-04-19 16:08 ` Oleg Nesterov
2019-04-19 16:36 ` Roman Gushchin
2019-04-19 16:11 ` Roman Gushchin
2019-04-19 16:26 ` Oleg Nesterov
2019-04-19 16:56 ` Roman Gushchin
2019-04-20 10:58 ` Oleg Nesterov
2019-04-22 22:11 ` Roman Gushchin
2019-04-24 15:46 ` Oleg Nesterov [this message]
2019-04-24 22:06 ` Roman Gushchin
2019-04-26 17:40 ` Oleg Nesterov
2019-04-24 16:02 ` Oleg Nesterov
2019-04-24 22:10 ` Roman Gushchin
2019-04-26 17:30 ` Oleg Nesterov
2019-04-05 17:47 ` [PATCH v10 5/9] kselftests: cgroup: don't fail on cg_kill_all() error in cg_destroy() Roman Gushchin
2019-04-05 17:47 ` Roman Gushchin
2019-04-05 17:47 ` guroan
2019-04-05 17:47 ` [PATCH v10 6/9] kselftests: cgroup: add freezer controller self-tests Roman Gushchin
2019-04-05 17:47 ` Roman Gushchin
2019-04-05 17:47 ` guroan
2019-07-16 14:48 ` Naresh Kamboju
2019-07-17 0:49 ` Roman Gushchin
2019-07-17 8:56 ` Naresh Kamboju
2019-07-18 18:19 ` Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 7/9] cgroup: make TRACE_CGROUP_PATH irq-safe Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 8/9] cgroup: add tracing points for cgroup v2 freezer Roman Gushchin
2019-04-05 17:47 ` [PATCH v10 9/9] cgroup: document cgroup v2 freezer interface Roman Gushchin
2019-04-19 18:29 ` [PATCH v10 0/9] freezer for cgroup v2 Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190424154619.GG16167@redhat.com \
--to=oleg@redhat.com \
--cc=Kernel-team@fb.com \
--cc=cgroups@vger.kernel.org \
--cc=guro@fb.com \
--cc=guroan@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.