Re: [PATCH v5 4/7] cgroup: cgroup v2 freezer

From: Roman Gushchin <guro@fb.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Roman Gushchin <guroan@gmail.com>, Tejun Heo <tj@kernel.org>,
	"Dan Carpenter" <dan.carpenter@oracle.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>
Subject: Re: [PATCH v5 4/7] cgroup: cgroup v2 freezer
Date: Tue, 18 Dec 2018 01:28:04 +0000
Message-ID: <20181218012800.GA29563@tower.DHCP.thefacebook.com>
In-Reply-To: <20181212174902.GA30309@redhat.com>

On Wed, Dec 12, 2018 at 06:49:02PM +0100, Oleg Nesterov wrote:
> On 12/11, Roman Gushchin wrote:
> >
> > On Tue, Dec 11, 2018 at 05:26:32PM +0100, Oleg Nesterov wrote:
> > > On 12/07, Roman Gushchin wrote:
> > > >
> > > > Cgroup v2 freezer tries to put tasks into a state similar to jobctl
> > > > stop. This means that tasks can be killed, ptraced (using
> > > > PTRACE_SEIZE*), and interrupted. It is possible to attach to
> > > > a frozen task, get some information (e.g. read registers) and detach.
> > >
> > > > I fail to understand how this is all supposed to work.
> > >
> > > > @@ -368,6 +369,8 @@ static inline int signal_pending_state(long state, struct task_struct *p)
> > > >  		return 0;
> > > >  	if (!signal_pending(p))
> > > >  		return 0;
> > > > +	if (unlikely(cgroup_task_frozen(p) && p->jobctl == JOBCTL_TRAP_FREEZE))
> > > > +		return __fatal_signal_pending(p);
> > >
> > > I think I will never agree with this change ;) and I don't think it actually helps.
> >
> > See below.
> >
> > >
> > > > +void cgroup_enter_frozen(void)
> > > > +{
> > > > +	if (!current->frozen) {
> > > > +		spin_lock_irq(&css_set_lock);
> > > > +		current->frozen = true;
> > > > +		cgroup_inc_frozen_cnt(task_dfl_cgroup(current), false, true);
> > > > +		spin_unlock_irq(&css_set_lock);
> > > > +	}
> > > > +
> > > > +	__set_current_state(TASK_INTERRUPTIBLE);
> > > > +	schedule();
> > >
> > > So once again, suppose it races with PTRACE_INTERRUPT, or SIGSTOP, or something
> > > else which should be handled by get_signal() before do_freezer_trap().
> > >
> > > If (say) PTRACE_INTERRUPT comes before schedule it will be lost. Otherwise
> > > the frozen task will react. This can't be right. Or I am totally confused.
> >
> > Why?
> > PTRACE_INTERRUPT will set JOBCTL_TRAP_STOP, so signal_pending_state()
> > will return true, schedule() will return immediately, and we'll handle the trap.
> 
> OK, I misread the JOBCTL_TRAP_FREEZE check as "jobctl & JOBCTL_TRAP_FREEZE".
> 
> But p->jobctl == JOBCTL_TRAP_FREEZE doesn't look right either. For example,
> JOBCTL_STOP_DEQUEUED can be set. You probably need something like
> 
> 	(jobctl & (JOBCTL_PENDING_MASK | JOBCTL_TRAP_FREEZE)) == JOBCTL_TRAP_FREEZE
> 
> And you need a barrier in between, iow you need set_current_state(TASK_INTERRUPTIBLE).
> 
> But this doesn't really matter. I don't think you need to modify signal_pending_state()
> and penalize schedule(). You can do something like
> 
> 	spin_lock_irq(siglock);
> 	if ((jobctl & (JOBCTL_PENDING_MASK | JOBCTL_TRAP_FREEZE)) == JOBCTL_TRAP_FREEZE &&
> 	    !__fatal_signal_pending())
> 	{
> 		__set_current_state(TASK_INTERRUPTIBLE);
> 		clear_thread_flag(TIF_SIGPENDING);
> 	}
> 	spin_unlock_irq(siglock);
> 
> 	schedule();
> 	// recalc_sigpending() is not needed
> 
> in cgroup_enter_frozen() with the same effect. Which looks equally ugly and
> suboptimal, but at least this doesn't touch the sched code.

Gotcha. Will follow this approach in v6.
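
A minimal userspace sketch of the difference between the strict equality
check from v5 and the masked check suggested above. The bit values are
illustrative assumptions, not copied from the kernel headers, and the
helper names are hypothetical. Note the explicit parentheses around the
`&` expression, since `==` binds tighter than `&` in C.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only (assumption); the real definitions live in
 * the kernel's jobctl header and JOBCTL_TRAP_FREEZE is proposed by this
 * series. */
#define JOBCTL_STOP_DEQUEUED	(1UL << 16)
#define JOBCTL_STOP_PENDING	(1UL << 17)
#define JOBCTL_TRAP_STOP	(1UL << 19)
#define JOBCTL_TRAP_NOTIFY	(1UL << 20)
#define JOBCTL_TRAP_FREEZE	(1UL << 23)
#define JOBCTL_PENDING_MASK	(JOBCTL_STOP_PENDING | JOBCTL_TRAP_STOP | \
				 JOBCTL_TRAP_NOTIFY)

/* v5 style: true only if TRAP_FREEZE is the sole bit set in jobctl. */
static inline bool only_freeze_strict(unsigned long jobctl)
{
	return jobctl == JOBCTL_TRAP_FREEZE;
}

/* Suggested style: ignore non-pending bits such as STOP_DEQUEUED, but
 * still refuse if any other pending stop/trap work is queued. */
static inline bool only_freeze_masked(unsigned long jobctl)
{
	return (jobctl & (JOBCTL_PENDING_MASK | JOBCTL_TRAP_FREEZE)) ==
		JOBCTL_TRAP_FREEZE;
}

/* Hypothetical self-check exercising the cases discussed in the thread. */
static inline void freeze_check_selftest(void)
{
	/* STOP_DEQUEUED alongside TRAP_FREEZE: strict fails, masked passes. */
	assert(!only_freeze_strict(JOBCTL_TRAP_FREEZE | JOBCTL_STOP_DEQUEUED));
	assert(only_freeze_masked(JOBCTL_TRAP_FREEZE | JOBCTL_STOP_DEQUEUED));
	/* A pending TRAP_STOP (e.g. PTRACE_INTERRUPT) must not be masked out. */
	assert(!only_freeze_masked(JOBCTL_TRAP_FREEZE | JOBCTL_TRAP_STOP));
}
```

With these assumed values, a task that has only JOBCTL_STOP_DEQUEUED set
in addition to the freeze trap is still treated as "freeze only", while a
pending JOBCTL_TRAP_STOP is not hidden.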

> 
> > > and btw.... what about suspend? try_to_freeze_tasks() will obviously fail
> > > if there is a ->frozen thread?
> >
> > I have to think a bit more here, but something like this will probably work:
> >
> > diff --git a/kernel/freezer.c b/kernel/freezer.c
> > index b162b74611e4..590ac4d10b02 100644
> > --- a/kernel/freezer.c
> > +++ b/kernel/freezer.c
> > @@ -134,7 +134,7 @@ bool freeze_task(struct task_struct *p)
> >                 return false;
> >
> >         spin_lock_irqsave(&freezer_lock, flags);
> > -       if (!freezing(p) || frozen(p)) {
> > +       if (!freezing(p) || frozen(p) || cgroup_task_frozen(p)) {
> >                 spin_unlock_irqrestore(&freezer_lock, flags);
> >                 return false;
> >         }
> >
> > --
> >
> > If the task is already frozen by the cgroup freezer, we don't have to do
> > anything additionally.
> 
> I don't think so. A cgroup_task_frozen() task can be killed after
> try_to_freeze_tasks() succeeds, and the exiting task can close files,
> do IO, etc. Or it can be thawed by cgroup_freeze_task(false).
> 
> In short, if try_to_freeze_tasks() succeeds, the caller has all rights
> to assume that nobody can escape from __refrigerator().

But this is what we do with stopped and ptraced tasks, isn't it?
They use freezable_schedule(), and the system freezer just ignores such tasks.
I believe that the cgroup v2 freezer should follow the same path.

> 
> And what about TASK_STOPPED/TASK_TRACED tasks? They can not be frozen
> or thawed, right? This doesn't look good, and this differs from the
> current freezer controller...

Good question!

It looks like the cgroup v1 freezer just ignores them, treating them as
already frozen, which doesn't look nice.

I'd say s/signal_wake_up(task, 0)/signal_wake_up(task, 1)/ in
cgroup_freeze_task() will do the job of moving them into the frozen state.
The question is how to get them back into the stopped state when the cgroup
is unfrozen. At that point there is no longer any sign that the task was
stopped before it was frozen. I have no better idea than to introduce another
per-task bit/flag. If you have any better ideas, please share.
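
A toy userspace model of the per-task flag idea, only to illustrate the
bookkeeping. Nothing here is kernel code: the `toy_*` names, the state
enum and the `was_stopped` field are all hypothetical, and real task
states and locking are much more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified task states for illustration only. */
enum toy_state { TOY_RUNNING, TOY_STOPPED, TOY_FROZEN };

struct toy_task {
	enum toy_state state;
	bool was_stopped;	/* extra per-task bit: stopped before freeze */
};

/* Freeze: remember whether the task was stopped, then mark it frozen. */
static inline void toy_freeze(struct toy_task *t)
{
	if (t->state == TOY_FROZEN)
		return;
	t->was_stopped = (t->state == TOY_STOPPED);
	t->state = TOY_FROZEN;
}

/* Unfreeze: return to the stopped state if the flag says so. */
static inline void toy_unfreeze(struct toy_task *t)
{
	if (t->state != TOY_FROZEN)
		return;
	t->state = t->was_stopped ? TOY_STOPPED : TOY_RUNNING;
	t->was_stopped = false;
}

/* Hypothetical demo: a stopped task survives a freeze/unfreeze cycle. */
static inline void toy_demo(void)
{
	struct toy_task t = { TOY_STOPPED, false };

	toy_freeze(&t);
	assert(t.state == TOY_FROZEN && t.was_stopped);
	toy_unfreeze(&t);
	assert(t.state == TOY_STOPPED && !t.was_stopped);
}
```

Without the extra bit, toy_unfreeze() would have no way to tell a
previously stopped task from a previously running one.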

Thank you for the review!
