public inbox for cgroups@vger.kernel.org
From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: Aleksa Sarai <cyphar-gVpy/LI/lHzQT0dZR+AlfA@public.gmane.org>
Cc: lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org,
	mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
	richard-/L3Ra7n9ekc@public.gmane.org,
	fweisbec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH v10 4/4] cgroups: implement the PIDs subsystem
Date: Wed, 22 Apr 2015 12:29:54 -0400
Message-ID: <20150422162954.GF10738@htj.duckdns.org>
In-Reply-To: <1429446154-10660-5-git-send-email-cyphar-gVpy/LI/lHzQT0dZR+AlfA@public.gmane.org>

> @@ -0,0 +1,368 @@
> +/*
> + * Process number limiting controller for cgroups.
> + *
> + * Used to allow a cgroup hierarchy to stop any new processes
> + * from fork()ing after a certain limit is reached.
> + *
> + * Since it is trivial to hit the task limit without hitting
> + * any kmemcg limits in place, PIDs are a fundamental resource.
> + * As such, PID exhaustion must be preventable in the scope of
> + * a cgroup hierarchy by allowing resource limiting of the
> + * number of tasks in a cgroup.
> + *
> + * In order to use the `pids` controller, set the maximum number
> + * of tasks in pids.max (this is not available in the root cgroup
> + * for obvious reasons). The number of processes currently
> + * in the cgroup is given by pids.current. Organisational operations
> + * are not blocked by cgroup policies, so it is possible to have
> + * pids.current > pids.max. However, fork()s will still not work.
> + *
> + * To set a cgroup to have no limit, set pids.max to "max". fork()
> + * will return -EBUSY if forking would cause a cgroup policy to be
> + * violated.
> + *
> + * pids.current tracks all child cgroup hierarchies, so
> + * parent/pids.current is a superset of parent/child/pids.current.
> + *
> + * Copyright (C) 2015 Aleksa Sarai <cyphar-gVpy/LI/lHzQT0dZR+AlfA@public.gmane.org>

The above text looks wrapped too narrowly.

> +struct pids_cgroup {
> +	struct cgroup_subsys_state	css;
> +
> +	/*
> +	 * Use 64-bit types so that we can safely represent "max" as
> +	 * (PID_MAX_LIMIT + 1).
            ^^^^^^^^^^^^^^^^^
...
> +static struct cgroup_subsys_state *
> +pids_css_alloc(struct cgroup_subsys_state *parent)
> +{
> +	struct pids_cgroup *pids;
> +
> +	pids = kzalloc(sizeof(struct pids_cgroup), GFP_KERNEL);
> +	if (!pids)
> +		return ERR_PTR(-ENOMEM);
> +
> +	pids->limit = PIDS_MAX;
                      ^^^^^^^^^

> +	atomic64_set(&pids->counter, 0);
> +	return &pids->css;
> +}
...
> +static void pids_detach(struct cgroup_subsys_state *old_css,
> +			struct task_struct *task)
> +{
> +	struct pids_cgroup *old_pids = css_pids(old_css);
> +
> +	pids_uncharge(old_pids, 1);
> +}

You can do the above as part of can_attach/cancel_attach.
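
A minimal user-space sketch of that idea, assuming "can/cancel" refers
to the can_attach and cancel_attach callbacks. The pids_model_* names
are hypothetical stand-ins, not kernel API, and hierarchical charging
of ancestors is left out. The charge moves to the destination in the
"can" step, so no separate detach hook is needed, and the "cancel" step
reverses it exactly.

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-in for struct pids_cgroup. */
struct pids_model {
	atomic_llong counter;	/* number of tasks charged to this group */
};

static void pids_model_charge(struct pids_model *p, long long n)
{
	atomic_fetch_add(&p->counter, n);
}

static void pids_model_uncharge(struct pids_model *p, long long n)
{
	atomic_fetch_sub(&p->counter, n);
}

/* "can" step: move the charge from source to destination up front. */
static int pids_model_can_attach(struct pids_model *src,
				 struct pids_model *dst)
{
	pids_model_charge(dst, 1);
	pids_model_uncharge(src, 1);
	return 0;	/* organisational moves are not refused in this model */
}

/* "cancel" step: the migration was aborted, so undo the move exactly. */
static void pids_model_cancel_attach(struct pids_model *src,
				     struct pids_model *dst)
{
	pids_model_charge(src, 1);
	pids_model_uncharge(dst, 1);
}
```

Calling pids_model_can_attach() followed by pids_model_cancel_attach()
on the same pair leaves both counters where they started.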

> +static int pids_can_fork(struct task_struct *task, void **private)

Maybe @priv_p or something which signifies it's of a different type
from the others?

> +{
...
> +	rcu_read_lock();
> +	css = task_css(current, pids_cgrp_id);
> +	if (!css_tryget_online(css)) {
> +		retval = -EBUSY;
> +		goto err_rcu_unlock;
> +	}
> +	rcu_read_unlock();

Hmmm... so, the above is guaranteed to succeed in a finite amount of
time (the race window is actually very narrow) and it'd be silly to
fail a fork because a task was being moved across cgroups.

I think it'd be a good idea to implement task_get_css(), which loops
and returns the current css for the requested subsystem with its
reference count bumped.  It can use css_tryget() too: holding a ref
doesn't prevent a css from dying anyway, so it doesn't make any
difference.
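
A rough user-space model of the suggested loop, using C11 atomics. The
css_model and task_model types and all *_model_* functions are
hypothetical stand-ins for illustration. The sketch also assumes the
css objects stay allocated for the duration of the call, which the
kernel would get from RCU and which is not modelled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct cgroup_subsys_state. */
struct css_model {
	atomic_long refcnt;	/* 0 means the object is being released */
};

/* Hypothetical stand-in for a task: points at its current css. */
struct task_model {
	_Atomic(struct css_model *) css;
};

/* Take a reference only if the count is not already zero. */
static bool css_model_tryget(struct css_model *css)
{
	long old = atomic_load(&css->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded and the zero check repeats. */
		if (atomic_compare_exchange_weak(&css->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Loop until a reference is obtained on whatever css the task points
 * at right now. A failed tryget means the css is on its way out, so
 * the pointer is re-read until the task's new css shows up.
 */
static struct css_model *task_model_get_css(struct task_model *task)
{
	struct css_model *css;

	for (;;) {
		css = atomic_load(&task->css);
		assert(css != NULL);
		if (css_model_tryget(css))
			return css;
	}
}
```

The caller owns one reference on the returned css and has to drop it
when done. Unlike the quoted code, a lost race here causes a retry
instead of an -EBUSY from fork.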

> +static void pids_fork(struct task_struct *task, void *private)
> +{
...
> +	rcu_read_lock();
> +	css = task_css(task, pids_cgrp_id);
> +	css_get(css);

Why is this safe?  What guarantees that css's ref isn't already zero
at this point?

> +	rcu_read_unlock();
> +
> +	pids = css_pids(css);
> +
> +	/*
> +	 * The association has changed, we have to revert and reapply the
> +	 * charge/uncharge on the wrong hierarchy to the current one. Since
> +	 * the association can only change due to an organisation event, its
> +	 * okay for us to ignore the limit in this case.
> +	 */
> +	if (pids != old_pids) {
> +		pids_uncharge(old_pids, 1);
> +		pids_charge(pids, 1);
> +	}
> +
> +	css_put(css);
> +	css_put(old_css);
> +}
...
> +static ssize_t pids_max_write(struct kernfs_open_file *of, char *buf,
> +			      size_t nbytes, loff_t off)
> +{
> +	struct cgroup_subsys_state *css = of_css(of);
> +	struct pids_cgroup *pids = css_pids(css);
> +	int64_t limit;
> +	int err;
> +
> +	buf = strstrip(buf);
> +	if (!strcmp(buf, PIDS_MAX_STR)) {
> +		limit = PIDS_MAX;
> +		goto set_limit;
> +	}
> +
> +	err = kstrtoll(buf, 0, &limit);
> +	if (err)
> +		return err;
> +
> +	/* We use INT_MAX as the maximum value of pid_t. */
> +	if (limit < 0 || limit > INT_MAX)

This is kinda weird if we're using PIDS_MAX for max as it may end up
showing "max" after some number larger than PID_MAX_LIMIT (but still
within INT_MAX) is written to the file.
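
One possible way to keep what is written and what is read back
consistent is to reject any number at or above the sentinel, so that
only the literal string can select "no limit". The sketch below is a
user-space model of that check. The *_MODEL names are hypothetical, the
4M value for PID_MAX_LIMIT is an assumption (the kernel derives it from
its configuration), and strtoll() stands in for kstrtoll().

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Assumed values for illustration only. */
#define PID_MAX_LIMIT_MODEL	(4LL * 1024 * 1024)
#define PIDS_MAX_MODEL		(PID_MAX_LIMIT_MODEL + 1)
#define PIDS_MAX_STR_MODEL	"max"

/*
 * Parse a pids.max style string. Returns 0 and stores the limit, or a
 * negative errno. Numbers at or above the sentinel are rejected so
 * that only the literal "max" can produce the "no limit" value.
 */
static int pids_model_parse_max(const char *buf, long long *limit)
{
	char *end;
	long long val;

	if (strcmp(buf, PIDS_MAX_STR_MODEL) == 0) {
		*limit = PIDS_MAX_MODEL;
		return 0;
	}

	errno = 0;
	val = strtoll(buf, &end, 0);
	if (errno || end == buf || *end != '\0')
		return -EINVAL;
	if (val < 0 || val >= PIDS_MAX_MODEL)
		return -EINVAL;

	*limit = val;
	return 0;
}
```

With this check, a write of 5000000 fails with -EINVAL instead of
being accepted and later displayed as "max".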

Thanks.

-- 
tejun

