From: Zefan Li <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
To: Dimitri John Ledkov
	<dimitri.j.ledkov-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: <netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	<cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH 1/2] connector: add cgroup release event report to proc connector
Date: Thu, 28 May 2015 11:30:49 +0800	[thread overview]
Message-ID: <55668BE9.1090100@huawei.com> (raw)
In-Reply-To: <CAK7xexkFm_BZ9sdB3VAZ-WGiikBeAQfaimXjggEtac+M2diV-g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 2015/5/27 20:37, Dimitri John Ledkov wrote:
> On 27 May 2015 at 12:22, Zefan Li <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> wrote:
>> On 2015/5/27 6:07, Dimitri John Ledkov wrote:
>>> Add a kernel API to send a proc connector notification that a cgroup
>>> has become empty. A userspace daemon can then act upon such
>>> information, and usually clean-up and remove such a group as it's no
>>> longer needed.
>>>
>>> Currently there are two other ways (one for the current & one for the
>>> unified cgroup hierarchy) to receive such notifications, but they
>>> either involve spawning a userspace helper or monitoring a lot of
>>> files. This provides a firehose of all such events from a single
>>> place instead.
>>>
>>> In the current cgroups structure the way to get notifications is by
>>> enabling `release_agent' and setting `notify_on_release' for a given
>>> cgroup hierarchy. This will then spawn a userspace helper with the
>>> removed cgroup as an argument. It has been acknowledged that this is
>>> expensive, especially in exit-heavy workloads. In userspace this is
>>> currently used by systemd and CGmanager, as far as I know; both
>>> agents establish a connection to a long-running daemon and pass the
>>> message on to it. As a courtesy to other processes, such an event is
>>> sometimes forwarded further on, e.g. systemd forwards it to the system
>>> DBus.
>>>
>>> In the future/unified cgroup hierarchy, support for `release_agent'
>>> is removed without a direct replacement. However, there is a new
>>> `cgroup.populated' file exposed that recursively reports whether
>>> there are any tasks in a given cgroup hierarchy. It's a very good
>>> flag for quickly/lazily scanning for empty cgroups; however, one
>>> would need to establish an inotify watch on each and every
>>> cgroup.populated file at cgroup setup time (ideally before any pids
>>> enter said cgroup). Thus nobody but the original creator of a given
>>> cgroup has a chance to reliably monitor that cgroup becoming empty
>>> (since there is no reliable recursive inotify watch).
>>>
>>> Hence, the addition to the proc connector firehose. Multiple
>>> listeners (albeit with CAP_NET_ADMIN in the init pid/user namespace)
>>> could connect and monitor cgroup release notifications. In a way,
>>> this repeats udev history: at first it was a userspace helper, which
>>> later became a netlink socket. And I hope that the proc connector is
>>> a naturally good fit for this notification type.
>>>
>>> For precisely when cgroups should emit this event, see next patch
>>> against kernel/cgroup.c.
>>>
>>
>> We really don't want yet another way for cgroup notification.
>>
> 
> we do have multiple information sources for similar events in other
> places... e.g. fork events can be tracked with ptrace and with the
> proc connector; ditto other things.
> 
>> Systemd is happy with this cgroup.populated interface. Do you have any
>> real use case in mind that can't be satisfied with inotify watch?
>>
> 
> cgroup.populated is not implemented in systemd and would require a lot
> of inotify watches.

I believe systemd will use cgroup.populated, though I don't know its
roadmap. Maybe it's waiting for the kernel to remove the experimental
flag from the unified hierarchy.

> Also it's only set on the unified structure and
> not exposed on the current one.
> 
> Also it will not allow anybody else to establish an inotify watch in
> a timely manner. Thus anyone external to the cgroup's creator will
> not be able to monitor cgroup.populated at the right time.

I guess this isn't a problem, as you can watch for the IN_CREATE event,
and then you'll get notified when a cgroup is created.

> With the proc connector I was thinking processes entering cgroups
> would be useful events as well, but I don't have a use-case for them
> yet, thus I'm not sure what the event should look like.
> 
> Would cgroup.populated be exposed on the legacy cgroup hierarchy? At
> the moment I see about ~20ms of my ~200ms boot wasted on spawning the
> cgroups agent, and I would like to get rid of that as soon as
> possible. This patch solves it for me. (I have a matching patch to
> connect to the proc connector and then feed notifications to systemd
> via systemd's private API end-point.)
> 
> Exposing cgroup.populated irrespective of the cgroup mount options
> would be great, but would result in many watches being established,
> waiting for a once-in-a-lifecycle condition of a cgroup. IMHO this is
> wasteful, but nonetheless it would be much better than spawning the
> agent.
> 

Each inotify watch will consume a little memory, which should be
acceptable.

> Would a patch that exposes cgroup.populated on the legacy cgroup
> structure be accepted? It is forward-compatible after all... or no?
> 

I'm afraid not... All new features are done in the unified hierarchy,
and we've been refraining from adding them to the legacy hierarchy.

