Linux Container Development
* systemd-cgroups-agent not working in containers
From: Richard Weinberger @ 2014-11-26 21:29 UTC (permalink / raw)
  To: systemd-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
  Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Linux Containers, David Gstir

Hi!

I run a Linux container setup with openSUSE 13.1/2 as guest distro.
After some time containers slow down.
An investigation showed that the cause is a large number of stale
user sessions, which slow down almost all systemd tools, mostly systemctl.
loginctl reports many thousands of sessions,
all in state "closing".

The vast majority of these sessions come from crond and ssh logins.
It turned out that these sessions are never closed and stay around,
although the control group of such a session contains zero tasks.
So I started to explore why systemd keeps it.
After another few hours of debugging I realized that systemd never
receives the release notification from cgroups.
Calling the release agent by hand did not help either, e.g.:
/usr/lib/systemd/systemd-cgroups-agent /user.slice/user-0.slice/session-c324.scope

Therefore systemd never recognizes that a server/session has no more tasks,
and so it never closes it.
At first I thought it was an issue in libvirt combined with user namespaces,
but I can also trigger it without user namespaces, and with systemd-nspawn too.
I tested with systemd 208 and 210 from openSUSE; their packages include all known bugfixes.

Any idea where to look further?
How do you run the most current systemd on your distro?

Thanks,
//richard

* Re: systemd-cgroups-agent not working in containers
From: Richard Weinberger @ 2014-11-27 13:46 UTC (permalink / raw)
  To: systemd-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
  Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Linux Containers, David Gstir

Btw: I see exactly the same issue on fc21 as well (guest is fc20).

Thanks,
//richard

* Re: [systemd-devel] systemd-cgroups-agent not working in containers
From: Umut Tezduyar Lindskog @ 2014-11-27 15:44 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Linux Containers,
	systemd-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org

Hi,

I think they are the same issue:
http://lists.freedesktop.org/archives/systemd-devel/2014-October/024482.html
https://bugs.freedesktop.org/show_bug.cgi?id=86520

Umut

* Re: [systemd-devel] systemd-cgroups-agent not working in containers
From: Cameron Norman @ 2014-11-27 20:26 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Linux Containers,
	systemd-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
	Martin Pitt

On Wed, Nov 26, 2014 at 1:29 PM, Richard Weinberger <richard-/L3Ra7n9ekc@public.gmane.org> wrote:
> Hi!
>
> I run a Linux container setup with openSUSE 13.1/2 as guest distro.
> After some time containers slow down.
> An investigation showed that the containers slow down because a lot of stale
> user sessions slow down almost all systemd tools, mostly systemctl.
> loginctl reports many thousand sessions.
> All in state "closing".

This sounds similar to an issue that systemd-shim in Debian had.
Martin Pitt (who helps maintain systemd in Debian) fixed that issue; he
may have some ideas here. I CC'd him.

For reference, the bug (linked at Martin's message with the most relevant info):

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=756076#75

Best regards,
--
Cameron Norman

* Re: [systemd-devel] systemd-cgroups-agent not working in containers
From: Martin Pitt @ 2014-11-28  5:33 UTC (permalink / raw)
  To: Cameron Norman
  Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Richard Weinberger, Linux Containers,
	systemd-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hello all,

Cameron Norman [2014-11-27 12:26 -0800]:
> This sounds similar to an issue that systemd-shim in Debian had.
> Martin Pitt (helps to maintain systemd in Debian) fixed that issue; he
> may have some ideas here. I CC'd him.

The problem with systemd-shim under sysvinit or upstart was that shim
didn't set a cgroup release agent like systemd itself does. Thus the
cgroups were never cleaned up after all the session processes died.
(See section 1.4 of https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt
for details.)

I don't think that SUSE uses systemd-shim, so I take it that in your setup you
are running systemd proper on both the host and the guest? Then I
suggest checking the cgroups that correspond to the "closing" sessions
in the container, i.e. /sys/fs/cgroup/systemd/.../session-XX.scope/tasks.
If there are still processes in it, logind is merely waiting for them
to exit (or you can set KillUserProcesses=yes in logind.conf). If they are empty,
check that /sys/fs/cgroup/systemd/.../session-XX.scope/notify_on_release is 1
and that /sys/fs/cgroup/systemd/release_agent is set.
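The checks suggested above can be scripted roughly as follows. This is a minimal Python sketch, assuming a cgroup v1 layout with the named systemd hierarchy mounted at /sys/fs/cgroup/systemd; the function names are hypothetical and not part of systemd or any tool mentioned in this thread.

```python
from pathlib import Path
from typing import Dict, List, Optional, Union

PathLike = Union[str, Path]

def _read(path: Path) -> Optional[str]:
    """Return the stripped content of a cgroup file, or None if it is missing."""
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return None

def inspect_session_cgroup(scope_dir: PathLike,
                           hierarchy_root: PathLike = "/sys/fs/cgroup/systemd") -> Dict[str, object]:
    """Run the checks described above against one session-XX.scope directory."""
    scope = Path(scope_dir)
    pids: List[int] = [int(tok) for tok in (_read(scope / "tasks") or "").split()]
    notify = _read(scope / "notify_on_release")
    # cgroup v1 exposes release_agent only at the root of the hierarchy.
    agent = _read(Path(hierarchy_root) / "release_agent")
    if pids:
        verdict = "tasks remain: logind is waiting for them to exit"
    elif notify != "1":
        verdict = "empty, but notify_on_release is not 1"
    elif not agent:
        verdict = "empty, but no release_agent is set"
    else:
        verdict = "empty and release notification is configured"
    return {"pids": pids, "notify_on_release": notify,
            "release_agent": agent, "verdict": verdict}
```

For example, `inspect_session_cgroup("/sys/fs/cgroup/systemd/user.slice/user-0.slice/session-c324.scope")` would inspect the session from the original report. Keep in mind that, as noted later in the thread, the kernel runs the release agent on the host side, so a "configured" result inside a container does not mean the guest's systemd will ever be notified.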

Martin

-- 
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)

* Re: [systemd-devel] systemd-cgroups-agent not working in containers
From: Richard Weinberger @ 2014-11-28 14:52 UTC (permalink / raw)
  To: Cameron Norman, systemd-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	libvir-list-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Linux Containers

On 28.11.2014 at 06:33, Martin Pitt wrote:
> I don't think that SUSE uses systemd-shim, I take it in that setup you
> are running systemd proper on both the host and the guest? Then I
> suggest checking the cgroups that correspond to the "closing" sessions
> in the container, i. e. /sys/fs/cgroup/systemd/.../session-XX.scope/tasks.
> If there are still processes in it, logind is merely waiting for them
> to exit (or set KillUserProcesses in logind.conf). If they are empty,
> check that /sys/fs/cgroup/systemd/.../session-XX.scope/notify_on_release is 1
> and that /sys/fs/cgroup/systemd/release_agent is set?

The problem is that within the container the release agent is not executed.
It is executed on the host side.

Lennart, how is this supposed to work?
Is the theory of operation that the host's systemd sends the org.freedesktop.systemd1.Agent.Released
signal via D-Bus into the guest?
The guest's systemd definitely does not receive such a signal.

Thanks,
//richard

* Re: [systemd-devel] systemd-cgroups-agent not working in containers
From: Lennart Poettering @ 2014-11-30 22:30 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Linux Containers,
	systemd-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org

On Wed, 26.11.14 22:29, Richard Weinberger (richard-/L3Ra7n9ekc@public.gmane.org) wrote:

> Any idea where to look further?

Unfortunately, cgroup empty notification is seriously broken in the
kernel the way it is currently implemented, and we'll miss the
callouts in a number of cases (for example, if somebody still has any
subdirectory in a cgroup, we get no event for it). It's also not available
at all inside containers, since the callouts take place in the main PID
namespace, and nowhere else.

Our current strategy for still being able to clean everything up is
this:

a) For service units we keep track of the main and control PIDs (the
   control PID is the PID of any script or similar that we invoke to shut
   down a service via ExecStop=, to reload it via ExecReload=, and so on),
   and once they are gone we consider the service dead and kill all other
   processes of the service forcibly, without waiting for them between
   SIGTERM and SIGKILL, simply because we can't.

b) For scope units (which is how login sessions are exposed) things are
   more difficult. For service units the relevant processes are
   children of PID 1, hence we get SIGCHLD signals for them. This is
   usually not the case for scope units: their processes might be children
   of arbitrary processes, so we cannot reliably get notifications
   for them. To deal with this we have two strategies:

   [1] the registrar of the scope must explicitly stop the scope when
   appropriate.

   [2] the registrar of the scope must explicitly "abandon" the scope
   when appropriate.
   
   In the case of logind both stopping and abandoning are available;
   which one is used depends on the KillUserProcesses= setting in
   logind.conf. logind triggers the stopping/abandoning as soon as
   either:

   I)  the PAM session end hook is invoked for the specific session

   II) or the session FIFO is closed. Each session that logind keeps track
       of has one of these. The FIFO is simply created in the PAM open
       session hook, and normally closed in the session end
       hook. Should the session die abnormally though (without going
       through the PAM end hook), logind sees this as POLLHUP on the
       other end of the FIFO and can act on it. (Note that the
       FIFO is passed with O_CLOEXEC to the PAM session to ensure that
       it is only kept around in the parent process between the PAM open
       and end hooks, but not passed to the child processes, which
       then go on and invoke login/bash or whatever else makes up the
       user session.)

   When a scope is "stopped" this has the effect of killing all of the
   scope's processes immediately. When it is "abandoned", however, we
   iterate through all remaining processes of the scope, add them to a
   watchlist and wait for SIGCHLD for them, checking on each one we
   get whether the scope is now empty. If it isn't empty, we collect
   the PIDs again at that time. The rationale for this is: the
   abandoning should normally happen when the main process of the
   scope dies. At this time the other processes of the scope (which
   are usually its children) get reparented to PID 1 (because
   UNIX), which allows us to get SIGCHLD for them again.
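The "abandon" path can be modelled roughly like this. It is a simplified Python sketch, not systemd code: `list_scope_pids` stands in for reading the scope's cgroup and `sigchld_pids` for the stream of PIDs reported via SIGCHLD; both parameters and the function name are hypothetical, and the real bookkeeping in systemd is more involved.

```python
from typing import Callable, Iterable, Set

def wait_abandoned_scope(list_scope_pids: Callable[[], Set[int]],
                         sigchld_pids: Iterable[int]) -> bool:
    """Hypothetical model of an abandoned scope; True once it is seen empty."""
    watchlist = set(list_scope_pids())
    if not watchlist:
        return True  # nothing left to wait for
    for pid in sigchld_pids:
        if pid not in watchlist:
            continue  # SIGCHLD for a process outside this scope
        current = set(list_scope_pids())
        if not current:
            return True  # scope is empty now and can be cleaned up
        # Still populated: collect the PIDs again, new processes may exist.
        watchlist = current
    return False  # ran out of events while the scope was still populated
```

Re-reading the PID list on every relevant SIGCHLD mirrors the "collect the PIDs again" step described above, rather than trusting the initial snapshot.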

Complex? Awful? Disgusting? Yes, absolutely. But as far as I can see
it should actually be good enough for all cases I ran into.
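Rule a) for service units can be illustrated with a small model. This is a hypothetical Python sketch, not systemd code: the class and method names are invented for illustration, and it only tracks the main and control PIDs, reporting which leftover processes would be killed once both are gone.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class ServiceTracker:
    """Hypothetical model of rule a): main/control PIDs decide when a service is dead."""
    main_pid: Optional[int]
    control_pid: Optional[int] = None
    cgroup_pids: Set[int] = field(default_factory=set)

    def on_sigchld(self, pid: int) -> List[int]:
        """Handle one exited child; return leftover PIDs to kill, if any."""
        self.cgroup_pids.discard(pid)
        if pid == self.main_pid:
            self.main_pid = None
        if pid == self.control_pid:
            self.control_pid = None
        if self.main_pid is None and self.control_pid is None:
            # Service counts as dead: leftovers get SIGTERM and then SIGKILL
            # right away, without waiting in between.
            leftovers = sorted(self.cgroup_pids)
            self.cgroup_pids.clear()
            return leftovers
        return []
```

For service units this works because PID 1 is the parent of the main and control processes and therefore receives SIGCHLD for them.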

The proper fix in the long run is to get better notifications for
cgroups from the kernel. The great thing is, they are now available, but
only in the new "unified" cgroup hierarchy, which we haven't ported
things to yet. With that in place we can finally watch cgroups
comprehensively and safely without all this madness. Yay!
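In the unified hierarchy as it eventually shipped (cgroup v2), each non-root cgroup has a cgroup.events file whose `populated` key is 1 while the cgroup or any of its descendants contains live processes, and 0 otherwise; the kernel generates a file-modified event when the value changes. This minimal Python sketch only reads that flag, assuming a cgroup v2 mount; the helper names are hypothetical, and actually watching the file for changes (for example with inotify) is left out.

```python
from pathlib import Path
from typing import Dict, Union

def parse_flat_keyed(text: str) -> Dict[str, str]:
    """Parse a cgroup v2 flat-keyed file: one "key value" pair per line."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, value = line.split(maxsplit=1)
        entries[key] = value.strip()
    return entries

def cgroup_is_populated(cgroup_dir: Union[str, Path]) -> bool:
    """True while the cgroup or any descendant still contains live processes."""
    events = parse_flat_keyed((Path(cgroup_dir) / "cgroup.events").read_text())
    return events.get("populated") == "1"
```

Unlike the v1 release agent, this flag is per cgroup and readable from wherever the hierarchy is mounted, which is what avoids the subgroup and PID-namespace problems described above.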

Now, if the tracking logic described above doesn't work for you, it
would be good if you first tried with pristine upstream systemd.

In the past we had problems with PAM clients that didn't implement the
PAM session logic correctly: they didn't invoke the PAM session close
hooks, didn't keep the parent process around to do so, or
suchlike. What kind of PAM session do you run into this problem with?
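The FIFO mechanism from II) above depends on the process that opened the PAM session staying around while it holds its end of the FIFO. The following Python sketch (Linux/Unix only) imitates the hang-up detection with an anonymous pipe standing in for the session FIFO; the function name is hypothetical and this is not how logind or pam_systemd are implemented. The parent plays logind and keeps the read end; the forked child plays the PAM process and exits without running any end hook. Once no writer is left, poll() reports POLLHUP.

```python
import os
import select

def session_hangup_seen(timeout_ms: int = 5000) -> bool:
    """Return True if the 'logind' side sees POLLHUP after the 'PAM' side dies."""
    # Since Python 3.4, os.pipe() fds are non-inheritable (close-on-exec),
    # so a later exec() would not leak them, similar to the O_CLOEXEC note.
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child ("PAM process"): keeps only the write end, then dies
        # abnormally without an end hook or an explicit close.
        os.close(read_fd)
        os._exit(1)
    # Parent ("logind"): keeps only the read end.
    os.close(write_fd)
    poller = select.poll()
    poller.register(read_fd, select.POLLIN)
    events = poller.poll(timeout_ms)  # POLLHUP is reported even if not requested
    os.waitpid(pid, 0)
    hung_up = any(fd == read_fd and bool(mask & select.POLLHUP) for fd, mask in events)
    os.close(read_fd)
    return hung_up
```

If the PAM client does not keep that parent process alive for the whole session, the write end disappears early and the session looks finished before it really is, which is one way a broken PAM client confuses this tracking.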

> How do you run the most current systemd on your distro?

Well, as a developer I just build it from the git tree, after
installing all deps, with

    ./autogen.sh c && make -j6 && sudo make install

Lennart

-- 
Lennart Poettering, Red Hat

* Re: [systemd-devel] systemd-cgroups-agent not working in containers
From: Lennart Poettering @ 2014-11-30 22:31 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: libvir-list-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Linux Containers, systemd-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Cameron Norman

On Fri, 28.11.14 15:52, Richard Weinberger (richard-/L3Ra7n9ekc@public.gmane.org) wrote:

> The problem is that within the container the release agent is not executed.
> It is executed on the host side.
> 
> Lennart, how is this supposed to work?
> Is the theory of operation that the host systemd sends org.freedesktop.systemd1.Agent Released
> via dbus into the guest?
> The guests systemd definitely does not receive such a signal.

No, the cgroup agents are not reliable, because of subgroups and
because of their incompatibility with containers. systemd uses the
events if it gets them, but we try hard to be able to live without
them (see my other mail).

Lennart

-- 
Lennart Poettering, Red Hat
