All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
To: Daniel Lezcano <dlezcano-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
Cc: Linux Containers
	<containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org>,
	Linux Kernel Mailing List
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Pavel Emelianov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
Subject: Re: pidns memory leak
Date: Thu, 8 Oct 2009 20:29:28 -0700	[thread overview]
Message-ID: <20091009032928.GA2031@us.ibm.com> (raw)
In-Reply-To: <4ACD9ECC.90508-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>

Daniel Lezcano [dlezcano-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org] wrote:
> Sukadev Bhattiprolu wrote:
>> Still digging through some traces, but below I have some questions that 
>> I am still trying to answer.
>>
>>> I am not sure what you mean by 'struct pids' but what I observed is:
>>
>> Ok, I see that too. If pids leak, then pid-namespace will leak too.
>> Do you see any leaks in proc_inode_cache ?
>
> Yes, right. It leaks too.

Ok, some progress...

Can you please verify these observations:

- If the container exits normally, the leak does not seem to happen.
  (i.e reduce your sleep 3600 to say sleep 3 and remove the lxc-stop).

- Revert the following commit and check if the leak happens:

	commit 7766755a2f249e7e0dabc5255a0a3d151ff79821
	Author: Andrea Arcangeli <andrea-l3A5Bk7waGM@public.gmane.org>
	Date:   Mon Feb 4 22:29:21 2008 -0800

(this commit added the check for PF_EXITING in proc_flush_task_mnt 
loosely explained below).

Incomplete analysis :-)

If the container-init is terminated (by the lxc-stop), the container zaps
other processes in the container and waits for them. The leak happens in
this case.

Following sequence of events occur:

	- container-init calls do_exit and sets PF_EXITING (in exit_signals())

	- container init calls zaps_pid_ns_processes() (exit_notify /
	  forget_orignal_parent() / find_new_reaper())

	- In zap_pid_ns_processes() container-init sends SIGKILL to
	  descendants and calls sys_wait().

	- The sys_wait() is expected to call release_task() which calls
	  proc_flush_task_mnt().

	- proc_flush_task_mnt() looks up the dentry for the pid (2 in
	  our example) and finds the dentry.

	  But since container-init is itself exiting (i.e PF_EXITING is
	  set) it does NOT call the shrink_dcache_parent(), but,
	  interestingly calls d_drop() and dput().

	  Now the d_drop() unhashes the dentry for the pid 2.

	- proc_flush_task_mnt() then tries to find the dentry for the
	  tgid of the process. In our case, the tgid == pid == 2 and
	  we just unhashed the dentry for "2".

	  So, we don't find the dentry for the leader either (and hence
	  don't make the second shrink_dcache_parent() call in
	  proc_flush_task_mnt() either).

	  Without a call to shrink_dcache_parent(), the proc inode
	  for the process that was terminated by container init is
	  not deleted (i.e we don't call proc_delete_inode() or
	  the put_pid() inside it) causing us to leak proc_inodes,
	  struct pid and hence struct pid_namespace.

There should be a better fix, but first please confirm if reverting the
above commit fixes the leak for you also.

Sukadev

WARNING: multiple messages have this Message-ID (diff)
From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
To: Daniel Lezcano <dlezcano@fr.ibm.com>
Cc: Pavel Emelianov <xemul@openvz.org>,
	Sukadev Bhattiprolu <sukadev@us.ibm.com>,
	Linux Containers <containers@lists.osdl.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: pidns memory leak
Date: Thu, 8 Oct 2009 20:29:28 -0700	[thread overview]
Message-ID: <20091009032928.GA2031@us.ibm.com> (raw)
In-Reply-To: <4ACD9ECC.90508@fr.ibm.com>

Daniel Lezcano [dlezcano@fr.ibm.com] wrote:
> Sukadev Bhattiprolu wrote:
>> Still digging through some traces, but below I have some questions that 
>> I am still trying to answer.
>>
>>> I am not sure what you mean by 'struct pids' but what I observed is:
>>
>> Ok, I see that too. If pids leak, then pid-namespace will leak too.
>> Do you see any leaks in proc_inode_cache ?
>
> Yes, right. It leaks too.

Ok, some progress...

Can you please verify these observations:

- If the container exits normally, the leak does not seem to happen.
  (i.e reduce your sleep 3600 to say sleep 3 and remove the lxc-stop).

- Revert the following commit and check if the leak happens:

	commit 7766755a2f249e7e0dabc5255a0a3d151ff79821
	Author: Andrea Arcangeli <andrea@suse.de>
	Date:   Mon Feb 4 22:29:21 2008 -0800

(this commit added the check for PF_EXITING in proc_flush_task_mnt 
loosely explained below).

Incomplete analysis :-)

If the container-init is terminated (by the lxc-stop), the container zaps
other processes in the container and waits for them. The leak happens in
this case.

Following sequence of events occur:

	- container-init calls do_exit and sets PF_EXITING (in exit_signals())

	- container init calls zaps_pid_ns_processes() (exit_notify /
	  forget_orignal_parent() / find_new_reaper())

	- In zap_pid_ns_processes() container-init sends SIGKILL to
	  descendants and calls sys_wait().

	- The sys_wait() is expected to call release_task() which calls
	  proc_flush_task_mnt().

	- proc_flush_task_mnt() looks up the dentry for the pid (2 in
	  our example) and finds the dentry.

	  But since container-init is itself exiting (i.e PF_EXITING is
	  set) it does NOT call the shrink_dcache_parent(), but,
	  interestingly calls d_drop() and dput().

	  Now the d_drop() unhashes the dentry for the pid 2.

	- proc_flush_task_mnt() then tries to find the dentry for the
	  tgid of the process. In our case, the tgid == pid == 2 and
	  we just unhashed the dentry for "2".

	  So, we don't find the dentry for the leader either (and hence
	  don't make the second shrink_dcache_parent() call in
	  proc_flush_task_mnt() either).

	  Without a call to shrink_dcache_parent(), the proc inode
	  for the process that was terminated by container init is
	  not deleted (i.e we don't call proc_delete_inode() or
	  the put_pid() inside it) causing us to leak proc_inodes,
	  struct pid and hence struct pid_namespace.

There should be a better fix, but first please confirm if reverting the
above commit fixes the leak for you also.

Sukadev

  parent reply	other threads:[~2009-10-09  3:29 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-10-02 12:27 pidns memory leak Daniel Lezcano
2009-10-02 12:27 ` Daniel Lezcano
     [not found] ` <4AC5F198.2070407-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2009-10-06  4:05   ` Sukadev Bhattiprolu
2009-10-06  4:05     ` Sukadev Bhattiprolu
     [not found]     ` <20091006040526.GA22923-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-10-06  8:18       ` Daniel Lezcano
2009-10-06  8:18         ` Daniel Lezcano
     [not found]         ` <4ACAFD6A.3060008-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2009-10-08  3:08           ` Sukadev Bhattiprolu
2009-10-08  3:08             ` Sukadev Bhattiprolu
     [not found]             ` <20091008030828.GA18973-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-10-08  8:11               ` Daniel Lezcano
2009-10-08  8:11                 ` Daniel Lezcano
     [not found]                 ` <4ACD9ECC.90508-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2009-10-09  3:29                   ` Sukadev Bhattiprolu [this message]
2009-10-09  3:29                     ` Sukadev Bhattiprolu
     [not found]                     ` <20091009032928.GA2031-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-10-09 13:18                       ` Daniel Lezcano
2009-10-09 13:18                         ` Daniel Lezcano
     [not found]                         ` <4ACF381F.9050808-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2009-10-09 20:38                           ` Sukadev Bhattiprolu
2009-10-09 20:38                             ` Sukadev Bhattiprolu
     [not found]                             ` <20091009203809.GA12230-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-10-09 20:50                               ` Eric W. Biederman
2009-10-09 20:50                                 ` Eric W. Biederman
     [not found]                                 ` <m18wfkl2bf.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2009-10-10  1:58                                   ` Sukadev Bhattiprolu
2009-10-10  1:58                                     ` Sukadev Bhattiprolu
     [not found]                                     ` <20091010015859.GB11904-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-10-10  2:08                                       ` Eric W. Biederman
2009-10-10  2:08                                         ` Eric W. Biederman
2009-10-09 21:54                           ` Matt Helsley
2009-10-09 21:54                             ` Matt Helsley
2009-10-10  1:32                           ` Sukadev Bhattiprolu
2009-10-10  1:32                             ` Sukadev Bhattiprolu
     [not found]                             ` <20091010013235.GA11904-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-10-12  8:41                               ` Daniel Lezcano
2009-10-12  8:41                                 ` Daniel Lezcano
     [not found]                                 ` <4AD2EBC7.2020109-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2009-10-14  6:15                                   ` Sukadev Bhattiprolu
2009-10-14  6:15                                     ` Sukadev Bhattiprolu
     [not found]                                     ` <20091014061533.GA23569-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-11-02 21:33                                       ` Andrew Morton
2009-11-02 21:33                                         ` Andrew Morton
2009-11-02 22:38                                         ` Serge E. Hallyn
2009-11-02 22:47                                           ` Andrew Morton
2009-11-03  7:24                                             ` Cedric Le Goater
2009-11-03  8:41                                               ` Eric W. Biederman
2009-11-03  9:24                                                 ` Cedric Le Goater

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091009032928.GA2031@us.ibm.com \
    --to=sukadev-23vcf4htsmix0ybbhkvfkdbpr1lh4cv8@public.gmane.org \
    --cc=containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org \
    --cc=dlezcano-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.