Sukadev Bhattiprolu wrote: > Daniel Lezcano [dlezcano-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org] wrote: >> Sukadev Bhattiprolu wrote: >>> Still digging through some traces, but below I have some questions that >>> I am still trying to answer. >>> >>>> I am not sure what you mean by 'struct pids' but what I observed is: >>> Ok, I see that too. If pids leak, then pid-namespace will leak too. >>> Do you see any leaks in proc_inode_cache ? >> Yes, right. It leaks too. > > Ok, some progress... > > Can you please verify these observations: > > - If the container exits normally, the leak does not seem to happen. > (i.e reduce your sleep 3600 to say sleep 3 and remove the lxc-stop). > > - Revert the following commit and check if the leak happens: > > commit 7766755a2f249e7e0dabc5255a0a3d151ff79821 > Author: Andrea Arcangeli > Date: Mon Feb 4 22:29:21 2008 -0800 > > (this commit added the check for PF_EXITING in proc_flush_task_mnt > loosely explained below). > Incomplete analysis :-) > > If the container-init is terminated (by the lxc-stop), the container zaps > other processes in the container and waits for them. The leak happens in > this case. > > Following sequence of events occur: > > - container-init calls do_exit and sets PF_EXITING (in exit_signals()) > > - container init calls zaps_pid_ns_processes() (exit_notify / > forget_orignal_parent() / find_new_reaper()) > > - In zap_pid_ns_processes() container-init sends SIGKILL to > descendants and calls sys_wait(). > > - The sys_wait() is expected to call release_task() which calls > proc_flush_task_mnt(). > > - proc_flush_task_mnt() looks up the dentry for the pid (2 in > our example) and finds the dentry. > > But since container-init is itself exiting (i.e PF_EXITING is > set) it does NOT call the shrink_dcache_parent(), but, > interestingly calls d_drop() and dput(). > > Now the d_drop() unhashes the dentry for the pid 2. > > - proc_flush_task_mnt() then tries to find the dentry for the > tgid of the process. In our case, the tgid == pid == 2 and > we just unhashed the dentry for "2". > > So, we don't find the dentry for the leader either (and hence > don't make the second shrink_dcache_parent() call in > proc_flush_task_mnt() either). > > Without a call to shrink_dcache_parent(), the proc inode > for the process that was terminated by container init is > not deleted (i.e we don't call proc_delete_inode() or > the put_pid() inside it) causing us to leak proc_inodes, > struct pid and hence struct pid_namespace. Ouch ! Nice analysis :) Following your explanation I was able to reproduce a simple program added in attachment. But there is something I do not understand is why the leak does not appear if I do the 'lstat' (cf. test program) in the pid 2 context. > There should be a better fix, but first please confirm if reverting the > above commit fixes the leak for you also. I confirm the leak does no longer appear when reverting this patch. Thanks -- Daniel