* ps stuck on cmdline reading
@ 2015-01-21 9:58 William Dauchy
From: William Dauchy @ 2015-01-21 9:58 UTC (permalink / raw)
To: cgroups-u79uwXL29TY76Z2rM5mHXA
Hello,
I sometimes trigger an issue on v3.14.x (v3.14.18 here) where
`ps auxwwf` gets stuck.
The setup is several containers, each running several processes, with
a memory limit on each cgroup.
strace shows that `ps` is stuck reading a cmdline file
(/proc/<pid>/cmdline). The process concerned is itself in
uninterruptible (D) state.
The ps command is run in the global cgroup.
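A minimal sketch for listing such tasks, assuming the standard Linux /proc layout: in /proc/<pid>/stat the field right after the parenthesised command name is the task state, and `D` means uninterruptible sleep. `parse_stat` and `find_d_state` are hypothetical helpers, not part of any tool mentioned in this thread.

```python
import os


def parse_stat(stat_line):
    """Return (comm, state) from a /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace.
    """
    left, _, right = stat_line.rpartition(")")
    comm = left.split("(", 1)[1]
    state = right.split()[0]
    return comm, state


def find_d_state(proc_root="/proc"):
    """Yield (pid, comm) for every task currently in state D."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the task exited between listdir() and open()
        if state == "D":
            yield int(entry), comm
```

On a live system, `list(find_d_state())` gives the candidates whose kernel stacks are worth inspecting.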
I had a similar issue on 3.10.x a few months ago where ps was stuck;
the cause was that the cgroup's memory limit had been reached, and I
only had to make some more pages available to the container to unblock
the ps command.
But in this case the cgroup that owns the PID still has plenty of
memory available, and I have not found a way to unblock the process.
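One way to check how much room a cgroup really has, assuming the cgroup v1 memory controller mounted under /sys/fs/cgroup/memory: compare memory.usage_in_bytes with memory.limit_in_bytes, and look at memory.failcnt, which counts how often usage hit the limit. Note that usage_in_bytes also counts page cache, so it can sit near the limit even when the applications themselves use little memory. Raising memory.limit_in_bytes is presumably what "making more pages available" amounted to on 3.10. The helper names and any container path passed to them are hypothetical.

```python
import os


def read_int(path):
    """Read a single integer from a cgroup control file."""
    with open(path) as f:
        return int(f.read().strip())


def memcg_headroom(cgroup_dir):
    """Return (usage, limit, headroom, failcnt) for a cgroup v1 memory cgroup."""
    usage = read_int(os.path.join(cgroup_dir, "memory.usage_in_bytes"))
    limit = read_int(os.path.join(cgroup_dir, "memory.limit_in_bytes"))
    failcnt = read_int(os.path.join(cgroup_dir, "memory.failcnt"))
    return usage, limit, limit - usage, failcnt


def raise_limit(cgroup_dir, extra_bytes):
    """Raise memory.limit_in_bytes by extra_bytes (needs root on a real system)."""
    path = os.path.join(cgroup_dir, "memory.limit_in_bytes")
    new_limit = read_int(path) + extra_bytes
    with open(path, "w") as f:
        f.write(str(new_limit))
    return new_limit
```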
I don't know how to reproduce the issue, but I hit it from time to
time.
Does anyone have a hint? How can I get more debug info about it?
Thanks,
--
William
* Re: ps stuck on cmdline reading
@ 2015-02-03 19:40 ` Michal Hocko
From: Michal Hocko @ 2015-02-03 19:40 UTC (permalink / raw)
To: William Dauchy; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA
On Wed 21-01-15 10:58:08, William Dauchy wrote:
> Hello,
>
> I sometimes trigger an issue on v3.14.x (v3.14.18 here) where
> `ps auxwwf` gets stuck.
> The setup is several containers, each running several processes, with
> a memory limit on each cgroup.
> strace shows that `ps` is stuck reading a cmdline file
> (/proc/<pid>/cmdline). The process concerned is itself in
> uninterruptible (D) state.
> The ps command is run in the global cgroup.
What does /proc/<ps pid>/stack say?
--
Michal Hocko
SUSE Labs
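The check suggested here can be scripted. A minimal sketch, assuming /proc/<pid>/stack exists (it is only present on kernels built with CONFIG_STACKTRACE) and that the caller has the privileges to read it, which for other users' tasks typically means root. `read_kernel_stack` and `dump_stacks` are hypothetical helpers.

```python
import os


def read_kernel_stack(pid, proc_root="/proc"):
    """Return the lines of /proc/<pid>/stack, or None if it cannot be read.

    A missing file (no CONFIG_STACKTRACE, or the task has exited) and a
    permission error are both reported as None.
    """
    path = os.path.join(proc_root, str(pid), "stack")
    try:
        with open(path) as f:
            return f.read().splitlines()
    except (FileNotFoundError, PermissionError, ProcessLookupError):
        return None


def dump_stacks(pids, proc_root="/proc"):
    """Map each pid to its kernel stack lines (None if unreadable)."""
    return {pid: read_kernel_stack(pid, proc_root) for pid in pids}
```

For this report, the interesting pids would be the stuck `ps` and the process whose cmdline it is reading.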
* Re: ps stuck on cmdline reading
@ 2015-02-05 15:22 ` William Dauchy
From: William Dauchy @ 2015-02-05 15:22 UTC (permalink / raw)
To: Michal Hocko; +Cc: William Dauchy, cgroups-u79uwXL29TY76Z2rM5mHXA
On Feb03 20:40, Michal Hocko wrote:
> What does /proc/<ps pid>/stack say?
Thanks for the hint.
I forgot to look at that; it seems to be an NFS issue:
[<ffffffff810eda29>] sleep_on_page+0x9/0x20
[<ffffffff810ee8d4>] __lock_page+0xa4/0xb0
[<ffffffff810fe38f>] truncate_inode_pages_range+0x3ef/0x6a0
[<ffffffff810fe650>] truncate_inode_pages+0x10/0x20
[<ffffffff812474b6>] nfs4_evict_inode+0x16/0x40
[<ffffffff8117359f>] evict+0xaf/0x1c0
[<ffffffff811744c2>] iput+0x102/0x1a0
[<ffffffff812178d5>] nfs_dentry_iput+0x35/0x50
[<ffffffff8116f36c>] dentry_kill+0x16c/0x290
[<ffffffff8116f9d8>] dput+0xa8/0x160
[<ffffffff8121c96b>] __put_nfs_open_context+0xbb/0x100
[<ffffffff8121d1bb>] put_nfs_open_context+0xb/0x20
[<ffffffff81229ea4>] nfs_commitdata_release+0x14/0x30
[<ffffffff81229ee7>] nfs_commit_release+0x27/0x30
[<ffffffff815d23d2>] rpc_free_task+0x32/0x80
[<ffffffff815d24a5>] rpc_final_put_task+0x85/0x90
[<ffffffff815d2505>] rpc_do_put_task+0x35/0x40
[<ffffffff815d320b>] rpc_put_task+0xb/0x20
[<ffffffff81229fd0>] nfs_initiate_commit+0xe0/0x120
[<ffffffff8122a269>] nfs_commit_list+0x69/0xb0
[<ffffffff8122a36b>] nfs_commit_inode+0x9b/0x160
[<ffffffff8121b02b>] nfs_release_page+0x7b/0xa0
[<ffffffff810f18dd>] try_to_release_page+0x3d/0x60
[<ffffffff81100b06>] shrink_page_list+0x8a6/0x9e0
[<ffffffff81101233>] shrink_inactive_list+0x183/0x420
[<ffffffff81101b45>] shrink_lruvec+0x335/0x6f0
[<ffffffff81101f66>] shrink_zone+0x66/0x1a0
[<ffffffff8110217b>] do_try_to_free_pages+0xdb/0x550
[<ffffffff81102764>] try_to_free_mem_cgroup_pages+0xa4/0xb0
[<ffffffff8114851e>] mem_cgroup_reclaim+0x4e/0xd0
[<ffffffff81148e29>] __mem_cgroup_try_charge+0x409/0xbf0
[<ffffffff81149d85>] mem_cgroup_charge_common+0x45/0xa0
[<ffffffff8114c1d6>] mem_cgroup_newpage_charge+0x26/0x30
[<ffffffff811181a8>] handle_mm_fault+0x8c8/0xcf0
[<ffffffff810342a3>] __do_page_fault+0x1b3/0x600
[<ffffffff8103472c>] do_page_fault+0xc/0x20
[<ffffffff81609572>] page_fault+0x22/0x30
[<ffffffffffffffff>] 0xffffffffffffffff
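To compare traces from repeated hangs, the frames can be parsed mechanically. A minimal sketch, assuming the `[<address>] symbol+0xoffset/0xsize` format shown above (innermost frame first); the entries in `MARKERS` are hand-picked from this one trace and are only an illustration, not an established classification.

```python
import re

# One frame: "[<address>] symbol+0xoffset/0xsize"; the offset part may be absent.
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+([\w.]+)(?:\+0x([0-9a-f]+)/0x([0-9a-f]+))?")

# Hypothetical labels for the reclaim path seen in this report.
MARKERS = [
    ("mem_cgroup_reclaim", "memcg limit reclaim"),
    ("nfs_release_page", "NFS page release during reclaim"),
    ("nfs4_evict_inode", "NFS inode eviction"),
    ("__lock_page", "waiting for a page lock"),
]


def parse_frames(text):
    """Return (symbol, offset, size) tuples, innermost frame first."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m is None or m.group(2).startswith("0x"):
            continue  # not a frame, or the 0xffffffffffffffff end marker
        off = int(m.group(3), 16) if m.group(3) else None
        size = int(m.group(4), 16) if m.group(4) else None
        frames.append((m.group(2), off, size))
    return frames


def summarize(frames):
    """List the markers present, ordered from outermost caller to innermost."""
    syms = [sym for sym, _, _ in frames]
    found = [(syms.index(sym), label) for sym, label in MARKERS if sym in syms]
    # Frames are innermost-first, so a larger index means an outer caller.
    return [label for _, label in sorted(found, reverse=True)]
```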
--
William
* Re: ps stuck on cmdline reading
@ 2015-02-05 15:45 ` Michal Hocko
From: Michal Hocko @ 2015-02-05 15:45 UTC (permalink / raw)
To: William Dauchy; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA
On Thu 05-02-15 16:22:33, William Dauchy wrote:
> On Feb03 20:40, Michal Hocko wrote:
> > What does /proc/<ps pid>/stack say?
>
> Thanks for the hint.
> I forgot to look at that; it seems to be an NFS issue:
>
> [<ffffffff810eda29>] sleep_on_page+0x9/0x20
> [<ffffffff810ee8d4>] __lock_page+0xa4/0xb0
> [<ffffffff810fe38f>] truncate_inode_pages_range+0x3ef/0x6a0
> [<ffffffff810fe650>] truncate_inode_pages+0x10/0x20
> [<ffffffff812474b6>] nfs4_evict_inode+0x16/0x40
> [<ffffffff8117359f>] evict+0xaf/0x1c0
> [<ffffffff811744c2>] iput+0x102/0x1a0
> [<ffffffff812178d5>] nfs_dentry_iput+0x35/0x50
OK, so memcg reclaim gets down to NFS, which wants to evict its inode,
and that is waiting for the page to be unlocked. It would be
interesting to find out who is keeping the page locked. What is nfsd
doing?
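The question of who holds the page lock amounts to following a wait-for chain: the reclaiming task waits for the page lock, and the lock holder may in turn be waiting on something else. If each blocked task's dependency can be identified by hand (the kernel does not record an owner for a page lock, so this means reading stacks such as /proc/<pid>/stack), a cycle in the resulting graph indicates a deadlock. A generic sketch; `find_cycle` and the task names below are hypothetical and not taken from the report.

```python
def find_cycle(waits_for):
    """Return a list of tasks forming a wait-for cycle, or None.

    waits_for maps each blocked task to the single task it waits on.
    Since every node has at most one outgoing edge, walking the chain
    from each start point is enough to find any cycle.
    """
    for start in waits_for:
        seen = []
        node = start
        while node in waits_for and node not in seen:
            seen.append(node)
            node = waits_for[node]
        if node in seen:
            # The walk re-entered the chain: the cycle starts at `node`.
            return seen[seen.index(node):]
    return None
```

For instance, `find_cycle({"reclaimer": "lock_holder", "lock_holder": "reclaimer"})` returns `["reclaimer", "lock_holder"]`, while a chain that ends at a running task returns `None`.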
--
Michal Hocko
SUSE Labs
* Re: ps stuck on cmdline reading
@ 2015-02-10 15:10 ` William Dauchy
From: William Dauchy @ 2015-02-10 15:10 UTC (permalink / raw)
To: Michal Hocko
Cc: William Dauchy, cgroups-u79uwXL29TY76Z2rM5mHXA, Trond Myklebust,
Anna Schumaker
On Feb05 16:45, Michal Hocko wrote:
> OK, so memcg reclaim gets down to NFS, which wants to evict its inode,
> and that is waiting for the page to be unlocked. It would be
> interesting to find out who is keeping the page locked. What is nfsd
> doing?
Trond pointed me to commit 9590544 ("NFS: avoid deadlocks with
loop-back mounted NFS filesystems"), but it also relies on a scheduler
commit, so it seems hard to backport to 3.14.x.
--
William