linux-mm.kvack.org archive mirror
* ps lockups, cgroup memory reclaim
@ 2013-09-17 15:50 Mark Hills
  2013-09-17 16:28 ` Johannes Weiner
From: Mark Hills @ 2013-09-17 15:50 UTC
  To: linux-mm


I'm investigating intermittent kernel lockups in an HPC environment, with 
the Red Hat kernel.

The symptoms are seen as lockups of multiple ps commands, with one 
consuming full CPU:

  # ps aux | grep ps
  root     19557 68.9  0.0 108100   908 ?        D    Sep16 1045:37 ps --ppid 1 -o args=
  root     19871  0.0  0.0 108100   908 ?        D    Sep16   0:00 ps --ppid 1 -o args=

SIGKILL on the busy one causes the other ps processes to run to completion 
(TERM has no effect).

In this case I was able to run my own ps to see the process list, but that 
is not always possible.

perf shows the locality of the spinning, roughly:

  proc_pid_cmdline
  get_user_pages
  handle_mm_fault
  mem_cgroup_try_charge_swapin
  mem_cgroup_reclaim

There are two entry points; the code paths taken are better shown by the 
attached profile of CPU time.
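
Since ps itself can hang here, the following is a minimal sketch of one way 
to list processes in uninterruptible sleep (state D) without running ps. It 
reads only /proc/<pid>/stat, on the assumption that this file does not go 
through the proc_pid_cmdline path shown in the perf output above; I have not 
verified that assumption on the affected kernels. The function names 
parse_stat and uninterruptible are made up for illustration.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible(proc_root="/proc"):
    """List (pid, comm) for processes currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or stat was unreadable or malformed
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    # Deliberately avoids /proc/<pid>/cmdline (assumed to be the hanging read).
    for pid, comm in uninterruptible():
        print(pid, comm)
```

This only reports the short command name from the stat file, not the full 
argument list, which is the price of staying away from cmdline.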

We've had this behaviour since switching to Scientific Linux 6 (based on 
RHEL6, like CentOS) at kernel 2.6.32-279.9.1.el6.x86_64.

The example above is kernel 2.6.32-358.el6.x86_64.

I haven't been able to get a reproducible case with which to test the 
mainline kernel; our large-scale automated use of ps is effectively acting 
as a fuzz test, and switching kernels across that environment is not an 
option, unfortunately.

Does this issue sound familiar? I'd appreciate any advice or information, 
or pointers to the mainline where such cases have been investigated.

I could not find anything using Google, but this problem does not have a 
keyword or error message to search for.

Many thanks

-- 
Mark

[-- Attachment #2: Type: APPLICATION/PDF, Size: 31913 bytes --]
