kernel crash

linux-mm.kvack.org archive mirror

* kernel crash
@ 2011-09-22 18:45 M
  2011-09-22 21:24 ` Dave Jones
  0 siblings, 1 reply; 4+ messages in thread
From: M @ 2011-09-22 18:45 UTC
  To: linux-mm

Hi,

I am running 64-bit Fedora 15 on an AMD 64-bit machine. After an update 3 days ago, the kernel started to crash whenever I submit a heavy computation job. It happened again today with a similar type of job.

I submitted a bug report at https://bugzilla.redhat.com/  d=740613 . They referred me to the Linux memory management group. I have also uploaded my log file to the bug report. I will be very happy to provide more information if it is required to resolve this issue.

Thanks.

* Re: kernel crash
  2011-09-22 18:45 kernel crash M
@ 2011-09-22 21:24 ` Dave Jones
  2011-09-22 21:38   ` David Rientjes
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Jones @ 2011-09-22 21:24 UTC
  To: M; +Cc: linux-mm

On Thu, Sep 22, 2011 at 11:45:25AM -0700, M wrote:
 > Hi,
 > 
 > I am running 64-bit Fedora 15 on an AMD 64-bit machine. After an update 3 days ago, the kernel started to crash whenever I submit a heavy computation job. It happened again today with a similar type of job.
 > 
 > I submitted a bug report at https://bugzilla.redhat.com/  d=740613 . They referred me to the Linux memory management group. I have also uploaded my log file to the bug report. I will be very happy to provide more information if it is required to resolve this issue.
 > 
 > Thanks.

(fixed url is https://bugzilla.redhat.com/show_bug.cgi?id=740613)

Manoj's report describes a system with 32GB of RAM and 40GB of swap
that is OOM-killing processes while there still seems to be RAM available.

I note the gfp mask of the failing allocations has __GFP_HIGHMEM set,
and this apparently doesn't happen when he runs 32-bit.

Could that be a clue?

	Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: kernel crash
  2011-09-22 21:24 ` Dave Jones
@ 2011-09-22 21:38   ` David Rientjes
  2011-09-22 21:53     ` Dave Jones
  0 siblings, 1 reply; 4+ messages in thread
From: David Rientjes @ 2011-09-22 21:38 UTC
  To: Dave Jones; +Cc: M, linux-mm

On Thu, 22 Sep 2011, Dave Jones wrote:

> On Thu, Sep 22, 2011 at 11:45:25AM -0700, M wrote:
>  > Hi,
>  > 
>  > I am running 64-bit Fedora 15 on an AMD 64-bit machine. After an update 3 days ago, the kernel started to crash whenever I submit a heavy computation job. It happened again today with a similar type of job.
>  > 
>  > I submitted a bug report at https://bugzilla.redhat.com/  d=740613 . They referred me to the Linux memory management group. I have also uploaded my log file to the bug report. I will be very happy to provide more information if it is required to resolve this issue.
>  > 
>  > Thanks.
> 
> (fixed url is https://bugzilla.redhat.com/show_bug.cgi?id=740613)
> 
> Manoj's report describes a system with 32GB of RAM and 40GB of swap
> that is OOM-killing processes while there still seems to be RAM available.
> 

Looking at the output of the first oom from 
https://bugzilla.redhat.com/attachment.cgi?id=524451

Sep 20 19:39:19 host2 kernel: [1932999.874704] Node 0 DMA free:15892kB min:40kB low:48kB high:60kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Sep 20 19:39:19 host2 kernel: [1932999.874727] lowmem_reserve[]: 0 1970 16110 16110
Sep 20 19:39:19 host2 kernel: [1932999.874739] Node 0 DMA32 free:61832kB min:5500kB low:6872kB high:8248kB active_anon:1494456kB inactive_anon:498264kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2018144kB mlocked:0kB dirty:0kB writeback:0kB mapped:20kB shmem:0kB slab_reclaimable:864kB slab_unreclaimable:44kB kernel_stack:0kB pagetables:6632kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Sep 20 19:39:19 host2 kernel: [1932999.874765] lowmem_reserve[]: 0 0 14140 14140
Sep 20 19:39:19 host2 kernel: [1932999.874772] Node 0 Normal free:39440kB min:39464kB low:49328kB high:59196kB active_anon:12962768kB inactive_anon:1178596kB active_file:236kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:14479360kB mlocked:0kB dirty:0kB writeback:1560kB mapped:164kB shmem:0kB slab_reclaimable:10356kB slab_unreclaimable:12296kB kernel_stack:1504kB pagetables:75224kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:354 all_unreclaimable? yes
Sep 20 19:39:19 host2 kernel: [1932999.874810] lowmem_reserve[]: 0 0 0 0
Sep 20 19:39:19 host2 kernel: [1932999.874817] Node 1 Normal free:44920kB min:45100kB low:56372kB high:67648kB active_anon:14981988kB inactive_anon:1248872kB active_file:812kB inactive_file:792kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:16547840kB mlocked:0kB dirty:0kB writeback:14676kB mapped:392kB shmem:0kB slab_reclaimable:8484kB slab_unreclaimable:11252kB kernel_stack:840kB pagetables:71480kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2057 all_unreclaimable? yes
Sep 20 19:39:19 host2 kernel: [1932999.874856] lowmem_reserve[]: 0 0 0 0

We can see that both Normal zones are below their minimum watermark
(39440kB < 39464kB on node 0, 44920kB < 45100kB on node 1), so they are
completely oom.  We can't allocate from ZONE_DMA32 because of
lowmem_reserve for this gfp mask (61832kB - (14140 pages * 4kB) = 5272kB,
which is below the 5500kB minimum), nor from ZONE_DMA for the same reason.
So there's no RAM available.
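The arithmetic above can be replayed with a small script. This is only a sketch of the idea, not the kernel's actual watermark code: it assumes 4kB pages, takes the free/min figures and the lowmem_reserve entries from the log lines quoted above, and the names zone_ok, usable_kb and ZONES are made up for illustration.

```python
# Simplified replay of the per-zone check described above.
# Assumption: 4 kB pages; lowmem_reserve[] values are in pages.
PAGE_KB = 4

# (zone, free kB, min watermark kB, lowmem_reserve pages for this request)
# All figures are copied from the oom output quoted in this message.
ZONES = [
    ("Node 0 DMA",    15892,    40, 16110),
    ("Node 0 DMA32",  61832,  5500, 14140),
    ("Node 0 Normal", 39440, 39464,     0),
    ("Node 1 Normal", 44920, 45100,     0),
]


def usable_kb(free_kb, reserve_pages):
    """Free memory left once the zone's lowmem reserve is set aside."""
    return free_kb - reserve_pages * PAGE_KB


def zone_ok(free_kb, min_kb, reserve_pages):
    """True if the zone could still satisfy the allocation (simplified)."""
    return usable_kb(free_kb, reserve_pages) > min_kb


if __name__ == "__main__":
    for name, free_kb, min_kb, reserve in ZONES:
        verdict = "ok" if zone_ok(free_kb, min_kb, reserve) else "exhausted"
        print(f"{name:14s} usable={usable_kb(free_kb, reserve):>8d}kB "
              f"min={min_kb:>6d}kB -> {verdict}")
```

Every zone fails the check with these numbers, which matches the conclusion that no RAM is available to this allocation.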

Sep 20 19:39:19 host2 kernel: [1932999.874970] 331623 total pagecache pages
Sep 20 19:39:19 host2 kernel: [1932999.874974] 331021 pages in swap cache
Sep 20 19:39:19 host2 kernel: [1932999.874978] Swap cache stats: add 10280280, delete 9949259, find 5232/9633
Sep 20 19:39:19 host2 kernel: [1932999.874982] Free swap  = 0kB

And there's no swap available.

> I note the gfp mask of the failing allocations has __GFP_HIGHMEM set,
> and this apparently doesn't happen when he runs 32-bit.
> 
> Could that be a clue?
> 

The problem is this:

Sep 20 19:39:19 host2 kernel: [1933000.196980] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
...
Sep 20 19:39:19 host2 kernel: [1933000.197559] [13918]   507 13918 17992270  7758558   4     -17         -1000 root.exe

root.exe has about 29.5GB of the 32GB available memory in RAM, and it's
set to have a /proc/13918/oom_score_adj of -1000 meaning it's not eligible 
for oom killing.  So the kernel panics rather than kill the task.
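For anyone wanting to check the numbers, here is a rough sketch that parses the task line from the oom dump and converts rss from pages to GiB. It assumes 4kB pages and the column layout shown in the header line above; parse_oom_task and the field names are illustrative, not part of any kernel tool.

```python
# Parse one task line from the oom dump quoted above.
# Assumption: 4 kB pages, columns as in the "[ pid ] uid tgid ..." header.
PAGE_BYTES = 4096
OOM_SCORE_ADJ_MIN = -1000  # value that exempts a task from oom killing

LINE = "[13918]   507 13918 17992270  7758558   4     -17         -1000 root.exe"

FIELDS = ("pid", "uid", "tgid", "total_vm", "rss", "cpu",
          "oom_adj", "oom_score_adj")


def parse_oom_task(line):
    """Split a task line into a dict of integer fields plus the name."""
    parts = line.replace("[", " ").replace("]", " ").split()
    task = {key: int(value) for key, value in zip(FIELDS, parts)}
    task["name"] = parts[len(FIELDS)]
    return task


def rss_gib(task):
    """Resident set size in GiB (rss is reported in pages)."""
    return task["rss"] * PAGE_BYTES / 2**30


def is_killable(task):
    """A task at the minimum oom_score_adj is skipped by the oom killer."""
    return task["oom_score_adj"] > OOM_SCORE_ADJ_MIN


if __name__ == "__main__":
    t = parse_oom_task(LINE)
    print(f"{t['name']}: rss={rss_gib(t):.1f} GiB, killable={is_killable(t)}")
```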

There's not much the kernel can be expected to do in such a configuration, 
you've simply exhausted all RAM and swap.  You can set 
/proc/pid/oom_score_adj to not be -1000 so that it is at least eligible to 
be killed in these circumstances rather than panic the machine, but the VM 
will continue to oom under this configuration.
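A minimal sketch of changing the setting from a script, assuming the usual procfs layout and the documented -1000..1000 range for oom_score_adj. The helper names and the proc_root parameter are hypothetical (proc_root only exists so the function can be exercised against a scratch directory), and writing to another user's process needs suitable privileges.

```python
import os


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write a new oom_score_adj for pid and return the path written."""
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return path


def get_oom_score_adj(pid, proc_root="/proc"):
    """Read the current oom_score_adj for pid."""
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path) as f:
        return int(f.read().strip())


# Example (not run here): make pid 13918 eligible for oom killing again.
# set_oom_score_adj(13918, 0)
```

This only changes which task gets killed; it does nothing about the underlying shortage of RAM and swap.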

* Re: kernel crash
  2011-09-22 21:38   ` David Rientjes
@ 2011-09-22 21:53     ` Dave Jones
  0 siblings, 0 replies; 4+ messages in thread
From: Dave Jones @ 2011-09-22 21:53 UTC
  To: David Rientjes; +Cc: M, linux-mm

On Thu, Sep 22, 2011 at 02:38:47PM -0700, David Rientjes wrote:

 > The problem is this:
 > 
 > Sep 20 19:39:19 host2 kernel: [1933000.196980] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
 > ...
 > Sep 20 19:39:19 host2 kernel: [1933000.197559] [13918]   507 13918 17992270  7758558   4     -17         -1000 root.exe
 > 
 > root.exe has about 29.5GB of the 32GB available memory in RAM, and it's
 > set to have a /proc/13918/oom_score_adj of -1000 meaning it's not eligible 
 > for oom killing.  So the kernel panics rather than kill the task.
 > 
 > There's not much the kernel can be expected to do in such a configuration, 
 > you've simply exhausted all RAM and swap.  You can set 
 > /proc/pid/oom_score_adj to not be -1000 so that it is at least eligible to 
 > be killed in these circumstances rather than panic the machine, but the VM 
 > will continue to oom under this configuration.

It's surprising that the same workload works in 32-bit.

Manoj, has root.exe been recompiled for 64-bit? I'm wondering if it's just
that the expansion of a lot of unsigned longs is causing increased memory
use vs the original 32-bit use case.
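To put a rough number on that guess: a C long is 4 bytes under the 32-bit ILP32 model and 8 bytes under 64-bit LP64 on Linux, so any data dominated by longs doubles in size. The element count below is purely hypothetical and not taken from the report.

```python
# Footprint of an array of C `long` values under the two data models.
ILP32_LONG_BYTES = 4  # 32-bit Linux
LP64_LONG_BYTES = 8   # 64-bit Linux

N_LONGS = 1_000_000_000  # hypothetical element count, for illustration only


def array_bytes(n_longs, long_bytes):
    """Bytes needed to store n_longs values of the given width."""
    return n_longs * long_bytes


if __name__ == "__main__":
    for model, width in (("ILP32", ILP32_LONG_BYTES), ("LP64", LP64_LONG_BYTES)):
        gib = array_bytes(N_LONGS, width) / 2**30
        print(f"{model}: {gib:.2f} GiB")
```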

	Dave
