* [RFC] Reproducible OOM with partial workaround
@ 2013-01-10 21:58 paul.szabo
2013-01-10 23:12 ` Dave Hansen
0 siblings, 1 reply; 11+ messages in thread
From: paul.szabo @ 2013-01-10 21:58 UTC (permalink / raw)
To: linux-mm; +Cc: 695182, linux-kernel
Dear Linux-MM,
On a machine with i386 kernel and over 32GB RAM, an OOM condition is
reliably obtained simply by writing a few files to some local disk
e.g. with:
n=0; while [ $n -lt 99 ]; do dd bs=1M count=1024 if=/dev/zero of=x$n; ((n=$n+1)); done
The crash usually occurs after 16 or 32 files are written. The problem
seems to be avoided by using mem=32G at kernel boot, and it occurs with
any amount of RAM over 32GB.
I developed a workaround patch for this particular OOM demo, dropping
filesystem caches when about to exhaust lowmem. However, subsequently
I observed OOM when running many processes (as yet I do not have an
easy-to-reproduce demo of this); so as I suspected, the essence of the
problem is not with FS caches.
Could you please help in finding the cause of this OOM bug?
Please see
http://bugs.debian.org/695182
for details, in particular my workaround patch
http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=101;att=1;bug=695182
(Please reply to me directly, as I am not a subscriber to the linux-mm
mailing list.)
Thanks, Paul
Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org
* Re: [RFC] Reproducible OOM with partial workaround
2013-01-10 21:58 [RFC] Reproducible OOM with partial workaround paul.szabo
@ 2013-01-10 23:12 ` Dave Hansen
2013-01-11 0:46 ` paul.szabo
0 siblings, 1 reply; 11+ messages in thread
From: Dave Hansen @ 2013-01-10 23:12 UTC (permalink / raw)
To: paul.szabo; +Cc: linux-mm, 695182, linux-kernel
On 01/10/2013 01:58 PM, paul.szabo@sydney.edu.au wrote:
> I developed a workaround patch for this particular OOM demo, dropping
> filesystem caches when about to exhaust lowmem. However, subsequently
> I observed OOM when running many processes (as yet I do not have an
> easy-to-reproduce demo of this); so as I suspected, the essence of the
> problem is not with FS caches.
>
> Could you please help in finding the cause of this OOM bug?
As was mentioned in the bug, your 32GB of physical memory only ends up
giving ~900MB of low memory to the kernel. Of that, around 600MB is
used for "mem_map[]", leaving only about 300MB available to the kernel
for *ALL* of its allocations at runtime.
Your configuration has never worked. This isn't a regression; it's
simply something that we know has never worked in Linux, and it's a
very hard problem to solve. One Linux vendor (at least) went to a huge
amount of trouble to develop, ship, and support a kernel that supported
large 32-bit machines, but it was never merged upstream and work
stopped on it
when such machines became rare beasts:
http://lwn.net/Articles/39925/
I believe just about any Linux vendor would call your configuration
"unsupported". Just because the kernel can boot does not mean that we
expect it to work.
It's possible that some tweaks of the vm knobs (like lowmem_reserve)
could help you here. But, really, you don't want to run a 32-bit kernel
on such a large machine. Very, very few folks are running 32-bit
kernels on these systems and you're likely to keep running into bugs
because this is such a rare configuration.
We've been very careful to ensure that 64-bit kernels should basically be
drop-in replacements for 32-bit ones. You can keep userspace 100%
32-bit, and just have a 64-bit kernel.
If you're really set on staying 32-bit, I might have a NUMA-Q I can give
you. ;)
* Re: [RFC] Reproducible OOM with partial workaround
2013-01-10 23:12 ` Dave Hansen
@ 2013-01-11 0:46 ` paul.szabo
2013-01-11 1:26 ` Dave Hansen
0 siblings, 1 reply; 11+ messages in thread
From: paul.szabo @ 2013-01-11 0:46 UTC (permalink / raw)
To: dave; +Cc: 695182, linux-kernel, linux-mm
Dear Dave,
> Your configuration has never worked. This isn't a regression ...
> ... does not mean that we expect it to work.
Do you mean that CONFIG_HIGHMEM64G is deprecated, should not be used;
that all development is for 64-bit only?
> ... 64-bit kernels should basically be drop-in replacements ...
Will think about that. I know all my servers are 64-bit capable, will
need to check all my desktops.
---
I find it puzzling that there seems to be a sharp cutoff at 32GB RAM,
no problem under but OOM just over; whereas I would have expected
lowmem starvation to be gradual, with OOM occurring much sooner with
64GB than with 34GB. Also, the kernel seems capable of reclaiming
lowmem, so I wonder why that fails just over the 32GB threshold.
(Obviously I have no idea what I am talking about.)
---
Thanks, Paul
Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
* Re: [RFC] Reproducible OOM with partial workaround
2013-01-11 0:46 ` paul.szabo
@ 2013-01-11 1:26 ` Dave Hansen
2013-01-11 1:46 ` paul.szabo
0 siblings, 1 reply; 11+ messages in thread
From: Dave Hansen @ 2013-01-11 1:26 UTC (permalink / raw)
To: paul.szabo; +Cc: 695182, linux-kernel, linux-mm
On 01/10/2013 04:46 PM, paul.szabo@sydney.edu.au wrote:
>> Your configuration has never worked. This isn't a regression ...
>> ... does not mean that we expect it to work.
>
> Do you mean that CONFIG_HIGHMEM64G is deprecated, should not be used;
> that all development is for 64-bit only?
My last 4GB laptop had a 1GB hole and needed HIGHMEM64G since it had RAM
at 0->5GB. That worked just fine, btw. The problem isn't with
HIGHMEM64G itself.
I'm not saying HIGHMEM64G is inherently bad, just that it gets gradually
worse and worse as you add more RAM. I don't believe 64GB of RAM has
_ever_ been booted on a 32-bit kernel without either violating the ABI
(3GB/1GB split) or doing something that never got merged upstream (that
4GB/4GB split, or other fun stuff like page clustering).
> I find it puzzling that there seems to be a sharp cutoff at 32GB RAM,
> no problem under but OOM just over; whereas I would have expected
> lowmem starvation to be gradual, with OOM occurring much sooner with
> 64GB than with 34GB. Also, the kernel seems capable of reclaiming
> lowmem, so I wonder why that fails just over the 32GB threshold.
> (Obviously I have no idea what I am talking about.)
It _is_ puzzling. It isn't immediately obvious to me why the slab that
you have isn't being reclaimed. There might, indeed, be a fixable bug
there. But there are probably a bunch more bugs which will keep you
from having a nice, smoothly-running system; most of those bugs have
not had much attention in the 10 years or so since 64-bit x86 became
commonplace. Plus, even 10 years ago, when folks were working on this
actively, we _never_ got things running smoothly on 32GB of RAM. Take a
look at this:
http://support.bull.com/ols/product/system/linux/redhat/help/kbf/g/inst/PrKB11417
You are effectively running the "SMP kernel" (hugemem is a completely
different beast).
I had a 32GB i386 system. It was a really, really fun system to play
with, and its never-ending list of bugs helped keep me employed for
several years. You don't want to unnecessarily inflict that pain on
yourself, really.
* Re: [RFC] Reproducible OOM with partial workaround
2013-01-11 1:26 ` Dave Hansen
@ 2013-01-11 1:46 ` paul.szabo
2013-01-11 8:01 ` Andrew Morton
2013-01-11 16:04 ` Dave Hansen
0 siblings, 2 replies; 11+ messages in thread
From: paul.szabo @ 2013-01-11 1:46 UTC (permalink / raw)
To: dave; +Cc: 695182, linux-kernel, linux-mm
Dear Dave,
> ... I don't believe 64GB of RAM has _ever_ been booted on a 32-bit
> kernel without either violating the ABI (3GB/1GB split) or doing
> something that never got merged upstream ...
Sorry to be so contradictory:
psz@como:~$ uname -a
Linux como.maths.usyd.edu.au 3.2.32-pk06.10-t01-i386 #1 SMP Sat Jan 5 18:34:25 EST 2013 i686 GNU/Linux
psz@como:~$ free -l
total used free shared buffers cached
Mem: 64446900 4729292 59717608 0 15972 480520
Low: 375836 304400 71436
High: 64071064 4424892 59646172
-/+ buffers/cache: 4232800 60214100
Swap: 134217724 0 134217724
psz@como:~$
(though I would not know about violations).
But OK, I take your point that I should move with the times.
Cheers, Paul
Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
* Re: [RFC] Reproducible OOM with partial workaround
2013-01-11 1:46 ` paul.szabo
@ 2013-01-11 8:01 ` Andrew Morton
2013-01-11 8:30 ` Simon Jeons
2013-01-11 11:51 ` paul.szabo
2013-01-11 16:04 ` Dave Hansen
1 sibling, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2013-01-11 8:01 UTC (permalink / raw)
To: paul.szabo; +Cc: dave, 695182, linux-kernel, linux-mm
On Fri, 11 Jan 2013 12:46:15 +1100 paul.szabo@sydney.edu.au wrote:
> > ... I don't believe 64GB of RAM has _ever_ been booted on a 32-bit
> > kernel without either violating the ABI (3GB/1GB split) or doing
> > something that never got merged upstream ...
>
> Sorry to be so contradictory:
>
> psz@como:~$ uname -a
> Linux como.maths.usyd.edu.au 3.2.32-pk06.10-t01-i386 #1 SMP Sat Jan 5 18:34:25 EST 2013 i686 GNU/Linux
> psz@como:~$ free -l
> total used free shared buffers cached
> Mem: 64446900 4729292 59717608 0 15972 480520
> Low: 375836 304400 71436
> High: 64071064 4424892 59646172
> -/+ buffers/cache: 4232800 60214100
> Swap: 134217724 0 134217724
> psz@como:~$
>
> (though I would not know about violations).
>
> But OK, I take your point that I should move with the times.
Check /proc/slabinfo, see if all your lowmem got eaten up by buffer_heads.
If so, you *may* be able to work around this by setting
/proc/sys/vm/dirty_ratio really low, so the system keeps a minimum
amount of dirty pagecache around. Then, with luck, if we haven't
broken the buffer_heads_over_limit logic in the past decade (we
probably have), the VM should be able to reclaim those buffer_heads.
Alternatively, use a filesystem which doesn't attach buffer_heads to
dirty pages. xfs or btrfs, perhaps.
* Re: [RFC] Reproducible OOM with partial workaround
2013-01-11 8:01 ` Andrew Morton
@ 2013-01-11 8:30 ` Simon Jeons
2013-01-11 11:51 ` paul.szabo
1 sibling, 0 replies; 11+ messages in thread
From: Simon Jeons @ 2013-01-11 8:30 UTC (permalink / raw)
To: Andrew Morton; +Cc: paul.szabo, dave, 695182, linux-kernel, linux-mm
On Fri, 2013-01-11 at 00:01 -0800, Andrew Morton wrote:
> On Fri, 11 Jan 2013 12:46:15 +1100 paul.szabo@sydney.edu.au wrote:
>
> > > ... I don't believe 64GB of RAM has _ever_ been booted on a 32-bit
> > > kernel without either violating the ABI (3GB/1GB split) or doing
> > > something that never got merged upstream ...
> >
> > Sorry to be so contradictory:
> >
> > psz@como:~$ uname -a
> > Linux como.maths.usyd.edu.au 3.2.32-pk06.10-t01-i386 #1 SMP Sat Jan 5 18:34:25 EST 2013 i686 GNU/Linux
> > psz@como:~$ free -l
> > total used free shared buffers cached
> > Mem: 64446900 4729292 59717608 0 15972 480520
> > Low: 375836 304400 71436
> > High: 64071064 4424892 59646172
> > -/+ buffers/cache: 4232800 60214100
> > Swap: 134217724 0 134217724
> > psz@como:~$
> >
> > (though I would not know about violations).
> >
> > But OK, I take your point that I should move with the times.
>
> Check /proc/slabinfo, see if all your lowmem got eaten up by buffer_heads.
>
> If so, you *may* be able to work around this by setting
> /proc/sys/vm/dirty_ratio really low, so the system keeps a minimum
> amount of dirty pagecache around. Then, with luck, if we haven't
> broken the buffer_heads_over_limit logic in the past decade (we
> probably have), the VM should be able to reclaim those buffer_heads.
>
> Alternatively, use a filesystem which doesn't attach buffer_heads to
> dirty pages. xfs or btrfs, perhaps.
>
Hi Andrew,
What's the meaning of attaching buffer_heads to dirty pages?
* Re: [RFC] Reproducible OOM with partial workaround
2013-01-11 8:01 ` Andrew Morton
2013-01-11 8:30 ` Simon Jeons
@ 2013-01-11 11:51 ` paul.szabo
2013-01-11 20:31 ` Andrew Morton
1 sibling, 1 reply; 11+ messages in thread
From: paul.szabo @ 2013-01-11 11:51 UTC (permalink / raw)
To: akpm; +Cc: 695182, dave, linux-kernel, linux-mm
Dear Andrew,
> Check /proc/slabinfo, see if all your lowmem got eaten up by buffer_heads.
Please see below: I do not know what any of that means. This machine has
been running just fine, with all my users logging in here via XDMCP from
X-terminals, dozens logged in simultaneously. (But, I think I could make
it go OOM with more processes or logins.)
> If so, you *may* be able to work around this by setting
> /proc/sys/vm/dirty_ratio really low, so the system keeps a minimum
> amount of dirty pagecache around. Then, with luck, if we haven't
> broken the buffer_heads_over_limit logic in the past decade (we
> probably have), the VM should be able to reclaim those buffer_heads.
I tried setting dirty_ratio to "funny" values, but that did not seem
to help. Did you notice my patch about bdi_position_ratio(), how it was
plain wrong half the time (for negative x)? Anyway, that did not help.
> Alternatively, use a filesystem which doesn't attach buffer_heads to
> dirty pages. xfs or btrfs, perhaps.
There also seems to be a problem not related to the filesystem... or
rather, the essence does not seem to be the filesystem or caches. The
filesystem side now seems OK with my patch doing drop_caches.
Cheers, Paul
Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
---
root@como:~# free -lm
total used free shared buffers cached
Mem: 62936 2317 60618 0 41 635
Low: 367 271 95
High: 62569 2045 60523
-/+ buffers/cache: 1640 61295
Swap: 131071 0 131071
root@como:~# cat /proc/slabinfo
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
fuse_request 0 0 376 43 4 : tunables 0 0 0 : slabdata 0 0 0
fuse_inode 0 0 448 36 4 : tunables 0 0 0 : slabdata 0 0 0
bsg_cmd 0 0 288 28 2 : tunables 0 0 0 : slabdata 0 0 0
ntfs_big_inode_cache 0 0 512 32 4 : tunables 0 0 0 : slabdata 0 0 0
ntfs_inode_cache 0 0 176 46 2 : tunables 0 0 0 : slabdata 0 0 0
nfs_direct_cache 0 0 80 51 1 : tunables 0 0 0 : slabdata 0 0 0
nfs_inode_cache 5404 5404 584 28 4 : tunables 0 0 0 : slabdata 193 193 0
isofs_inode_cache 0 0 360 45 4 : tunables 0 0 0 : slabdata 0 0 0
fat_inode_cache 0 0 408 40 4 : tunables 0 0 0 : slabdata 0 0 0
fat_cache 0 0 24 170 1 : tunables 0 0 0 : slabdata 0 0 0
jbd2_revoke_record 0 0 32 128 1 : tunables 0 0 0 : slabdata 0 0 0
journal_handle 5440 5440 24 170 1 : tunables 0 0 0 : slabdata 32 32 0
journal_head 16768 16768 64 64 1 : tunables 0 0 0 : slabdata 262 262 0
revoke_record 20224 20224 16 256 1 : tunables 0 0 0 : slabdata 79 79 0
ext4_inode_cache 0 0 584 28 4 : tunables 0 0 0 : slabdata 0 0 0
ext4_free_data 0 0 40 102 1 : tunables 0 0 0 : slabdata 0 0 0
ext4_allocation_context 0 0 112 36 1 : tunables 0 0 0 : slabdata 0 0 0
ext4_prealloc_space 0 0 72 56 1 : tunables 0 0 0 : slabdata 0 0 0
ext4_io_end 0 0 576 28 4 : tunables 0 0 0 : slabdata 0 0 0
ext4_io_page 0 0 8 512 1 : tunables 0 0 0 : slabdata 0 0 0
ext2_inode_cache 0 0 480 34 4 : tunables 0 0 0 : slabdata 0 0 0
ext3_inode_cache 16531 19965 488 33 4 : tunables 0 0 0 : slabdata 605 605 0
ext3_xattr 0 0 48 85 1 : tunables 0 0 0 : slabdata 0 0 0
dquot 840 840 192 42 2 : tunables 0 0 0 : slabdata 20 20 0
rpc_inode_cache 144 144 448 36 4 : tunables 0 0 0 : slabdata 4 4 0
UDP-Lite 0 0 576 28 4 : tunables 0 0 0 : slabdata 0 0 0
xfrm_dst_cache 0 0 320 51 4 : tunables 0 0 0 : slabdata 0 0 0
UDP 896 896 576 28 4 : tunables 0 0 0 : slabdata 32 32 0
tw_sock_TCP 1344 1344 128 32 1 : tunables 0 0 0 : slabdata 42 42 0
TCP 1457 1624 1152 28 8 : tunables 0 0 0 : slabdata 58 58 0
eventpoll_pwq 3264 3264 40 102 1 : tunables 0 0 0 : slabdata 32 32 0
blkdev_queue 330 330 968 33 8 : tunables 0 0 0 : slabdata 10 10 0
blkdev_requests 2368 2368 216 37 2 : tunables 0 0 0 : slabdata 64 64 0
biovec-256 350 350 3072 10 8 : tunables 0 0 0 : slabdata 35 35 0
biovec-128 693 693 1536 21 8 : tunables 0 0 0 : slabdata 33 33 0
biovec-64 1890 1890 768 42 8 : tunables 0 0 0 : slabdata 45 45 0
sock_inode_cache 8206 9408 384 42 4 : tunables 0 0 0 : slabdata 224 224 0
skbuff_fclone_cache 1806 1806 384 42 4 : tunables 0 0 0 : slabdata 43 43 0
file_lock_cache 1692 1692 112 36 1 : tunables 0 0 0 : slabdata 47 47 0
shmem_inode_cache 2244 2244 368 44 4 : tunables 0 0 0 : slabdata 51 51 0
Acpi-State 76245 76245 48 85 1 : tunables 0 0 0 : slabdata 897 897 0
taskstats 1568 1568 328 49 4 : tunables 0 0 0 : slabdata 32 32 0
proc_inode_cache 10736 10736 368 44 4 : tunables 0 0 0 : slabdata 244 244 0
sigqueue 1120 1120 144 28 1 : tunables 0 0 0 : slabdata 40 40 0
bdev_cache 608 608 512 32 4 : tunables 0 0 0 : slabdata 19 19 0
sysfs_dir_cache 36057 36057 80 51 1 : tunables 0 0 0 : slabdata 707 707 0
inode_cache 7584 7584 336 48 4 : tunables 0 0 0 : slabdata 158 158 0
dentry 32995 43584 128 32 1 : tunables 0 0 0 : slabdata 1362 1362 0
buffer_head 83001 83001 56 73 1 : tunables 0 0 0 : slabdata 1137 1137 0
vm_area_struct 51480 83352 88 46 1 : tunables 0 0 0 : slabdata 1812 1812 0
mm_struct 2257 2556 448 36 4 : tunables 0 0 0 : slabdata 71 71 0
signal_cache 3584 3584 576 28 4 : tunables 0 0 0 : slabdata 128 128 0
sighand_cache 2664 2664 1344 24 8 : tunables 0 0 0 : slabdata 111 111 0
task_xstate 8154 8268 832 39 8 : tunables 0 0 0 : slabdata 212 212 0
task_struct 8896 8896 1008 32 8 : tunables 0 0 0 : slabdata 278 278 0
anon_vma_chain 70596 96050 24 170 1 : tunables 0 0 0 : slabdata 565 565 0
anon_vma 52113 62934 40 102 1 : tunables 0 0 0 : slabdata 617 617 0
radix_tree_node 15722 22578 304 53 4 : tunables 0 0 0 : slabdata 426 426 0
idr_layer_cache 9116 9116 152 53 2 : tunables 0 0 0 : slabdata 172 172 0
dma-kmalloc-8192 0 0 8192 4 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-4096 0 0 4096 8 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-2048 0 0 2048 16 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-1024 0 0 1024 32 8 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-512 0 0 512 32 4 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-256 0 0 256 32 2 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-128 0 0 128 32 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-64 0 0 64 64 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-32 0 0 32 128 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-16 0 0 16 256 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-8 0 0 8 512 1 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-192 0 0 192 42 2 : tunables 0 0 0 : slabdata 0 0 0
dma-kmalloc-96 0 0 96 42 1 : tunables 0 0 0 : slabdata 0 0 0
kmalloc-8192 272 272 8192 4 8 : tunables 0 0 0 : slabdata 68 68 0
kmalloc-4096 585 608 4096 8 8 : tunables 0 0 0 : slabdata 76 76 0
kmalloc-2048 714 832 2048 16 8 : tunables 0 0 0 : slabdata 52 52 0
kmalloc-1024 5351 5536 1024 32 8 : tunables 0 0 0 : slabdata 173 173 0
kmalloc-512 7776 8512 512 32 4 : tunables 0 0 0 : slabdata 266 266 0
kmalloc-256 3334 3936 256 32 2 : tunables 0 0 0 : slabdata 123 123 0
kmalloc-128 5375 7744 128 32 1 : tunables 0 0 0 : slabdata 242 242 0
kmalloc-64 28005 35584 64 64 1 : tunables 0 0 0 : slabdata 556 556 0
kmalloc-32 67453 68224 32 128 1 : tunables 0 0 0 : slabdata 533 533 0
kmalloc-16 78772 83968 16 256 1 : tunables 0 0 0 : slabdata 328 328 0
kmalloc-8 70656 70656 8 512 1 : tunables 0 0 0 : slabdata 138 138 0
kmalloc-192 38594 64050 192 42 2 : tunables 0 0 0 : slabdata 1525 1525 0
kmalloc-96 21630 21630 96 42 1 : tunables 0 0 0 : slabdata 515 515 0
kmem_cache 32 32 128 32 1 : tunables 0 0 0 : slabdata 1 1 0
kmem_cache_node 512 512 32 128 1 : tunables 0 0 0 : slabdata 4 4 0
root@como:~#
* Re: [RFC] Reproducible OOM with partial workaround
2013-01-11 11:51 ` paul.szabo
@ 2013-01-11 20:31 ` Andrew Morton
2013-01-12 3:24 ` paul.szabo
0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2013-01-11 20:31 UTC (permalink / raw)
To: paul.szabo; +Cc: 695182, dave, linux-kernel, linux-mm
On Fri, 11 Jan 2013 22:51:35 +1100
paul.szabo@sydney.edu.au wrote:
> Dear Andrew,
>
> > Check /proc/slabinfo, see if all your lowmem got eaten up by buffer_heads.
>
> Please see below: I do not know what any of that means. This machine has
> been running just fine, with all my users logging in here via XDMCP from
> X-terminals, dozens logged in simultaneously. (But, I think I could make
> it go OOM with more processes or logins.)
I'm counting 107MB in slab there. Was this dump taken when the system
was at or near oom?
Please send a copy of the oom-killer kernel message dump, if you still
have one.
> > If so, you *may* be able to work around this by setting
> > /proc/sys/vm/dirty_ratio really low, so the system keeps a minimum
> > amount of dirty pagecache around. Then, with luck, if we haven't
> > broken the buffer_heads_over_limit logic in the past decade (we
> > probably have), the VM should be able to reclaim those buffer_heads.
>
> I tried setting dirty_ratio to "funny" values, that did not seem to
> help.
Did you try setting it as low as possible?
> Did you notice my patch about bdi_position_ratio(), how it was
> plain wrong half the time (for negative x)?
Nope, please resend.
> Anyway that did not help.
>
> > Alternatively, use a filesystem which doesn't attach buffer_heads to
> > dirty pages. xfs or btrfs, perhaps.
>
> Seems there is also a problem not related to filesystem... or rather,
> the essence does not seem to be filesystem or caches. The filesystem
> thing now seems OK with my patch doing drop_caches.
hm, if doing a regular drop_caches fixes things then that implies the
problem is not with dirty pagecache. Odd.
* Re: [RFC] Reproducible OOM with partial workaround
2013-01-11 20:31 ` Andrew Morton
@ 2013-01-12 3:24 ` paul.szabo
0 siblings, 0 replies; 11+ messages in thread
From: paul.szabo @ 2013-01-12 3:24 UTC (permalink / raw)
To: akpm; +Cc: 695182, dave, linux-kernel, linux-mm
Dear Andrew,
>>> Check /proc/slabinfo, see if all your lowmem got eaten up by buffer_heads.
>> Please see below ...
> ... Was this dump taken when the system was at or near oom?
No, that was a "quiescent" machine. Please see a just-before-OOM dump in
my next message (in a little while).
> Please send a copy of the oom-killer kernel message dump, if you still
> have one.
Please see one in next message, or in
http://bugs.debian.org/695182
>> I tried setting dirty_ratio to "funny" values, that did not seem to
>> help.
> Did you try setting it as low as possible?
Probably. Maybe. Sorry, cannot say with certainty.
>> Did you notice my patch about bdi_position_ratio(), how it was
>> plain wrong half the time (for negative x)?
> Nope, please resend.
Quoting from
http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=101;att=1;bug=695182
:
...
- In bdi_position_ratio() get difference (setpoint-dirty) right even
when it is negative, which happens often. Normally these numbers are
"small" and even with left-shift I never observed a 32-bit overflow.
I believe it should be possible to re-write the whole function in
32-bit ints; maybe it is not worth the effort to make it "efficient";
seeing how this function was always wrong and we survived, it should
simply be removed.
...
--- mm/page-writeback.c.old 2012-10-17 13:50:15.000000000 +1100
+++ mm/page-writeback.c 2013-01-06 21:54:59.000000000 +1100
[ Line numbers out because other patches not shown ]
...
@@ -559,7 +578,7 @@ static unsigned long bdi_position_ratio(
* => fast response on large errors; small oscillation near setpoint
*/
setpoint = (freerun + limit) / 2;
- x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
+ x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
limit - setpoint + 1);
pos_ratio = x;
pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
...
Cheers, Paul
Paul Szabo psz@maths.usyd.edu.au http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics University of Sydney Australia
* Re: [RFC] Reproducible OOM with partial workaround
2013-01-11 1:46 ` paul.szabo
2013-01-11 8:01 ` Andrew Morton
@ 2013-01-11 16:04 ` Dave Hansen
1 sibling, 0 replies; 11+ messages in thread
From: Dave Hansen @ 2013-01-11 16:04 UTC (permalink / raw)
To: paul.szabo; +Cc: 695182, linux-kernel, linux-mm
On 01/10/2013 05:46 PM, paul.szabo@sydney.edu.au wrote:
>> > ... I don't believe 64GB of RAM has _ever_ been booted on a 32-bit
>> > kernel without either violating the ABI (3GB/1GB split) or doing
>> > something that never got merged upstream ...
> Sorry to be so contradictory:
>
> psz@como:~$ uname -a
> Linux como.maths.usyd.edu.au 3.2.32-pk06.10-t01-i386 #1 SMP Sat Jan 5 18:34:25 EST 2013 i686 GNU/Linux
> psz@como:~$ free -l
> total used free shared buffers cached
> Mem: 64446900 4729292 59717608 0 15972 480520
> Low: 375836 304400 71436
> High: 64071064 4424892 59646172
> -/+ buffers/cache: 4232800 60214100
> Swap: 134217724 0 134217724
Hey, that's pretty cool! I would swear that the mem_map[] overhead was
such that they wouldn't boot, but perhaps those brain cells died on me.
end of thread, other threads: [~2013-01-12 3:24 UTC | newest]
Thread overview: 11+ messages
2013-01-10 21:58 [RFC] Reproducible OOM with partial workaround paul.szabo
2013-01-10 23:12 ` Dave Hansen
2013-01-11 0:46 ` paul.szabo
2013-01-11 1:26 ` Dave Hansen
2013-01-11 1:46 ` paul.szabo
2013-01-11 8:01 ` Andrew Morton
2013-01-11 8:30 ` Simon Jeons
2013-01-11 11:51 ` paul.szabo
2013-01-11 20:31 ` Andrew Morton
2013-01-12 3:24 ` paul.szabo
2013-01-11 16:04 ` Dave Hansen