linux-kernel.vger.kernel.org archive mirror
* Very aggressive memory reclaim
@ 2011-03-28 16:39 John Lepikhin
  2011-03-28 17:42 ` Steven Rostedt
  2011-03-28 21:53 ` Dave Chinner
  0 siblings, 2 replies; 11+ messages in thread
From: John Lepikhin @ 2011-03-28 16:39 UTC (permalink / raw)
  To: linux-kernel

Hello,

I run a heavily loaded machine with 10M+ inodes on XFS, 50+ GB of
memory, intensive HDD traffic and 20..50 forks per second. Vanilla
kernel 2.6.37.4. The problem is that the kernel frees memory very
aggressively.

For example:

25% of memory is used by processes
50% for page caches
7% for slabs, etc.
18% free.

That's not ideal, but it works. After a few hours:

25% of memory is used by processes
62% for page caches
7% for slabs, etc.
5% free.

Most files are cached; it works perfectly. This is the moment when
the kernel decides to free some memory. After memory reclaim:

25% of memory is used by processes
25% for page caches(!)
7% for slabs, etc.
43% free(!)

The page cache is dropped and the server becomes too slow. This is
the beginning of a new cycle.

I didn't find any huge mallocs at that moment. It looks like, because
of the large number of small mallocs (forks), the kernel makes a
pessimistic forecast about future memory usage and frees too much
memory. Are there any options for tuning this? Any other ideas?

Thanks!


* Re: Very aggressive memory reclaim
  2011-03-28 16:39 Very aggressive memory reclaim John Lepikhin
@ 2011-03-28 17:42 ` Steven Rostedt
  2011-03-28 21:53 ` Dave Chinner
  1 sibling, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2011-03-28 17:42 UTC (permalink / raw)
  To: John Lepikhin; +Cc: linux-kernel, Alexander Viro, linux-fsdevel

[ Add Cc's of those that may help you ]

-- Steve

On Mon, Mar 28, 2011 at 08:39:29PM +0400, John Lepikhin wrote:
> Hello,
> 
> I run a heavily loaded machine with 10M+ inodes on XFS, 50+ GB of
> memory, intensive HDD traffic and 20..50 forks per second. Vanilla
> kernel 2.6.37.4. The problem is that the kernel frees memory very
> aggressively.
> 
> For example:
> 
> 25% of memory is used by processes
> 50% for page caches
> 7% for slabs, etc.
> 18% free.
> 
> That's not ideal, but it works. After a few hours:
> 
> 25% of memory is used by processes
> 62% for page caches
> 7% for slabs, etc.
> 5% free.
> 
> Most files are cached; it works perfectly. This is the moment when
> the kernel decides to free some memory. After memory reclaim:
> 
> 25% of memory is used by processes
> 25% for page caches(!)
> 7% for slabs, etc.
> 43% free(!)
> 
> The page cache is dropped and the server becomes too slow. This is
> the beginning of a new cycle.
> 
> I didn't find any huge mallocs at that moment. It looks like, because
> of the large number of small mallocs (forks), the kernel makes a
> pessimistic forecast about future memory usage and frees too much
> memory. Are there any options for tuning this? Any other ideas?
> 
> Thanks!


* Re: Very aggressive memory reclaim
  2011-03-28 16:39 Very aggressive memory reclaim John Lepikhin
  2011-03-28 17:42 ` Steven Rostedt
@ 2011-03-28 21:53 ` Dave Chinner
  2011-03-28 22:52   ` Minchan Kim
                     ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: Dave Chinner @ 2011-03-28 21:53 UTC (permalink / raw)
  To: John Lepikhin; +Cc: linux-kernel, xfs, linux-mm

[cc xfs and mm lists]

On Mon, Mar 28, 2011 at 08:39:29PM +0400, John Lepikhin wrote:
> Hello,
> 
> I run a heavily loaded machine with 10M+ inodes on XFS, 50+ GB of
> memory, intensive HDD traffic and 20..50 forks per second. Vanilla
> kernel 2.6.37.4. The problem is that the kernel frees memory very
> aggressively.
> 
> For example:
> 
> 25% of memory is used by processes
> 50% for page caches
> 7% for slabs, etc.
> 18% free.
> 
> That's not ideal, but it works. After a few hours:
> 
> 25% of memory is used by processes
> 62% for page caches
> 7% for slabs, etc.
> 5% free.
> 
> Most files are cached; it works perfectly. This is the moment when
> the kernel decides to free some memory. After memory reclaim:
> 
> 25% of memory is used by processes
> 25% for page caches(!)
> 7% for slabs, etc.
> 43% free(!)
> 
> The page cache is dropped and the server becomes too slow. This is
> the beginning of a new cycle.
> 
> I didn't find any huge mallocs at that moment. It looks like, because
> of the large number of small mallocs (forks), the kernel makes a
> pessimistic forecast about future memory usage and frees too much
> memory. Are there any options for tuning this? Any other ideas?

First it would be useful to determine why the VM is reclaiming so
much memory. If it is somewhat predictable when the excessive
reclaim is going to happen, it might be worth capturing an event
trace from the VM so we can see more precisely what it is doing
during this event. In that case, recording the kmem/* and vmscan/*
events is probably sufficient to tell us what memory allocations
triggered reclaim and how much reclaim was done on each event.
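
A rough sketch of capturing those events through the tracing debugfs
interface (assuming debugfs is mounted at /sys/kernel/debug; the
output filename is just an example):

  cd /sys/kernel/debug/tracing
  echo 1 > events/kmem/enable        # enable all kmem/* tracepoints
  echo 1 > events/vmscan/enable      # enable all vmscan/* tracepoints
  echo 1 > tracing_on
  # ... wait for one of the big reclaim events to happen ...
  echo 0 > tracing_on
  cat trace > /tmp/reclaim-trace.txt # save what was captured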

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Very aggressive memory reclaim
  2011-03-28 21:53 ` Dave Chinner
@ 2011-03-28 22:52   ` Minchan Kim
  2011-03-29  2:55     ` KOSAKI Motohiro
  2011-03-29  7:22     ` John Lepikhin
  2011-03-28 23:58   ` Andi Kleen
  2011-03-29  7:26   ` John Lepikhin
  2 siblings, 2 replies; 11+ messages in thread
From: Minchan Kim @ 2011-03-28 22:52 UTC (permalink / raw)
  To: Dave Chinner; +Cc: John Lepikhin, linux-kernel, xfs, linux-mm

On Tue, Mar 29, 2011 at 6:53 AM, Dave Chinner <david@fromorbit.com> wrote:
> [cc xfs and mm lists]
>
> On Mon, Mar 28, 2011 at 08:39:29PM +0400, John Lepikhin wrote:
>> Hello,
>>
>> I run a heavily loaded machine with 10M+ inodes on XFS, 50+ GB of
>> memory, intensive HDD traffic and 20..50 forks per second. Vanilla
>> kernel 2.6.37.4. The problem is that the kernel frees memory very
>> aggressively.
>>
>> For example:
>>
>> 25% of memory is used by processes
>> 50% for page caches
>> 7% for slabs, etc.
>> 18% free.
>>
>> That's not ideal, but it works. After a few hours:
>>
>> 25% of memory is used by processes
>> 62% for page caches
>> 7% for slabs, etc.
>> 5% free.
>>
>> Most files are cached; it works perfectly. This is the moment when
>> the kernel decides to free some memory. After memory reclaim:
>>
>> 25% of memory is used by processes
>> 25% for page caches(!)
>> 7% for slabs, etc.
>> 43% free(!)
>>
>> The page cache is dropped and the server becomes too slow. This is
>> the beginning of a new cycle.
>>
>> I didn't find any huge mallocs at that moment. It looks like, because
>> of the large number of small mallocs (forks), the kernel makes a
>> pessimistic forecast about future memory usage and frees too much
>> memory. Are there any options for tuning this? Any other ideas?
>
> First it would be useful to determine why the VM is reclaiming so
> much memory. If it is somewhat predictable when the excessive
> reclaim is going to happen, it might be worth capturing an event
> trace from the VM so we can see more precisely what it is doing
> during this event. In that case, recording the kmem/* and vmscan/*
> events is probably sufficient to tell us what memory allocations
> triggered reclaim and how much reclaim was done on each event.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>

Recently, we had a similar issue:
http://www.spinics.net/lists/linux-mm/msg12243.html
But it seems the patch was not merged. I don't know why, since I didn't
follow the thread. Maybe the CCed people can help you.

Is it a single sudden big cache drop, or small cache drops accumulated
over a long time?
What are your zones' sizes?

Please attach the output of cat /proc/zoneinfo for others.


* Re: Very aggressive memory reclaim
  2011-03-28 21:53 ` Dave Chinner
  2011-03-28 22:52   ` Minchan Kim
@ 2011-03-28 23:58   ` Andi Kleen
  2011-03-29  1:57     ` Dave Chinner
  2011-03-29  7:26   ` John Lepikhin
  2 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2011-03-28 23:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: John Lepikhin, linux-kernel, xfs, linux-mm

Dave Chinner <david@fromorbit.com> writes:
>
> First it would be useful to determine why the VM is reclaiming so
> much memory. If it is somewhat predictable when the excessive
> reclaim is going to happen, it might be worth capturing an event

Often it's to get pages of a higher order. Just tracing alloc_pages
should tell you that.
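
For example (a sketch, assuming debugfs is mounted at
/sys/kernel/debug), you could watch only the higher-order allocations
with an event filter on the mm_page_alloc tracepoint:

  cd /sys/kernel/debug/tracing
  echo 'order > 0' > events/kmem/mm_page_alloc/filter
  echo 1 > events/kmem/mm_page_alloc/enable
  cat trace_pipe                     # stream matching allocations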

There are a few other cases (like memory failure handling), but they're
more obscure.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only


* Re: Very aggressive memory reclaim
  2011-03-28 23:58   ` Andi Kleen
@ 2011-03-29  1:57     ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2011-03-29  1:57 UTC (permalink / raw)
  To: Andi Kleen; +Cc: John Lepikhin, linux-kernel, xfs, linux-mm

On Mon, Mar 28, 2011 at 04:58:50PM -0700, Andi Kleen wrote:
> Dave Chinner <david@fromorbit.com> writes:
> >
> > First it would be useful to determine why the VM is reclaiming so
> > much memory. If it is somewhat predictable when the excessive
> > reclaim is going to happen, it might be worth capturing an event
> 
> Often it's to get pages of a higher order. Just tracing alloc_pages
> should tell you that.

Yes, the kmem/mm_page_alloc tracepoint gives us that. But in case
that is not the cause, grabbing all the trace points I suggested is
more likely to indicate where the problem is. I'd prefer to get more
data than needed the first time around rather than have to do multiple
round trips because a single trace point doesn't tell us the cause...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Very aggressive memory reclaim
  2011-03-28 22:52   ` Minchan Kim
@ 2011-03-29  2:55     ` KOSAKI Motohiro
  2011-03-29  7:33       ` John Lepikhin
  2011-03-29  7:22     ` John Lepikhin
  1 sibling, 1 reply; 11+ messages in thread
From: KOSAKI Motohiro @ 2011-03-29  2:55 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, Dave Chinner, John Lepikhin, linux-kernel, xfs,
	linux-mm

> Recently, we had a similar issue:
> http://www.spinics.net/lists/linux-mm/msg12243.html
> But it seems the patch was not merged. I don't know why, since I didn't
> follow the thread. Maybe the CCed people can help you.
> 
> Is it a single sudden big cache drop, or small cache drops accumulated
> over a long time?
> What are your zones' sizes?
> 
> Please attach the output of cat /proc/zoneinfo for others.

If I remember correctly, 2.6.38 includes Mel's anti-aggressive-reclaim
patch, and the original report seems to be using 2.6.37.x.

John, can you try 2.6.38?





* Re: Very aggressive memory reclaim
  2011-03-28 22:52   ` Minchan Kim
  2011-03-29  2:55     ` KOSAKI Motohiro
@ 2011-03-29  7:22     ` John Lepikhin
  1 sibling, 0 replies; 11+ messages in thread
From: John Lepikhin @ 2011-03-29  7:22 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Dave Chinner, linux-kernel, xfs, linux-mm

[-- Attachment #1: Type: text/plain, Size: 204 bytes --]

2011/3/29 Minchan Kim <minchan.kim@gmail.com>:

> Please attach the output of cat /proc/zoneinfo for others.

See attachment. Right now I have no zoneinfo from crisis time, but I
can capture it if required.

[-- Attachment #2: zoneinfo --]
[-- Type: application/octet-stream, Size: 14655 bytes --]

Node 0, zone      DMA
  pages free     3968
        min      0
        low      0
        high     0
        scanned  0
        spanned  4080
        present  3920
    nr_free_pages 3968
    nr_inactive_anon 0
    nr_active_anon 0
    nr_inactive_file 0
    nr_active_file 0
    nr_unevictable 0
    nr_mlock     0
    nr_anon_pages 0
    nr_mapped    0
    nr_file_pages 0
    nr_dirty     0
    nr_writeback 0
    nr_slab_reclaimable 0
    nr_slab_unreclaimable 0
    nr_page_table_pages 0
    nr_kernel_stack 0
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 0
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     0
    nr_dirtied   0
    nr_written   0
    numa_hit     0
    numa_miss    0
    numa_foreign 0
    numa_interleave 0
    numa_local   0
    numa_other   0
        protection: (0, 2173, 48380, 48380)
  pagesets
    cpu: 0
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 1
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 2
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 3
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 4
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 5
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 6
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 7
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 8
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 9
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 10
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 11
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 12
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 13
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 14
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 15
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 16
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 17
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 18
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 19
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 20
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 21
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 22
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
    cpu: 23
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 10
  all_unreclaimable: 0
  start_pfn:         16
  inactive_ratio:    1
Node 0, zone    DMA32
  pages free     364761
        min      16
        low      20
        high     24
        scanned  0
        spanned  1044480
        present  556409
    nr_free_pages 364761
    nr_inactive_anon 21990
    nr_active_anon 7663
    nr_inactive_file 73
    nr_active_file 2300
    nr_unevictable 0
    nr_mlock     0
    nr_anon_pages 28933
    nr_mapped    419
    nr_file_pages 3093
    nr_dirty     2
    nr_writeback 0
    nr_slab_reclaimable 114260
    nr_slab_unreclaimable 17480
    nr_page_table_pages 52
    nr_kernel_stack 4
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 1394045
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     720
    nr_dirtied   2710373
    nr_written   3892495
    numa_hit     457923559
    numa_miss    142009831
    numa_foreign 0
    numa_interleave 0
    numa_local   457586157
    numa_other   142347233
        protection: (0, 0, 46207, 46207)
  pagesets
    cpu: 0
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 1
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 2
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 3
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 4
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 5
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 6
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 7
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 8
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 9
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 10
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 11
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 12
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 13
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 14
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 15
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 16
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 17
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 18
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 19
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 20
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 21
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 22
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
    cpu: 23
              count: 0
              high:  186
              batch: 31
  vm stats threshold: 60
  all_unreclaimable: 0
  start_pfn:         4096
  inactive_ratio:    4
Node 0, zone   Normal
  pages free     1647180
        min      357
        low      446
        high     535
        scanned  0
        spanned  11993088
        present  11829120
    nr_free_pages 1647180
    nr_inactive_anon 450069
    nr_active_anon 3433771
    nr_inactive_file 2955991
    nr_active_file 2119000
    nr_unevictable 0
    nr_mlock     0
    nr_anon_pages 3638671
    nr_mapped    115635
    nr_file_pages 5320039
    nr_dirty     75137
    nr_writeback 2
    nr_slab_reclaimable 786231
    nr_slab_unreclaimable 97963
    nr_page_table_pages 67845
    nr_kernel_stack 1361
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 15379957
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     245041
    nr_dirtied   211669545
    nr_written   202526281
    numa_hit     41186823085
    numa_miss    235241135
    numa_foreign 690066765
    numa_interleave 27671
    numa_local   41186809452
    numa_other   235254768
        protection: (0, 0, 0, 0)
  pagesets
    cpu: 0
              count: 153
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 1
              count: 156
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 2
              count: 160
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 3
              count: 177
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 4
              count: 157
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 5
              count: 165
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 6
              count: 26
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 7
              count: 161
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 8
              count: 137
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 9
              count: 169
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 10
              count: 27
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 11
              count: 164
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 12
              count: 177
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 13
              count: 180
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 14
              count: 180
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 15
              count: 183
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 16
              count: 98
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 17
              count: 170
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 18
              count: 92
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 19
              count: 157
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 20
              count: 162
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 21
              count: 177
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 22
              count: 157
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 23
              count: 171
              high:  186
              batch: 31
  vm stats threshold: 100
  all_unreclaimable: 0
  start_pfn:         1048576
  inactive_ratio:    21
Node 1, zone   Normal
  pages free     2790210
        min      375
        low      468
        high     562
        scanned  0
        spanned  12582912
        present  12410880
    nr_free_pages 2790210
    nr_inactive_anon 551309
    nr_active_anon 3043861
    nr_inactive_file 2311025
    nr_active_file 2310059
    nr_unevictable 0
    nr_mlock     0
    nr_anon_pages 3297378
    nr_mapped    141699
    nr_file_pages 4918896
    nr_dirty     51559
    nr_writeback 0
    nr_slab_reclaimable 862638
    nr_slab_unreclaimable 123830
    nr_page_table_pages 145273
    nr_kernel_stack 1497
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 10156465
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     297798
    nr_dirtied   216473986
    nr_written   191524308
    numa_hit     43711879913
    numa_miss    690066765
    numa_foreign 377250966
    numa_interleave 27810
    numa_local   43711847977
    numa_other   690098701
        protection: (0, 0, 0, 0)
  pagesets
    cpu: 0
              count: 159
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 1
              count: 71
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 2
              count: 175
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 3
              count: 180
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 4
              count: 181
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 5
              count: 74
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 6
              count: 170
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 7
              count: 159
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 8
              count: 176
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 9
              count: 161
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 10
              count: 180
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 11
              count: 5
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 12
              count: 184
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 13
              count: 122
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 14
              count: 168
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 15
              count: 123
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 16
              count: 155
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 17
              count: 37
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 18
              count: 177
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 19
              count: 44
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 20
              count: 185
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 21
              count: 28
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 22
              count: 172
              high:  186
              batch: 31
  vm stats threshold: 100
    cpu: 23
              count: 190
              high:  186
              batch: 31
  vm stats threshold: 100
  all_unreclaimable: 0
  start_pfn:         13041664
  inactive_ratio:    21

[-- Attachment #3: meminfo --]
[-- Type: application/octet-stream, Size: 1015 bytes --]

MemTotal:       99149428 kB
MemFree:        19224476 kB
Buffers:               0 kB
Cached:         40968112 kB
SwapCached:            0 kB
Active:         43666616 kB
Inactive:       25161828 kB
Active(anon):   25941180 kB
Inactive(anon):  4093472 kB
Active(file):   17725436 kB
Inactive(file): 21068356 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:            506792 kB
Writeback:             0 kB
AnonPages:      27859928 kB
Mapped:          1031012 kB
Shmem:           2174236 kB
Slab:            8009608 kB
SReclaimable:    7052516 kB
SUnreclaim:       957092 kB
KernelStack:       22896 kB
PageTables:       852680 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    49574712 kB
Committed_AS:   301362748 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      635548 kB
VmallocChunk:   34258959360 kB
DirectMap4k:        1284 kB
DirectMap2M:     3084288 kB
DirectMap1G:    97517568 kB


* Re: Very aggressive memory reclaim
  2011-03-28 21:53 ` Dave Chinner
  2011-03-28 22:52   ` Minchan Kim
  2011-03-28 23:58   ` Andi Kleen
@ 2011-03-29  7:26   ` John Lepikhin
  2011-03-29  8:59     ` Avi Kivity
  2 siblings, 1 reply; 11+ messages in thread
From: John Lepikhin @ 2011-03-29  7:26 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel, xfs, linux-mm

2011/3/29 Dave Chinner <david@fromorbit.com>:

> First it would be useful to determine why the VM is reclaiming so
> much memory. If it is somewhat predictable when the excessive
> reclaim is going to happen, it might be worth capturing an event
> trace from the VM so we can see more precisely what it is doing
> during this event. In that case, recording the kmem/* and vmscan/*
> events is probably sufficient to tell us what memory allocations
> triggered reclaim and how much reclaim was done on each event.

Do you mean I should add some debug code to the mm functions? I don't
know any other way to catch such events.


* Re: Very aggressive memory reclaim
  2011-03-29  2:55     ` KOSAKI Motohiro
@ 2011-03-29  7:33       ` John Lepikhin
  0 siblings, 0 replies; 11+ messages in thread
From: John Lepikhin @ 2011-03-29  7:33 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Minchan Kim, Dave Chinner, linux-kernel, xfs, linux-mm

2011/3/29 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>:

> If I remember correctly, 2.6.38 includes Mel's anti-aggressive-reclaim
> patch, and the original report seems to be using 2.6.37.x.
>
> John, can you try 2.6.38?

I'll ask my boss about it. Unfortunately, we found the opposite issue
with memory management + XFS (100M inodes) on 2.6.38: some objects in
the xfs_inode and dentry slabs seem to never be reclaimed (at least
without "sync && echo 2 >.../drop_caches"). But this is not a
production machine running 24x7, so we don't care about it right now.


* Re: Very aggressive memory reclaim
  2011-03-29  7:26   ` John Lepikhin
@ 2011-03-29  8:59     ` Avi Kivity
  0 siblings, 0 replies; 11+ messages in thread
From: Avi Kivity @ 2011-03-29  8:59 UTC (permalink / raw)
  To: John Lepikhin; +Cc: Dave Chinner, linux-kernel, xfs, linux-mm

On 03/29/2011 09:26 AM, John Lepikhin wrote:
> 2011/3/29 Dave Chinner<david@fromorbit.com>:
>
> >  First it would be useful to determine why the VM is reclaiming so
> >  much memory. If it is somewhat predictable when the excessive
> >  reclaim is going to happen, it might be worth capturing an event
> >  trace from the VM so we can see more precisely what it is doing
> >  during this event. In that case, recording the kmem/* and vmscan/*
> >  events is probably sufficient to tell us what memory allocations
> >  triggered reclaim and how much reclaim was done on each event.
>
> Do you mean I should add some debug code to the mm functions? I don't
> know any other way to catch such events.

Download and build trace-cmd 
(git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git), 
and do

$ trace-cmd record -e kmem -e vmscan -b 30000

Hit ctrl-C when done and post the output file generated in cwd.
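
If you want to look at it yourself first, running trace-cmd report in
the same directory will pretty-print the recorded events, e.g.:

$ trace-cmd report | less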

-- 
error compiling committee.c: too many arguments to function


