* The pagecache unloved in zone NORMAL?
@ 2013-04-11 20:30 Zlatko Calusic
2013-05-05 21:50 ` Zlatko Calusic
0 siblings, 1 reply; 4+ messages in thread
From: Zlatko Calusic @ 2013-04-11 20:30 UTC (permalink / raw)
To: Mel Gorman; +Cc: linux-mm
[-- Attachment #1: Type: text/plain, Size: 2840 bytes --]
This is something that I've been chasing for months, and I'm getting
tired of it. :(
The issue has been observed on 4GB RAM x86_64 machines (one server, one
desktop) without swap subsystem (not even compiled in). The important
thing to remember about a 4GB x86_64 machine is that the NORMAL zone is
about 6 times smaller than the DMA32 zone.
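(Back-of-the-envelope from the attached zoneinfo dumps, just to put
numbers on it; 4 KiB pages, and the exact factor of course depends on
the machine's memory map:)

    # "managed" lines from the desktop zoneinfo attached below
    dma32_managed  = 828967
    normal_managed = 178447
    print("DMA32  ~%d MB" % (dma32_managed  * 4 // 1024))   # ~3238 MB
    print("Normal ~%d MB" % (normal_managed * 4 // 1024))   # ~697 MB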
As a picture is worth 10,000 words, I've attached two graphs that
nicely show what I've observed. As memory usage slowly rises, the MM
subsystem gradually evicts pagecache pages from the NORMAL zone, trying
to eventually get rid of all of them! This process takes days, typically
more than 5 on this particular server. Of course, this means that
eventually the zone will be chock-full of anon pages, and without swap
the kernel can't do much about it. But as it tries to balance the zone,
various bad things happen. On the server I've seen sudden freeing of
hundreds of MB of pagecache; on the desktop there's a general slowdown,
sound dropouts (HTTP streaming) and so on...
The first graph is probably from a 3.8 kernel; the second one is from
3.9.0-rc4+ patched with the kswapd series v2. Obviously not much has
changed wrt this problem, although it seems to me that the kernel now
hesitates to needlessly free large amounts of memory, or does it less
often. But on the desktop there's no improvement: as soon as the
pagecache gets really low in the NORMAL zone, there's severe slowdown,
dropouts, etc... One other thing: the lower graphs say "Normal zone file
pages", but what is actually graphed is nr_active_file +
nr_inactive_file from the NORMAL zone!
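For the record, the graphed quantity boils down to something like this
little sketch (a hypothetical collector, not the actual script behind
the graphs; it assumes a single NUMA node, like these machines):

    import re

    def file_pages_per_zone(path="/proc/zoneinfo"):
        # Sum nr_active_file + nr_inactive_file for each zone.
        zones, zone = {}, None
        for line in open(path):
            m = re.match(r"Node\s+\d+,\s+zone\s+(\S+)", line)
            if m:
                zone = m.group(1)
                zones[zone] = 0
                continue
            m = re.match(r"\s*nr_(?:in)?active_file\s+(\d+)", line)
            if m and zone:
                zones[zone] += int(m.group(1))
        return zones

    for zone, pages in sorted(file_pages_per_zone().items()):
        print("%-8s %8.1f MB of file pages" % (zone, pages * 4.0 / 1024))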
I've also attached two zoneinfo outputs. Notice how the DMA32 zones
have hundreds of thousands of pagecache pages, but only a few dozen are
in the NORMAL zone! nr_vmscan_write is also telling: much higher values
for zone NORMAL (especially when you take into account how little
pagecache is there!). I guess those poor pagecache pages that survive
there get written out a millisecond after they're dirtied, a probable
cause of the slowdown I experience on the desktop.
There's a reasonable possibility that this imbalance between zones was
introduced somewhere between 3.3 and 3.4, because the VM behaves
slightly differently in 3.3 (it doesn't evict pagecache from the NORMAL
zone so aggressively). Unfortunately, I have some userspace
incompatibilities when running 3.3, so I'm not 100% sure (I didn't run
it long enough to be absolutely sure). I tried to find the problematic
commit, and cc715d99e529 certainly looked like the culprit, but it's
not! buffer_heads_over_limit is NEVER true on the machine, not even
close, so that commit is basically a noop here. Also, it doesn't matter
whether THP is on or off; the behaviour stays the same.
My apologies for the long email, I tried to provide as much information
as possible.
--
Zlatko
[-- Attachment #2: server-3.8.png --]
[-- Type: image/png, Size: 33133 bytes --]
[-- Attachment #3: server-kswapd-v2.png --]
[-- Type: image/png, Size: 34895 bytes --]
[-- Attachment #4: zoneinfo-desktop.txt --]
[-- Type: text/plain, Size: 4261 bytes --]
Node 0, zone DMA
pages free 3974
min 128
low 160
high 192
scanned 0
spanned 4080
present 3912
managed 3976
nr_free_pages 3974
nr_inactive_anon 0
nr_active_anon 0
nr_inactive_file 0
nr_active_file 0
nr_unevictable 0
nr_mlock 0
nr_anon_pages 0
nr_mapped 0
nr_file_pages 0
nr_dirty 0
nr_writeback 0
nr_slab_reclaimable 0
nr_slab_unreclaimable 2
nr_page_table_pages 0
nr_kernel_stack 0
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 0
nr_dirtied 0
nr_written 0
nr_anon_transparent_hugepages 0
nr_free_cma 0
protection: (0, 3259, 4015, 4015)
pagesets
cpu: 0
count: 0
high: 0
batch: 1
vm stats threshold: 6
cpu: 1
count: 0
high: 0
batch: 1
vm stats threshold: 6
cpu: 2
count: 0
high: 0
batch: 1
vm stats threshold: 6
cpu: 3
count: 0
high: 0
batch: 1
vm stats threshold: 6
all_unreclaimable: 1
start_pfn: 16
inactive_ratio: 1
Node 0, zone DMA32
pages free 135587
min 27326
low 34157
high 40989
scanned 0
spanned 1044480
present 834513
managed 828967
nr_free_pages 135587
nr_inactive_anon 8165
nr_active_anon 264237
nr_inactive_file 190424
nr_active_file 198798
nr_unevictable 1
nr_mlock 1
nr_anon_pages 219052
nr_mapped 33586
nr_file_pages 397576
nr_dirty 82
nr_writeback 0
nr_slab_reclaimable 21757
nr_slab_unreclaimable 3505
nr_page_table_pages 3293
nr_kernel_stack 134
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 8354
nr_dirtied 5734689
nr_written 5557592
nr_anon_transparent_hugepages 88
nr_free_cma 0
protection: (0, 0, 756, 756)
pagesets
cpu: 0
count: 181
high: 186
batch: 31
vm stats threshold: 36
cpu: 1
count: 103
high: 186
batch: 31
vm stats threshold: 36
cpu: 2
count: 154
high: 186
batch: 31
vm stats threshold: 36
cpu: 3
count: 149
high: 186
batch: 31
vm stats threshold: 36
all_unreclaimable: 0
start_pfn: 4096
inactive_ratio: 5
Node 0, zone Normal
pages free 7954
min 6337
low 7921
high 9505
scanned 0
spanned 196608
present 193536
managed 178447
nr_free_pages 7954
nr_inactive_anon 1916
nr_active_anon 136297
nr_inactive_file 32
nr_active_file 0
nr_unevictable 7767
nr_mlock 7767
nr_anon_pages 118628
nr_mapped 3090
nr_file_pages 3784
nr_dirty 4
nr_writeback 0
nr_slab_reclaimable 5476
nr_slab_unreclaimable 5581
nr_page_table_pages 2785
nr_kernel_stack 254
nr_unstable 0
nr_bounce 0
nr_vmscan_write 2693969
nr_vmscan_immediate_reclaim 10529
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 2348
nr_dirtied 1912471
nr_written 1784816
nr_anon_transparent_hugepages 46
nr_free_cma 0
protection: (0, 0, 0, 0)
pagesets
cpu: 0
count: 151
high: 186
batch: 31
vm stats threshold: 24
cpu: 1
count: 171
high: 186
batch: 31
vm stats threshold: 24
cpu: 2
count: 143
high: 186
batch: 31
vm stats threshold: 24
cpu: 3
count: 54
high: 186
batch: 31
vm stats threshold: 24
all_unreclaimable: 0
start_pfn: 1048576
inactive_ratio: 1
[-- Attachment #5: zoneinfo-server.txt --]
[-- Type: text/plain, Size: 3628 bytes --]
Node 0, zone DMA
pages free 3975
min 132
low 165
high 198
scanned 0
spanned 4080
present 3983
managed 3977
nr_free_pages 3975
nr_inactive_anon 0
nr_active_anon 0
nr_inactive_file 0
nr_active_file 0
nr_unevictable 0
nr_mlock 0
nr_anon_pages 0
nr_mapped 0
nr_file_pages 0
nr_dirty 0
nr_writeback 0
nr_slab_reclaimable 0
nr_slab_unreclaimable 2
nr_page_table_pages 0
nr_kernel_stack 0
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 0
nr_dirtied 0
nr_written 0
nr_anon_transparent_hugepages 0
nr_free_cma 0
protection: (0, 3236, 3934, 3934)
pagesets
cpu: 0
count: 0
high: 0
batch: 1
vm stats threshold: 4
cpu: 1
count: 0
high: 0
batch: 1
vm stats threshold: 4
all_unreclaimable: 1
start_pfn: 16
inactive_ratio: 1
Node 0, zone DMA32
pages free 198806
min 27693
low 34616
high 41539
scanned 0
spanned 1044480
present 847429
managed 828646
nr_free_pages 198806
nr_inactive_anon 152
nr_active_anon 296082
nr_inactive_file 159143
nr_active_file 148277
nr_unevictable 0
nr_mlock 0
nr_anon_pages 212100
nr_mapped 30139
nr_file_pages 325028
nr_dirty 61
nr_writeback 0
nr_slab_reclaimable 23373
nr_slab_unreclaimable 1418
nr_page_table_pages 1044
nr_kernel_stack 55
nr_unstable 0
nr_bounce 0
nr_vmscan_write 203475
nr_vmscan_immediate_reclaim 1159794
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 17608
nr_dirtied 120403187
nr_written 119379429
nr_anon_transparent_hugepages 130
nr_free_cma 0
protection: (0, 0, 697, 697)
pagesets
cpu: 0
count: 121
high: 186
batch: 31
vm stats threshold: 24
cpu: 1
count: 107
high: 186
batch: 31
vm stats threshold: 24
all_unreclaimable: 0
start_pfn: 4096
inactive_ratio: 5
Node 0, zone Normal
pages free 7449
min 5965
low 7456
high 8947
scanned 0
spanned 196607
present 196607
managed 178497
nr_free_pages 7449
nr_inactive_anon 280
nr_active_anon 149997
nr_inactive_file 121
nr_active_file 33
nr_unevictable 0
nr_mlock 0
nr_anon_pages 138419
nr_mapped 2050
nr_file_pages 2796
nr_dirty 4
nr_writeback 0
nr_slab_reclaimable 2388
nr_slab_unreclaimable 2284
nr_page_table_pages 1203
nr_kernel_stack 156
nr_unstable 0
nr_bounce 0
nr_vmscan_write 12486086
nr_vmscan_immediate_reclaim 1290613
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 2642
nr_dirtied 16946001
nr_written 16543553
nr_anon_transparent_hugepages 18
nr_free_cma 0
protection: (0, 0, 0, 0)
pagesets
cpu: 0
count: 93
high: 186
batch: 31
vm stats threshold: 16
cpu: 1
count: 114
high: 186
batch: 31
vm stats threshold: 16
all_unreclaimable: 0
start_pfn: 1048576
inactive_ratio: 1
* Re: The pagecache unloved in zone NORMAL?
2013-04-11 20:30 The pagecache unloved in zone NORMAL? Zlatko Calusic
@ 2013-05-05 21:50 ` Zlatko Calusic
2013-05-09 20:24 ` Zlatko Calusic
2013-05-12 17:53 ` Rik van Riel
0 siblings, 2 replies; 4+ messages in thread
From: Zlatko Calusic @ 2013-05-05 21:50 UTC (permalink / raw)
To: Mel Gorman; +Cc: linux-mm, Konstantin Khlebnikov
[-- Attachment #1: Type: text/plain, Size: 3911 bytes --]
On 11.04.2013 22:30, Zlatko Calusic wrote:
> This is something that I've been chasing for months, and I'm getting
> tired of it. :(
>
> The issue has been observed on 4GB RAM x86_64 machines (one server, one
> desktop) without swap subsystem (not even compiled in). The important
> thing to remember about a 4GB x86_64 machine is that the NORMAL zone is
> about 6 times smaller than the DMA32 zone.
>
> As a picture is worth 10,000 words, I've attached two graphs that
> nicely show what I've observed. As memory usage slowly rises, the MM
> subsystem gradually evicts pagecache pages from the NORMAL zone, trying
> to eventually get rid of all of them! This process takes days, typically
> more than 5 on this particular server. Of course, this means that
> eventually the zone will be chock-full of anon pages, and without swap
> the kernel can't do much about it. But as it tries to balance the zone,
> various bad things happen. On the server I've seen sudden freeing of
> hundreds of MB of pagecache; on the desktop there's a general slowdown,
> sound dropouts (HTTP streaming) and so on...
Konstantin's excellent patch, described in more detail at
http://marc.info/?l=linux-mm&m=136731974301311, is already giving some
useful additional insight into this problem, just as I expected. Here's
the data after 31h of server uptime (also see the attached graph):
Node 0, zone DMA32
nr_inactive_file 443705
avg_age_inactive_file: 362800
Node 0, zone Normal
nr_inactive_file 32832
avg_age_inactive_file: 38760
I reckon that only the aging of the inactive LRU lists is of interest
at the moment, because there's currently streaming I/O of about 8MB/s
that can be seen on the graphs. Here's how I decipher the numbers:
DMA32: 443705 pages * 4k ~ 1733MB, 362800 ms = 362.8 seconds to go
through the LRU and replace each page in it, which finally gives:
1733/362.8 ~ 4.78 MB/s (the approximate speed at which reclaim is going on)
Normal zone: 32832*4/1024/38.76 ~ 3.31 MB/s
Check: 4.78 + 3.31 ~ 8 MB/s (just about the rate of the read I/O from
the disk)
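Or, as a tiny sanity-check script (same arithmetic, 4 KiB pages,
avg_age_inactive_file in milliseconds):

    def reclaim_mb_per_s(nr_inactive_file, avg_age_ms):
        size_mb = nr_inactive_file * 4.0 / 1024
        return size_mb / (avg_age_ms / 1000.0)

    dma32  = reclaim_mb_per_s(443705, 362800)   # ~4.78 MB/s
    normal = reclaim_mb_per_s(32832, 38760)     # ~3.31 MB/s
    print("%.2f + %.2f = %.2f MB/s" % (dma32, normal, dma32 + normal))
    # -> 4.78 + 3.31 = 8.09 MB/s, right about the observed read rate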
So, if my calculations are right and my model makes sense (Konstantin,
chime in if I got something wrong!), reclaim is going through the pages
in those 2 zones at a very similar speed, although there are already 13
times fewer pages in the Normal zone available for caching the streaming
I/O. If this behavior continues once the Normal zone gets practically
washed out of file pages (guaranteed in a few days), then we will be
measuring the TTL of pages in the Normal zone in milliseconds. Not a
very useful cache, you'll agree. Of course, it's not a problem for
streaming reads, but dirty pages that end up there will be written out
practically synchronously, and then it's no wonder that the desktop at
those moments starts behaving worse than a trusty old 486DX2 with 16MB
of RAM once did. :(
The only question I have is: is this a design mistake, or a plain bug?
I strongly believe that pages should be reclaimed at a speed appropriate
to the LRU size. After all, all those pages are the same as far as I/O
is concerned, so there's no reason to throw out some pages after only 38
seconds while others are privileged to spend 6 minutes in memory. Those
are the numbers from the data above, and we'll see by the end of the
following week how bad it can really get.
This imbalance is possibly the main reason why file pages are pushed
out of the Normal zone so aggressively in the first place. Probably, if
we could balance the reclaim speed, the whole problem would disappear.
It looks like the faster reclaim in the smaller zone manages to throw
more file pages out of it (anon pages replace them more easily), which
in turn makes the file LRUs even smaller, which produces even faster
reclaim, which... you get the idea: a kind of positive feedback loop
that feeds on itself. The kind that always ends with a bang. ;)
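To make the loop concrete, here's a toy model (illustrative only, with
made-up constants; this is not kernel code). Both file LRUs are cycled
at a similar absolute rate, and a page that gets evicted before it
would have been referenced again is assumed to lose its slot to an anon
page that reclaim can't touch without swap:

    RATE = 1200           # pages/s reclaimed per zone (~4.7 MB/s)
    REUSE_WINDOW = 60.0   # assumed seconds until a file page is reused

    def step(file_pages):
        ttl = file_pages / RATE                   # seconds spent on the LRU
        p_reused = min(1.0, ttl / REUSE_WINDOW)   # reused before eviction?
        return max(0.0, file_pages - RATE * (1.0 - p_reused))

    big, small = 440000.0, 33000.0   # roughly the DMA32 and Normal file LRUs
    for t in range(181):
        if t % 30 == 0:
            print("t=%3ds  DMA32=%7.0f  Normal=%7.0f" % (t, big, small))
        big, small = step(big), step(small)

The big LRU sits above the reuse window and stays put; the small one
drops below it and collapses to zero within a minute of model time.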
--
Zlatko
[-- Attachment #2: screenshot12.png --]
[-- Type: image/png, Size: 51090 bytes --]
* Re: The pagecache unloved in zone NORMAL?
2013-05-05 21:50 ` Zlatko Calusic
@ 2013-05-09 20:24 ` Zlatko Calusic
2013-05-12 17:53 ` Rik van Riel
1 sibling, 0 replies; 4+ messages in thread
From: Zlatko Calusic @ 2013-05-09 20:24 UTC (permalink / raw)
To: linux-mm
On 05.05.2013 23:50, Zlatko Calusic wrote:
> useful additional insight into this problem, just as I expected. Here's
> the data after 31h of server uptime (also see the attached graph):
>
> Node 0, zone DMA32
> nr_inactive_file 443705
> avg_age_inactive_file: 362800
> Node 0, zone Normal
> nr_inactive_file 32832
> avg_age_inactive_file: 38760
>
4 days later:
Node 0, zone DMA32
nr_inactive_file 404276
nr_vmscan_write 2897
avg_age_inactive_file: 318208
Node 0, zone Normal
nr_inactive_file 4677
nr_vmscan_write 92536
avg_age_inactive_file: 3692
Inactive pages in the Normal zone are now reclaimed in less than 4
seconds (vs. 5 minutes in the DMA32 zone), and nr_vmscan_write is high
and constantly rising.
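Plugging these into the same formula as before (a quick check, 4 KiB
pages, avg_age_inactive_file in milliseconds):

    print("%.2f MB/s" % (404276 * 4.0 / 1024 / 318.208))   # DMA32:  ~4.96
    print("%.2f MB/s" % (4677 * 4.0 / 1024 / 3.692))       # Normal: ~4.95

If the earlier model holds, reclaim is still cycling both zones at
almost exactly the same absolute rate, even though the Normal file LRU
is now ~86x smaller.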
--
Zlatko
* Re: The pagecache unloved in zone NORMAL?
2013-05-05 21:50 ` Zlatko Calusic
2013-05-09 20:24 ` Zlatko Calusic
@ 2013-05-12 17:53 ` Rik van Riel
1 sibling, 0 replies; 4+ messages in thread
From: Rik van Riel @ 2013-05-12 17:53 UTC (permalink / raw)
To: Zlatko Calusic; +Cc: Mel Gorman, linux-mm, Konstantin Khlebnikov
On 05/05/2013 05:50 PM, Zlatko Calusic wrote:
> Konstantin's excellent patch, described in more detail at
> http://marc.info/?l=linux-mm&m=136731974301311, is already giving some
> useful additional insight into this problem, just as I expected. Here's
> the data after 31h of server uptime (also see the attached graph):
>
> Node 0, zone DMA32
> nr_inactive_file 443705
> avg_age_inactive_file: 362800
> Node 0, zone Normal
> nr_inactive_file 32832
> avg_age_inactive_file: 38760
>
> I reckon that only the aging of the inactive LRU lists is of interest
> at the moment, because there's currently streaming I/O of about 8MB/s
> that can be seen on the graphs. Here's how I decipher the numbers:
> The only question I have is: is this a design mistake, or a plain bug?
I believe this is a bug.
> I strongly believe that pages should be reclaimed at a speed
> appropriate to the LRU size.
I agree. Aging the pages in one zone 10x as fast as the pages in
another zone could throw off all kinds of things, including detection
(and preservation) of the system working set, and could cause page
cache readahead thrashing, etc...
--
All rights reversed