* The pagecache unloved in zone NORMAL?
@ 2013-04-11 20:30 Zlatko Calusic
2013-05-05 21:50 ` Zlatko Calusic
0 siblings, 1 reply; 4+ messages in thread
From: Zlatko Calusic @ 2013-04-11 20:30 UTC (permalink / raw)
To: Mel Gorman; +Cc: linux-mm
[-- Attachment #1: Type: text/plain, Size: 2840 bytes --]
This is something that I've been chasing for months, and I'm getting
tired of it. :(
The issue has been observed on 4GB RAM x86_64 machines (one server, one
desktop) without swap subsystem (not even compiled in). The important
thing to remember about a 4GB x86_64 machine is that the NORMAL zone is
about 6 times smaller than the DMA32 zone.
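(Back-of-the-envelope from the attached zoneinfo dumps, just to put
numbers on it; 4 KiB pages, and the exact factor of course depends on
the machine's memory map:)

    # "managed" lines from the desktop zoneinfo attached below
    dma32_managed  = 828967
    normal_managed = 178447
    print("DMA32  ~%d MB" % (dma32_managed  * 4 // 1024))   # ~3238 MB
    print("Normal ~%d MB" % (normal_managed * 4 // 1024))   # ~697 MB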
As a picture is worth 10,000 words, I've attached two graphs that
nicely show what I've observed. As memory usage slowly rises, the MM
subsystem gradually evicts pagecache pages from the NORMAL zone, trying
to eventually get rid of all of them! This process takes days, typically
more than 5 on this particular server. Of course, this means that
eventually the zone will be chock-full of anon pages, and without swap
the kernel can't do much about it. But as it tries to balance the zone,
various bad things happen. On the server I've seen sudden freeing of
hundreds of MB of pagecache; on the desktop there's a general slowdown,
sound dropouts (HTTP streaming) and so on...
The first graph is probably from a 3.8 kernel; the second one is from
3.9.0-rc4+ patched with the kswapd series v2. Obviously not much has
changed wrt this problem, although it seems to me that the kernel now
hesitates to needlessly free large amounts of memory, or does it less
often. But on the desktop there's no improvement: as soon as the
pagecache gets really low in the NORMAL zone, there's severe slowdown,
dropouts, etc... One other thing: the lower graphs say "Normal zone file
pages", but what is actually graphed is nr_active_file +
nr_inactive_file from the NORMAL zone!
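For the record, the graphed quantity boils down to something like this
little sketch (a hypothetical collector, not the actual script behind
the graphs; it assumes a single NUMA node, like these machines):

    import re

    def file_pages_per_zone(path="/proc/zoneinfo"):
        # Sum nr_active_file + nr_inactive_file for each zone.
        zones, zone = {}, None
        for line in open(path):
            m = re.match(r"Node\s+\d+,\s+zone\s+(\S+)", line)
            if m:
                zone = m.group(1)
                zones[zone] = 0
                continue
            m = re.match(r"\s*nr_(?:in)?active_file\s+(\d+)", line)
            if m and zone:
                zones[zone] += int(m.group(1))
        return zones

    for zone, pages in sorted(file_pages_per_zone().items()):
        print("%-8s %8.1f MB of file pages" % (zone, pages * 4.0 / 1024))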
I've also attached two zoneinfo outputs. Notice how the DMA32 zones
have hundreds of thousands of pagecache pages, but only a few dozen are
in the NORMAL zone! nr_vmscan_write is also telling: much higher values
for zone NORMAL (especially when you take into account how little
pagecache is there!). I guess those poor pagecache pages that survive
there get written out a millisecond after they're dirtied, a probable
cause of the slowdown I experience on the desktop.
There's a reasonable possibility that this imbalance between zones was
introduced somewhere between 3.3 and 3.4, because the VM behaves
slightly differently in 3.3 (it doesn't evict pagecache from the NORMAL
zone so aggressively). Unfortunately, I have some userspace
incompatibilities when running 3.3, so I'm not 100% sure (I didn't run
it long enough to be absolutely sure). I tried to find the problematic
commit, and cc715d99e529 certainly looked like the culprit, but it's
not! buffer_heads_over_limit is NEVER true on the machine, not even
close, so that commit is basically a noop here. Also, it doesn't matter
whether THP is on or off; the behaviour stays the same.
My apologies for the long email, I tried to provide as much information
as possible.
--
Zlatko
[-- Attachment #2: server-3.8.png --]
[-- Type: image/png, Size: 33133 bytes --]
[-- Attachment #3: server-kswapd-v2.png --]
[-- Type: image/png, Size: 34895 bytes --]
[-- Attachment #4: zoneinfo-desktop.txt --]
[-- Type: text/plain, Size: 4261 bytes --]
Node 0, zone DMA
pages free 3974
min 128
low 160
high 192
scanned 0
spanned 4080
present 3912
managed 3976
nr_free_pages 3974
nr_inactive_anon 0
nr_active_anon 0
nr_inactive_file 0
nr_active_file 0
nr_unevictable 0
nr_mlock 0
nr_anon_pages 0
nr_mapped 0
nr_file_pages 0
nr_dirty 0
nr_writeback 0
nr_slab_reclaimable 0
nr_slab_unreclaimable 2
nr_page_table_pages 0
nr_kernel_stack 0
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 0
nr_dirtied 0
nr_written 0
nr_anon_transparent_hugepages 0
nr_free_cma 0
protection: (0, 3259, 4015, 4015)
pagesets
cpu: 0
count: 0
high: 0
batch: 1
vm stats threshold: 6
cpu: 1
count: 0
high: 0
batch: 1
vm stats threshold: 6
cpu: 2
count: 0
high: 0
batch: 1
vm stats threshold: 6
cpu: 3
count: 0
high: 0
batch: 1
vm stats threshold: 6
all_unreclaimable: 1
start_pfn: 16
inactive_ratio: 1
Node 0, zone DMA32
pages free 135587
min 27326
low 34157
high 40989
scanned 0
spanned 1044480
present 834513
managed 828967
nr_free_pages 135587
nr_inactive_anon 8165
nr_active_anon 264237
nr_inactive_file 190424
nr_active_file 198798
nr_unevictable 1
nr_mlock 1
nr_anon_pages 219052
nr_mapped 33586
nr_file_pages 397576
nr_dirty 82
nr_writeback 0
nr_slab_reclaimable 21757
nr_slab_unreclaimable 3505
nr_page_table_pages 3293
nr_kernel_stack 134
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 8354
nr_dirtied 5734689
nr_written 5557592
nr_anon_transparent_hugepages 88
nr_free_cma 0
protection: (0, 0, 756, 756)
pagesets
cpu: 0
count: 181
high: 186
batch: 31
vm stats threshold: 36
cpu: 1
count: 103
high: 186
batch: 31
vm stats threshold: 36
cpu: 2
count: 154
high: 186
batch: 31
vm stats threshold: 36
cpu: 3
count: 149
high: 186
batch: 31
vm stats threshold: 36
all_unreclaimable: 0
start_pfn: 4096
inactive_ratio: 5
Node 0, zone Normal
pages free 7954
min 6337
low 7921
high 9505
scanned 0
spanned 196608
present 193536
managed 178447
nr_free_pages 7954
nr_inactive_anon 1916
nr_active_anon 136297
nr_inactive_file 32
nr_active_file 0
nr_unevictable 7767
nr_mlock 7767
nr_anon_pages 118628
nr_mapped 3090
nr_file_pages 3784
nr_dirty 4
nr_writeback 0
nr_slab_reclaimable 5476
nr_slab_unreclaimable 5581
nr_page_table_pages 2785
nr_kernel_stack 254
nr_unstable 0
nr_bounce 0
nr_vmscan_write 2693969
nr_vmscan_immediate_reclaim 10529
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 2348
nr_dirtied 1912471
nr_written 1784816
nr_anon_transparent_hugepages 46
nr_free_cma 0
protection: (0, 0, 0, 0)
pagesets
cpu: 0
count: 151
high: 186
batch: 31
vm stats threshold: 24
cpu: 1
count: 171
high: 186
batch: 31
vm stats threshold: 24
cpu: 2
count: 143
high: 186
batch: 31
vm stats threshold: 24
cpu: 3
count: 54
high: 186
batch: 31
vm stats threshold: 24
all_unreclaimable: 0
start_pfn: 1048576
inactive_ratio: 1
[-- Attachment #5: zoneinfo-server.txt --]
[-- Type: text/plain, Size: 3628 bytes --]
Node 0, zone DMA
pages free 3975
min 132
low 165
high 198
scanned 0
spanned 4080
present 3983
managed 3977
nr_free_pages 3975
nr_inactive_anon 0
nr_active_anon 0
nr_inactive_file 0
nr_active_file 0
nr_unevictable 0
nr_mlock 0
nr_anon_pages 0
nr_mapped 0
nr_file_pages 0
nr_dirty 0
nr_writeback 0
nr_slab_reclaimable 0
nr_slab_unreclaimable 2
nr_page_table_pages 0
nr_kernel_stack 0
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 0
nr_dirtied 0
nr_written 0
nr_anon_transparent_hugepages 0
nr_free_cma 0
protection: (0, 3236, 3934, 3934)
pagesets
cpu: 0
count: 0
high: 0
batch: 1
vm stats threshold: 4
cpu: 1
count: 0
high: 0
batch: 1
vm stats threshold: 4
all_unreclaimable: 1
start_pfn: 16
inactive_ratio: 1
Node 0, zone DMA32
pages free 198806
min 27693
low 34616
high 41539
scanned 0
spanned 1044480
present 847429
managed 828646
nr_free_pages 198806
nr_inactive_anon 152
nr_active_anon 296082
nr_inactive_file 159143
nr_active_file 148277
nr_unevictable 0
nr_mlock 0
nr_anon_pages 212100
nr_mapped 30139
nr_file_pages 325028
nr_dirty 61
nr_writeback 0
nr_slab_reclaimable 23373
nr_slab_unreclaimable 1418
nr_page_table_pages 1044
nr_kernel_stack 55
nr_unstable 0
nr_bounce 0
nr_vmscan_write 203475
nr_vmscan_immediate_reclaim 1159794
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 17608
nr_dirtied 120403187
nr_written 119379429
nr_anon_transparent_hugepages 130
nr_free_cma 0
protection: (0, 0, 697, 697)
pagesets
cpu: 0
count: 121
high: 186
batch: 31
vm stats threshold: 24
cpu: 1
count: 107
high: 186
batch: 31
vm stats threshold: 24
all_unreclaimable: 0
start_pfn: 4096
inactive_ratio: 5
Node 0, zone Normal
pages free 7449
min 5965
low 7456
high 8947
scanned 0
spanned 196607
present 196607
managed 178497
nr_free_pages 7449
nr_inactive_anon 280
nr_active_anon 149997
nr_inactive_file 121
nr_active_file 33
nr_unevictable 0
nr_mlock 0
nr_anon_pages 138419
nr_mapped 2050
nr_file_pages 2796
nr_dirty 4
nr_writeback 0
nr_slab_reclaimable 2388
nr_slab_unreclaimable 2284
nr_page_table_pages 1203
nr_kernel_stack 156
nr_unstable 0
nr_bounce 0
nr_vmscan_write 12486086
nr_vmscan_immediate_reclaim 1290613
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 2642
nr_dirtied 16946001
nr_written 16543553
nr_anon_transparent_hugepages 18
nr_free_cma 0
protection: (0, 0, 0, 0)
pagesets
cpu: 0
count: 93
high: 186
batch: 31
vm stats threshold: 16
cpu: 1
count: 114
high: 186
batch: 31
vm stats threshold: 16
all_unreclaimable: 0
start_pfn: 1048576
inactive_ratio: 1
* Re: The pagecache unloved in zone NORMAL?
2013-04-11 20:30 The pagecache unloved in zone NORMAL? Zlatko Calusic
@ 2013-05-05 21:50 ` Zlatko Calusic
2013-05-09 20:24 ` Zlatko Calusic
2013-05-12 17:53 ` Rik van Riel
0 siblings, 2 replies; 4+ messages in thread
From: Zlatko Calusic @ 2013-05-05 21:50 UTC (permalink / raw)
To: Mel Gorman; +Cc: linux-mm, Konstantin Khlebnikov
[-- Attachment #1: Type: text/plain, Size: 3911 bytes --]
On 11.04.2013 22:30, Zlatko Calusic wrote:
> This is something that I've been chasing for months, and I'm getting
> tired of it. :(
>
> The issue has been observed on 4GB RAM x86_64 machines (one server, one
> desktop) without swap subsystem (not even compiled in). The important
> thing to remember about a 4GB x86_64 machine is that the NORMAL zone is
> about 6 times smaller than the DMA32 zone.
>
> As a picture is worth 10,000 words, I've attached two graphs that
> nicely show what I've observed. As memory usage slowly rises, the MM
> subsystem gradually evicts pagecache pages from the NORMAL zone, trying
> to eventually get rid of all of them! This process takes days, typically
> more than 5 on this particular server. Of course, this means that
> eventually the zone will be chock-full of anon pages, and without swap
> the kernel can't do much about it. But as it tries to balance the zone,
> various bad things happen. On the server I've seen sudden freeing of
> hundreds of MB of pagecache; on the desktop there's a general slowdown,
> sound dropouts (HTTP streaming) and so on...
Konstantin's excellent patch, described in more detail at
http://marc.info/?l=linux-mm&m=136731974301311, is already giving some
useful additional insight into this problem, just as I expected. Here's
the data after 31h of server uptime (also see the attached graph):
Node 0, zone DMA32
nr_inactive_file 443705
avg_age_inactive_file: 362800
Node 0, zone Normal
nr_inactive_file 32832
avg_age_inactive_file: 38760
I reckon that only the aging of the inactive LRU lists is of interest
at the moment, because there's currently streaming I/O of about 8MB/s
that can be seen on the graphs. Here's how I decipher the numbers:
DMA32: 443705 pages * 4k ~ 1733MB, 362800 ms = 362.8 seconds to go
through the LRU and replace each page in it, which finally gives:
1733/362.8 ~ 4.78 MB/s (the approximate speed at which reclaim is going on)
Normal zone: 32832*4/1024/38.76 ~ 3.31 MB/s
Check: 4.78 + 3.31 ~ 8 MB/s (just about the rate of the read I/O from
the disk)
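Or, as a tiny sanity-check script (same arithmetic, 4 KiB pages,
avg_age_inactive_file in milliseconds):

    def reclaim_mb_per_s(nr_inactive_file, avg_age_ms):
        size_mb = nr_inactive_file * 4.0 / 1024
        return size_mb / (avg_age_ms / 1000.0)

    dma32  = reclaim_mb_per_s(443705, 362800)   # ~4.78 MB/s
    normal = reclaim_mb_per_s(32832, 38760)     # ~3.31 MB/s
    print("%.2f + %.2f = %.2f MB/s" % (dma32, normal, dma32 + normal))
    # -> 4.78 + 3.31 = 8.09 MB/s, right about the observed read rate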
So, if my calculations are right and my model makes sense (Konstantin,
chime in if I got something wrong!), reclaim is going through the pages
in those 2 zones at a very similar speed, although there are already 13
times fewer pages in the Normal zone available for caching the streaming
I/O. If this behavior continues once the Normal zone gets practically
washed out of file pages (guaranteed in a few days), then we will be
measuring the TTL of pages in the Normal zone in milliseconds. Not a
very useful cache, you'll agree. Of course, it's not a problem for
streaming reads, but dirty pages that end up there will be written out
practically synchronously, and then it's no wonder that the desktop at
those moments starts behaving worse than a trusty old 486DX2 with 16MB
of RAM once did. :(
The only question I have is: is this a design mistake, or a plain bug?
I strongly believe that pages should be reclaimed at a speed appropriate
to the LRU size. After all, all those pages are the same as far as I/O
is concerned, so there's no reason to throw out some pages after only 38
seconds while others are privileged to spend 6 minutes in memory. Those
are the numbers from the data above, and we'll see by the end of the
following week how bad it can really get.
This imbalance is possibly the main reason why file pages are pushed
out of the Normal zone so aggressively in the first place. Probably, if
we could balance the reclaim speed, the whole problem would disappear.
It looks like the faster reclaim in the smaller zone manages to throw
more file pages out of it (anon pages replace them more easily), which
in turn makes the file LRUs even smaller, which produces even faster
reclaim, which... you get the idea: a kind of positive feedback loop
that feeds on itself. The kind that always ends with a bang. ;)
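To make the loop concrete, here's a toy model (illustrative only, with
made-up constants; this is not kernel code). Both file LRUs are cycled
at a similar absolute rate, and a page that gets evicted before it
would have been referenced again is assumed to lose its slot to an anon
page that reclaim can't touch without swap:

    RATE = 1200           # pages/s reclaimed per zone (~4.7 MB/s)
    REUSE_WINDOW = 60.0   # assumed seconds until a file page is reused

    def step(file_pages):
        ttl = file_pages / RATE                   # seconds spent on the LRU
        p_reused = min(1.0, ttl / REUSE_WINDOW)   # reused before eviction?
        return max(0.0, file_pages - RATE * (1.0 - p_reused))

    big, small = 440000.0, 33000.0   # roughly the DMA32 and Normal file LRUs
    for t in range(181):
        if t % 30 == 0:
            print("t=%3ds  DMA32=%7.0f  Normal=%7.0f" % (t, big, small))
        big, small = step(big), step(small)

The big LRU sits above the reuse window and stays put; the small one
drops below it and collapses to zero within a minute of model time.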
--
Zlatko
[-- Attachment #2: screenshot12.png --]
[-- Type: image/png, Size: 51090 bytes --]
* Re: The pagecache unloved in zone NORMAL?
2013-05-05 21:50 ` Zlatko Calusic
@ 2013-05-09 20:24 ` Zlatko Calusic
2013-05-12 17:53 ` Rik van Riel
1 sibling, 0 replies; 4+ messages in thread
From: Zlatko Calusic @ 2013-05-09 20:24 UTC (permalink / raw)
To: linux-mm
On 05.05.2013 23:50, Zlatko Calusic wrote:
> useful additional insight into this problem, just as I expected. Here's
> the data after 31h of server uptime (also see the attached graph):
>
> Node 0, zone DMA32
> nr_inactive_file 443705
> avg_age_inactive_file: 362800
> Node 0, zone Normal
> nr_inactive_file 32832
> avg_age_inactive_file: 38760
>
4 days later:
Node 0, zone DMA32
nr_inactive_file 404276
nr_vmscan_write 2897
avg_age_inactive_file: 318208
Node 0, zone Normal
nr_inactive_file 4677
nr_vmscan_write 92536
avg_age_inactive_file: 3692
Inactive pages in the Normal zone are now reclaimed in less than 4
seconds (vs. 5 minutes in the DMA32 zone), and nr_vmscan_write is high
and constantly rising.
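Plugging these into the same formula as before (a quick check, 4 KiB
pages, avg_age_inactive_file in milliseconds):

    print("%.2f MB/s" % (404276 * 4.0 / 1024 / 318.208))   # DMA32:  ~4.96
    print("%.2f MB/s" % (4677 * 4.0 / 1024 / 3.692))       # Normal: ~4.95

If the earlier model holds, reclaim is still cycling both zones at
almost exactly the same absolute rate, even though the Normal file LRU
is now ~86x smaller.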
--
Zlatko
* Re: The pagecache unloved in zone NORMAL?
2013-05-05 21:50 ` Zlatko Calusic
2013-05-09 20:24 ` Zlatko Calusic
@ 2013-05-12 17:53 ` Rik van Riel
1 sibling, 0 replies; 4+ messages in thread
From: Rik van Riel @ 2013-05-12 17:53 UTC (permalink / raw)
To: Zlatko Calusic; +Cc: Mel Gorman, linux-mm, Konstantin Khlebnikov
On 05/05/2013 05:50 PM, Zlatko Calusic wrote:
> Konstantin's excellent patch, described in more detail at
> http://marc.info/?l=linux-mm&m=136731974301311, is already giving some
> useful additional insight into this problem, just as I expected. Here's
> the data after 31h of server uptime (also see the attached graph):
>
> Node 0, zone DMA32
> nr_inactive_file 443705
> avg_age_inactive_file: 362800
> Node 0, zone Normal
> nr_inactive_file 32832
> avg_age_inactive_file: 38760
>
> I reckon that only the aging of the inactive LRU lists is of interest
> at the moment, because there's currently streaming I/O of about 8MB/s
> that can be seen on the graphs. Here's how I decipher the numbers:
> The only question I have is: is this a design mistake, or a plain bug?
I believe this is a bug.
> I strongly believe that pages should be reclaimed at a speed
> appropriate to the LRU size.
I agree. Aging the pages in one zone 10x as fast as the pages in
another zone could throw off all kinds of things, including detection
(and preservation) of the system working set, and could cause page
cache readahead thrashing, etc...
--
All rights reversed