* [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2
@ 2011-07-21 16:28 Mel Gorman
2011-07-21 16:28 ` [PATCH 1/8] mm: vmscan: Do not writeback filesystem pages in direct reclaim Mel Gorman
` (10 more replies)
0 siblings, 11 replies; 43+ messages in thread
From: Mel Gorman @ 2011-07-21 16:28 UTC (permalink / raw)
To: Linux-MM
Cc: LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, Mel Gorman
Warning: Long post with lots of figures. If you normally drink coffee
and you don't have a cup, get one or you may end up with a case of
keyboard face.
Changelog since v1
o Drop prio-inode patch. There is now a dependency that the flusher
threads find these dirty pages quickly.
o Drop nr_vmscan_throttled counter
o SetPageReclaim instead of deactivate_page which was wrong
o Add warning to main filesystems if called from direct reclaim context
o Add patch to completely disable filesystem writeback from reclaim
Testing from the XFS folk revealed that there is still too much
I/O from the end of the LRU in kswapd. Previously it was considered
acceptable by VM people for a small number of pages to be written
back from reclaim with testing generally showing about 0.3% of pages
reclaimed were written back (higher if memory was low). The claim that
writing back a small number of pages is acceptable has been heavily
disputed for quite some time and Dave Chinner explained the problem well:
It doesn't have to be a very high number to be a problem. IO
is orders of magnitude slower than the CPU time it takes to
flush a page, so the cost of making a bad flush decision is
very high. And single page writeback from the LRU is almost
always a bad flush decision.
To complicate matters, filesystems respond very differently to requests
from reclaim according to Christoph Hellwig;
xfs tries to write it back if the requester is kswapd
ext4 ignores the request if it's a delayed allocation
btrfs ignores the request
As a result, each filesystem has different performance characteristics
when under memory pressure and many pages are being dirtied. In
some cases, the request is ignored entirely so the VM cannot depend
on the IO being dispatched.
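To make that concrete, here is a minimal sketch of the kind of check a
->writepage implementation uses to tell the callers apart. The names
example_writepage() and do_real_writepage() are made up for illustration,
but the flag test is the one XFS already uses and that patches 2-4 warn
on: direct reclaim runs with PF_MEMALLOC set and PF_KSWAPD clear, while
kswapd sets both.

	static int example_writepage(struct page *page,
				     struct writeback_control *wbc)
	{
		/* Direct (or memcg) reclaim: PF_MEMALLOC set, PF_KSWAPD clear */
		if ((current->flags & (PF_MEMALLOC | PF_KSWAPD)) == PF_MEMALLOC) {
			/* Refuse to write; leave the page dirty for the flushers */
			redirty_page_for_writepage(wbc, page);
			unlock_page(page);
			return 0;
		}

		/* kswapd or ordinary writeback: issue the IO as normal */
		return do_real_writepage(page, wbc);
	}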
The objective of this series is to reduce writing of filesystem-backed
pages from reclaim, play nicely with writeback that is already in
progress and throttle reclaim appropriately when dirty pages are
encountered. The assumption is that the flushers will always write
pages faster than if reclaim issues the IO. The new problem is that
reclaim has very little control over how long before a page in a
particular zone or container is cleaned which is discussed later. A
secondary goal is to avoid the problem whereby direct reclaim splices
two potentially deep call stacks together.
Patch 1 disables writeback of filesystem pages from direct reclaim
entirely. Anonymous pages are still written.
Patches 2-4 add warnings to XFS, ext4 and btrfs if called from
direct reclaim. With patch 1, this "never happens" and
is intended to catch regressions in this logic in the
future.
Patch 5 disables writeback of filesystem pages from kswapd unless
the priority is raised to the point where kswapd is considered
to be in trouble.
Patch 6 throttles reclaimers if too many dirty pages are being
encountered and the zones or backing devices are congested.
Patch 7 invalidates dirty pages found at the end of the LRU so they
are reclaimed quickly after being written back rather than
waiting for a reclaimer to find them
Patch 8 disables writeback of filesystem pages from kswapd and
depends entirely on the flusher threads for cleaning pages.
This is potentially a problem if the flusher threads take a
long time to wake or are not discovering the pages we need
cleaned. By placing the patch last, it's more likely that
bisection can catch if this situation occurs and can be
easily reverted.
I consider this series to be orthogonal to the writeback work but
it is worth noting that the writeback work affects the viability of
patch 8 in particular.
I tested this on ext4 and xfs using fs_mark and a micro benchmark
that does a streaming write to a large mapping (exercises use-once
LRU logic) followed by streaming writes to a mix of anonymous and
file-backed mappings. The command line for fs_mark when booted with
512M looked something like
./fs_mark -d /tmp/fsmark-2676 -D 100 -N 150 -n 150 -L 25 -t 1 -S0 -s 10485760
The number of files was adjusted depending on the amount of available
memory so that the amount of file data created was about 3xRAM. For multiple threads,
the -d switch is specified multiple times.
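As a worked example, the single-threaded 512M command above writes 150
files of 10485760 bytes per iteration, i.e. roughly 1.5G of file data
or about 3x the 512M of RAM.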
3 kernels are tested.
vanilla 3.0-rc6
kswapdwb-v2r5 patches 1-7
nokswapdwb-v2r5 patches 1-8
The test machine is x86-64 with an older generation of AMD processor
with 4 cores. The underlying storage was 4 disks configured as RAID-0
as this was the best configuration of storage I had available. Swap
is on a separate disk. Dirty ratio was tuned to 40% instead of the
default of 20%.
Testing was run with and without monitors to both verify that the
patches were operating as expected and that any performance gain was
real and not due to interference from monitors.
I've posted the raw reports for each filesystem at
http://www.csn.ul.ie/~mel/postings/reclaim-20110721
Unfortunately, the volume of data is excessive but here is a partial
summary of what was interesting for XFS.
512M1P-xfs Files/s mean 32.99 ( 0.00%) 35.16 ( 6.18%) 35.08 ( 5.94%)
512M1P-xfs Elapsed Time fsmark 122.54 115.54 115.21
512M1P-xfs Elapsed Time mmap-strm 105.09 104.44 106.12
512M-xfs Files/s mean 30.50 ( 0.00%) 33.30 ( 8.40%) 34.68 (12.06%)
512M-xfs Elapsed Time fsmark 136.14 124.26 120.33
512M-xfs Elapsed Time mmap-strm 154.68 145.91 138.83
512M-2X-xfs Files/s mean 28.48 ( 0.00%) 32.90 (13.45%) 32.83 (13.26%)
512M-2X-xfs Elapsed Time fsmark 145.64 128.67 128.67
512M-2X-xfs Elapsed Time mmap-strm 145.92 136.65 137.67
512M-4X-xfs Files/s mean 29.06 ( 0.00%) 32.82 (11.46%) 33.32 (12.81%)
512M-4X-xfs Elapsed Time fsmark 153.69 136.74 135.11
512M-4X-xfs Elapsed Time mmap-strm 159.47 128.64 132.59
512M-16X-xfs Files/s mean 48.80 ( 0.00%) 41.80 (-16.77%) 56.61 (13.79%)
512M-16X-xfs Elapsed Time fsmark 161.48 144.61 141.19
512M-16X-xfs Elapsed Time mmap-strm 167.04 150.62 147.83
The difference between kswapd writing and not writing for fsmark
in many cases is marginal simply because kswapd was not reaching a
high enough priority to enter writeback. Memory is mostly consumed
by filesystem-backed pages so limiting the number of dirty pages
(dirty_ratio == 40) means that kswapd always makes forward progress
and avoids the OOM killer.
For the streaming-write benchmark, it does make a small difference as
kswapd is reaching the higher priorities there due to a large number
of anonymous pages added to the mix. The performance difference is
marginal though as the number of filesystem pages written is about
1/50th of the number of anonymous pages written so it is drowned out.
I was initially worried about 512M-16X-xfs but it's well within the noise
looking at the standard deviations from
http://www.csn.ul.ie/~mel/postings/reclaim-20110721/html-no-monitor/global-dhp-512M-16X__writeback-reclaimdirty-xfs/hydra/comparison.html
Files/s min 25.00 ( 0.00%) 31.10 (19.61%) 32.00 (21.88%)
Files/s mean 48.80 ( 0.00%) 41.80 (-16.77%) 56.61 (13.79%)
Files/s stddev 28.65 ( 0.00%) 11.32 (-153.19%) 32.79 (12.62%)
Files/s max 133.20 ( 0.00%) 81.60 (-63.24%) 154.00 (13.51%)
64 threads writing on a machine with 4 CPUs with 512M RAM has variable
performance which is hardly surprising.
The streaming-write benchmarks all completed faster.
The tests were also run with mem=1024M and mem=4608M. The relative
performance improvement shrinks as memory increases, reflecting that
with enough memory there are fewer writes from reclaim because the
flusher threads have time to clean pages before they reach the end of
the LRU.
Here are the same tests when using ext4
512M1P-ext4 Files/s mean 37.36 ( 0.00%) 37.10 (-0.71%) 37.66 ( 0.78%)
512M1P-ext4 Elapsed Time fsmark 108.93 109.91 108.61
512M1P-ext4 Elapsed Time mmap-strm 112.15 108.93 109.10
512M-ext4 Files/s mean 30.83 ( 0.00%) 39.80 (22.54%) 32.74 ( 5.83%)
512M-ext4 Elapsed Time fsmark 368.07 322.55 328.80
512M-ext4 Elapsed Time mmap-strm 131.98 117.01 118.94
512M-2X-ext4 Files/s mean 20.27 ( 0.00%) 22.75 (10.88%) 20.80 ( 2.52%)
512M-2X-ext4 Elapsed Time fsmark 518.06 493.74 479.21
512M-2X-ext4 Elapsed Time mmap-strm 131.32 126.64 117.05
512M-4X-ext4 Files/s mean 17.91 ( 0.00%) 12.30 (-45.63%) 16.58 (-8.06%)
512M-4X-ext4 Elapsed Time fsmark 633.41 660.70 572.74
512M-4X-ext4 Elapsed Time mmap-strm 137.85 127.63 124.07
512M-16X-ext4 Files/s mean 55.86 ( 0.00%) 69.90 (20.09%) 42.66 (-30.94%)
512M-16X-ext4 Elapsed Time fsmark 543.21 544.43 586.16
512M-16X-ext4 Elapsed Time mmap-strm 141.84 146.12 144.01
At first glance, the benefit for ext4 is less clear cut but this
is due to the standard deviation being very high. Take 512M-4X-ext4,
which shows a 45.63% regression, as an example:
Files/s min 5.40 ( 0.00%) 4.10 (-31.71%) 6.50 (16.92%)
Files/s mean 17.91 ( 0.00%) 12.30 (-45.63%) 16.58 (-8.06%)
Files/s stddev 14.34 ( 0.00%) 8.04 (-78.46%) 14.50 ( 1.04%)
Files/s max 54.30 ( 0.00%) 37.70 (-44.03%) 77.20 (29.66%)
The standard deviation is *massive*, meaning that the performance
loss is well within the noise. The main positive out of this is that
the streaming write benchmarks are generally better.
Where ext4 does benefit is in direct reclaim stalls. Unlike xfs, ext4
can stall direct reclaim writing back pages. Looking at a separate
run using ftrace to gather more information, I see:
512M-ext4 Time stalled direct reclaim fsmark 0.36 0.30 0.31
512M-ext4 Time stalled direct reclaim mmap-strm 36.88 7.48 36.24
512M-4X-ext4 Time stalled direct reclaim fsmark 1.06 0.40 0.43
512M-4X-ext4 Time stalled direct reclaim mmap-strm 102.68 33.18 23.99
512M-16X-ext4 Time stalled direct reclaim fsmark 0.17 0.27 0.30
512M-16X-ext4 Time stalled direct reclaim mmap-strm 9.80 2.62 1.28
512M-32X-ext4 Time stalled direct reclaim fsmark 0.00 0.00 0.00
512M-32X-ext4 Time stalled direct reclaim mmap-strm 2.27 0.51 1.26
Time spent in direct reclaim is reduced, implying that bug reports
complaining about the system becoming jittery when copying large
files may also be helped.
To show what effect the patches are having, this is a more detailed
look at one of the tests running with monitoring enabled. It's booted
with mem=512M and the number of threads running is equal to the number
of CPU cores. The backing filesystem is XFS.
FS-Mark
fsmark-3.0.0 3.0.0-rc6 3.0.0-rc6
rc6-vanilla kswapwb-v2r5 nokswapwb-v2r5
Files/s min 27.30 ( 0.00%) 31.80 (14.15%) 31.40 (13.06%)
Files/s mean 30.32 ( 0.00%) 34.34 (11.73%) 34.52 (12.18%)
Files/s stddev 1.39 ( 0.00%) 1.06 (-31.96%) 1.20 (-16.05%)
Files/s max 33.60 ( 0.00%) 36.00 ( 6.67%) 36.30 ( 7.44%)
Overhead min 1393832.00 ( 0.00%) 1793141.00 (-22.27%) 1133240.00 (23.00%)
Overhead mean 2423808.52 ( 0.00%) 2513297.40 (-3.56%) 1823398.44 (32.93%)
Overhead stddev 445880.26 ( 0.00%) 392952.66 (13.47%) 420498.38 ( 6.04%)
Overhead max 3359477.00 ( 0.00%) 3184889.00 ( 5.48%) 3016170.00 (11.38%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 53.26 52.27 51.88
Total Elapsed Time (seconds) 137.65 121.95 121.11
Average files per second is increased by a nice percentage that is
outside the noise. This is also true when I look at the results
without monitoring although the relative performance gain is less.
Time to completion is reduced, which is always good, and it implies
that IO throughput was consistently higher. This is clearly visible at
http://www.csn.ul.ie/~mel/postings/reclaim-20110721/html-run-monitor/global-dhp-512M__writeback-reclaimdirty-xfs/hydra/blockio-comparison-hydra.png
http://www.csn.ul.ie/~mel/postings/reclaim-20110721/html-run-monitor/global-dhp-512M__writeback-reclaimdirty-xfs/hydra/blockio-comparison-smooth-hydra.png
kswapd CPU usage is also interesting
http://www.csn.ul.ie/~mel/postings/reclaim-20110721/html-run-monitor/global-dhp-512M__writeback-reclaimdirty-xfs/hydra/kswapdcpu-comparison-smooth-hydra.png
Note how preventing kswapd from writing dirty pages pushes up its CPU
usage as it scans more pages, but it does not get excessive due to
the throttling.
MMTests Statistics: vmstat
Page Ins 1481672 1352900 1105364
Page Outs 38397462 38337199 38366073
Swap Ins 351918 320883 258868
Swap Outs 132060 117715 123564
Direct pages scanned 886587 968087 784109
Kswapd pages scanned 18931089 18275983 18324613
Kswapd pages reclaimed 8878200 8768648 8885482
Direct pages reclaimed 883407 960496 781632
Kswapd efficiency 46% 47% 48%
Kswapd velocity 137530.614 149864.559 151305.532
Direct efficiency 99% 99% 99%
Direct velocity 6440.879 7938.393 6474.354
Percentage direct scans 4% 5% 4%
Page writes by reclaim 170014 117717 123510
Page reclaim invalidate 0 1221396 1212857
Page reclaim throttled 0 0 0
Slabs scanned 23424 23680 23552
Direct inode steals 0 0 0
Kswapd inode steals 5560 5500 5584
Kswapd skipped wait 20 3 5
Compaction stalls 0 0 0
Compaction success 0 0 0
Compaction failures 0 0 0
Compaction pages moved 0 0 0
Compaction move failure 0 0 0
These stats are based on information from /proc/vmstat
"Kswapd efficiency" is the percentage of pages reclaimed to pages
scanned. The higher the percentage is the better because a low
percentage implies that kswapd is scanning uselessly. As the workload
dirties memory heavily and the machine is small, the efficiency is low
at 46% and improves marginally due to a reduced number of pages scanned.
As memory increases, so does the efficiency as one might expect as
the flushers have a chance to clean the pages in time.
"Kswapd velocity" is the average number of pages scanned per
second. The patches increase this because kswapd is no longer getting
blocked on page writes, which is expected, but in general a higher
velocity means that kswapd is doing more work and consuming more
CPU. In this case, it is offset by the fact that fewer pages overall
are scanned and the test completes faster, but it explains why CPU
usage is higher.
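For example, taking the vanilla kernel above, kswapd efficiency is
8878200 pages reclaimed / 18931089 pages scanned, the 46% in the table,
and kswapd velocity is 18931089 pages scanned / 137.65 seconds elapsed
~= 137530 pages per second.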
Page writes by reclaim is what is motivating this series. It goes
from 170014 pages to 123510 which is a big improvement and we'll see
later that these writes are for anonymous pages.
"Page reclaim invalided" is very high and implies that a large number
of dirty pages are reaching the end of the list quickly. Unfortunately,
this is somewhat unavoidable. Kswapd is scanning pages at a rate
of roughly 125000 (or 488M) a second on a 512M machine. The best
possible writing rate of the underlying storage is about 300M/second.
With the rate of reclaim exceeding the best possible writing speed,
the system is going to get throttled.
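To put numbers on that: 125000 4K pages per second is 125000 * 4096
bytes ~= 488M/second of data reaching the end of the LRU, against
storage that can manage about 300M/second at best.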
FTrace Reclaim Statistics: vmscan
fsmark-3.0.0 3.0.0-rc6 3.0.0-rc6
rc6-vanilla kswapwb-v2r5 nokswapwb-v2r5
Direct reclaims 16173 17605 14313
Direct reclaim pages scanned 886587 968087 784109
Direct reclaim pages reclaimed 883407 960496 781632
Direct reclaim write file async I/O 0 0 0
Direct reclaim write anon async I/O 0 0 0
Direct reclaim write file sync I/O 0 0 0
Direct reclaim write anon sync I/O 0 0 0
Wake kswapd requests 20699 22048 22893
Kswapd wakeups 24 20 25
Kswapd pages scanned 18931089 18275983 18324613
Kswapd pages reclaimed 8878200 8768648 8885482
Kswapd reclaim write file async I/O 37966 0 0
Kswapd reclaim write anon async I/O 132062 117717 123567
Kswapd reclaim write file sync I/O 0 0 0
Kswapd reclaim write anon sync I/O 0 0 0
Time stalled direct reclaim (seconds) 0.08 0.09 0.08
Time kswapd awake (seconds) 132.11 117.78 115.82
Total pages scanned 19817676 19244070 19108722
Total pages reclaimed 9761607 9729144 9667114
%age total pages scanned/reclaimed 49.26% 50.56% 50.59%
%age total pages scanned/written 0.86% 0.61% 0.65%
%age file pages scanned/written 0.19% 0.00% 0.00%
Percentage Time Spent Direct Reclaim 0.15% 0.17% 0.15%
Percentage Time kswapd Awake 95.98% 96.58% 95.63%
Despite kswapd having higher CPU usage, it spent less time awake which
is probably a reflection of the test completing faster. File writes
from kswapd were 0 with the patches applied implying that kswapd was
not getting to a priority high enough to start writing. The remaining
writes correlate almost exactly to nr_vmscan_write implying that all
writes were for anonymous pages.
FTrace Reclaim Statistics: congestion_wait
Direct number congest waited 0 0 0
Direct time congest waited 0ms 0ms 0ms
Direct full congest waited 0 0 0
Direct number conditional waited 2 17 6
Direct time conditional waited 0ms 0ms 0ms
Direct full conditional waited 0 0 0
KSwapd number congest waited 4 8 10
KSwapd time congest waited 4ms 20ms 8ms
KSwapd full congest waited 0 0 0
KSwapd number conditional waited 0 26036 26283
KSwapd time conditional waited 0ms 16ms 4ms
KSwapd full conditional waited 0 0 0
This is based on some of the writeback tracepoints. It's interesting
to note that while kswapd got throttled about 26000 times with all
patches applied, it spent negligible time asleep so probably just
called cond_resched(). This implies that the zone and the backing
device were rarely truly congested and the throttling was needed simply
to allow time for the pages to be written.
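For reference, wait_iff_congested() as used by patch 6 only sleeps when
both the zone is flagged congested and at least one backing device is
congested; otherwise it just calls cond_resched() and returns, which
matches the 26000-odd waits accumulating only a few milliseconds of
sleep.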
MICRO
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 32.57 31.18 30.52
Total Elapsed Time (seconds) 166.29 141.94 148.23
This test is in two stages. The first writes only to a file. The second
writes to a mix of anonymous and file mappings. Time to completion
is improved and this is still true with monitoring disabled.
MMTests Statistics: vmstat
Page Ins 11018260 10668536 10792204
Page Outs 16632838 16468468 16449897
Swap Ins 296167 245878 256038
Swap Outs 221626 177922 179409
Direct pages scanned 4129424 5172015 3686598
Kswapd pages scanned 9152837 9000480 7909180
Kswapd pages reclaimed 3388122 3284663 3371737
Direct pages reclaimed 735425 765263 708713
Kswapd efficiency 37% 36% 42%
Kswapd velocity 55041.416 63410.455 53357.485
Direct efficiency 17% 14% 19%
Direct velocity 24832.666 36438.037 24870.795
Percentage direct scans 31% 36% 31%
Page writes by reclaim 347283 180065 179425
Page writes skipped 0 0 0
Page reclaim invalidate 0 864018 554666
Write invalidated 0 0 0
Page reclaim throttled 0 0 0
Slabs scanned 14464 13696 13952
Direct inode steals 470 864 934
Kswapd inode steals 426 411 317
Kswapd skipped wait 3255 3381 1437
Compaction stalls 0 0 2
Compaction success 0 0 1
Compaction failures 0 0 1
Compaction pages moved 0 0 0
Compaction move failure 0 0 0
Kswapd efficiency is improved slightly. kswapd is operating at roughly
the same velocity but the number of pages scanned is far lower due
to the test completing faster.
Direct reclaim efficiency is improved slightly and fewer pages are
scanned (again due to the lower time to completion).
Fewer pages are being written from reclaim.
FTrace Reclaim Statistics: vmscan
micro-3.0.0 3.0.0-rc6 3.0.0-rc6
rc6-vanilla kswapwb-v2r5 nokswapwb-v2r5
Direct reclaims 14060 15425 13726
Direct reclaim pages scanned 3596218 4621037 3613503
Direct reclaim pages reclaimed 735425 765263 708713
Direct reclaim write file async I/O 87264 0 0
Direct reclaim write anon async I/O 10030 9127 15028
Direct reclaim write file sync I/O 0 0 0
Direct reclaim write anon sync I/O 0 0 0
Wake kswapd requests 10424 10346 10786
Kswapd wakeups 22 22 14
Kswapd pages scanned 9041353 8889081 7895846
Kswapd pages reclaimed 3388122 3284663 3371737
Kswapd reclaim write file async I/O 7277 1710 0
Kswapd reclaim write anon async I/O 184205 159178 162367
Kswapd reclaim write file sync I/O 0 0 0
Kswapd reclaim write anon sync I/O 0 0 0
Time stalled direct reclaim (seconds) 54.29 5.67 14.29
Time kswapd awake (seconds) 151.62 129.83 135.98
Total pages scanned 12637571 13510118 11509349
Total pages reclaimed 4123547 4049926 4080450
%age total pages scanned/reclaimed 32.63% 29.98% 35.45%
%age total pages scanned/written 2.29% 1.26% 1.54%
%age file pages scanned/written 0.75% 0.01% 0.00%
Percentage Time Spent Direct Reclaim 62.50% 15.39% 31.89%
Percentage Time kswapd Awake 91.18% 91.47% 91.74%
Time spent in direct reclaim is massively reduced, which is surprising
as this is XFS so it should not have been stalling when writing file
pages anyway. It's possible that the anon writes are completing
faster so time spent swapping is reduced.
With patches 1-7, kswapd still writes some pages because it reaches
higher priorities under memory pressure, but the number of pages it
writes is significantly reduced and is a small percentage of the pages
written to swap. Patch 8 eliminates it entirely but the benefit is
not seen in the completion times as the number of writes is so small.
FTrace Reclaim Statistics: congestion_wait
Direct number congest waited 0 0 0
Direct time congest waited 0ms 0ms 0ms
Direct full congest waited 0 0 0
Direct number conditional waited 12345 37713 34841
Direct time conditional waited 12396ms 132ms 168ms
Direct full conditional waited 53 0 0
KSwapd number congest waited 4248 2957 2293
KSwapd time congest waited 15320ms 10312ms 13416ms
KSwapd full congest waited 31 1 21
KSwapd number conditional waited 0 15989 10410
KSwapd time conditional waited 0ms 0ms 0ms
KSwapd full conditional waited 0 0 0
Congestion is way down as direct reclaim conditional wait time is
reduced by about 12 seconds.
Overall, this looks good. Avoiding writes from kswapd improves
overall performance as expected and eliminating them entirely seems
to behave well.
Next I tested on a NUMA configuration of sorts. I don't have a real
NUMA machine so I booted the same machine with mem=4096M numa=fake=8
so each node is 512M. Again, the volume of information is high but
here is a summary of sorts based on a test run with monitors enabled.
4096M8N-xfs Files/s mean 27.29 ( 0.00%) 27.35 ( 0.20%) 27.91 ( 2.22%)
4096M8N-xfs Elapsed Time fsmark 1402.55 1400.77 1382.92
4096M8N-xfs Elapsed Time mmap-strm 660.90 596.91 630.05
4096M8N-xfs Kswapd efficiency fsmark 72% 71% 13%
4096M8N-xfs Kswapd efficiency mmap-strm 39% 40% 31%
4096M8N-xfs stalled direct reclaim fsmark 0.00 0.00 0.00
4096M8N-xfs stalled direct reclaim mmap-strm 36.37 13.06 56.88
4096M8N-4X-xfs Files/s mean 26.80 ( 0.00%) 26.41 (-1.47%) 26.40 (-1.53%)
4096M8N-4X-xfs Elapsed Time fsmark 1453.95 1460.62 1470.98
4096M8N-4X-xfs Elapsed Time mmap-strm 683.34 663.46 690.01
4096M8N-4X-xfs Kswapd efficiency fsmark 68% 67% 8%
4096M8N-4X-xfs Kswapd efficiency mmap-strm 35% 34% 6%
4096M8N-4X-xfs stalled direct reclaim fsmark 0.00 0.00 0.00
4096M8N-4X-xfs stalled direct reclaim mmap-strm 26.45 87.57 46.87
4096M8N-2X-xfs Files/s mean 26.22 ( 0.00%) 26.70 ( 1.77%) 27.21 ( 3.62%)
4096M8N-2X-xfs Elapsed Time fsmark 1469.28 1439.30 1424.45
4096M8N-2X-xfs Elapsed Time mmap-strm 676.77 656.28 655.03
4096M8N-2X-xfs Kswapd efficiency fsmark 69% 69% 9%
4096M8N-2X-xfs Kswapd efficiency mmap-strm 33% 33% 7%
4096M8N-2X-xfs stalled direct reclaim fsmark 0.00 0.00 0.00
4096M8N-2X-xfs stalled direct reclaim mmap-strm 52.74 57.96 102.49
4096M8N-16X-xfs Files/s mean 25.78 ( 0.00%) 27.81 ( 7.32%) 48.52 (46.87%)
4096M8N-16X-xfs Elapsed Time fsmark 1555.95 1554.78 1542.53
4096M8N-16X-xfs Elapsed Time mmap-strm 770.01 763.62 844.55
4096M8N-16X-xfs Kswapd efficiency fsmark 62% 62% 7%
4096M8N-16X-xfs Kswapd efficiency mmap-strm 38% 37% 10%
4096M8N-16X-xfs stalled direct reclaim fsmark 0.12 0.01 0.05
4096M8N-16X-xfs stalled direct reclaim mmap-strm 1.07 1.09 63.32
The performance differences for fsmark are marginal because the number
of pages written from reclaim is pretty low with this much memory even
with NUMA enabled. At no point did fsmark enter direct reclaim to
try and write a page so it's all kswapd. What is important to note is
the "Kswapd efficiency". Once kswapd cannot write pages at all, its
efficiency drops rapidly for fsmark as it scans about 5-8 times more
pages waiting on flusher threads to clean a page from the correct node.
Kswapd not writing pages impairs direct reclaim performance for the
streaming writer test. Note the times stalled in direct reclaim. In
all cases, the time stalled in direct reclaim goes way up as both
direct reclaimers and kswapd get stalled waiting on pages to get
cleaned from the right node.
Fortunately, kswapd CPU usage does not go to 100% because of the
throttling. From the 4096M8N test for example, I see
KSwapd full congest waited 834 739 989
KSwapd number conditional waited 0 68552 372275
KSwapd time conditional waited 0ms 16ms 1684ms
KSwapd full conditional waited 0 0 0
With kswapd avoiding writes, it gets throttled lightly but when it
writes no pages at all, it gets throttled very heavily and sleeps.
ext4 tells a slightly different story
4096M8N-ext4 Files/s mean 28.63 ( 0.00%) 30.58 ( 6.37%) 31.04 ( 7.76%)
4096M8N-ext4 Elapsed Time fsmark 1578.51 1551.99 1532.65
4096M8N-ext4 Elapsed Time mmap-strm 703.66 655.25 654.86
4096M8N-ext4 Kswapd efficiency fsmark 62% 69% 68%
4096M8N-ext4 Kswapd efficiency mmap-strm 35% 35% 35%
4096M8N-ext4 stalled direct reclaim fsmark 0.00 0.00 0.00
4096M8N-ext4 stalled direct reclaim mmap-strm 32.64 95.72 152.62
4096M8N-2X-ext4 Files/s mean 30.74 ( 0.00%) 28.49 (-7.89%) 28.79 (-6.75%)
4096M8N-2X-ext4 Elapsed Time fsmark 1466.62 1583.12 1580.07
4096M8N-2X-ext4 Elapsed Time mmap-strm 705.17 705.64 693.01
4096M8N-2X-ext4 Kswapd efficiency fsmark 68% 68% 67%
4096M8N-2X-ext4 Kswapd efficiency mmap-strm 34% 30% 18%
4096M8N-2X-ext4 stalled direct reclaim fsmark 0.00 0.00 0.00
4096M8N-2X-ext4 stalled direct reclaim mmap-strm 106.82 24.88 27.88
4096M8N-4X-ext4 Files/s mean 24.15 ( 0.00%) 23.18 (-4.18%) 23.94 (-0.89%)
4096M8N-4X-ext4 Elapsed Time fsmark 1848.41 1971.48 1867.07
4096M8N-4X-ext4 Elapsed Time mmap-strm 664.87 673.66 674.46
4096M8N-4X-ext4 Kswapd efficiency fsmark 62% 65% 65%
4096M8N-4X-ext4 Kswapd efficiency mmap-strm 33% 37% 15%
4096M8N-4X-ext4 stalled direct reclaim fsmark 0.18 0.03 0.26
4096M8N-4X-ext4 stalled direct reclaim mmap-strm 115.71 23.05 61.12
4096M8N-16X-ext4 Files/s mean 5.42 ( 0.00%) 5.43 ( 0.15%) 3.83 (-41.44%)
4096M8N-16X-ext4 Elapsed Time fsmark 9572.85 9653.66 11245.41
4096M8N-16X-ext4 Elapsed Time mmap-strm 752.88 750.38 769.19
4096M8N-16X-ext4 Kswapd efficiency fsmark 59% 59% 61%
4096M8N-16X-ext4 Kswapd efficiency mmap-strm 34% 34% 21%
4096M8N-16X-ext4 stalled direct reclaim fsmark 0.26 0.65 0.26
4096M8N-16X-ext4 stalled direct reclaim mmap-strm 177.48 125.91 196.92
4096M8N-16X-ext4 with kswapd writing no pages collapsed in terms of
performance. Looking at the fsmark logs, in a number of iterations,
it was barely able to write files at all.
The apparent slowdown for fsmark in 4096M8N-2X-ext4 is well within
the noise but the reduced time spent in direct reclaim is very welcome.
Unlike xfs, it's less clear cut if direct reclaim performance is
impaired but in a few tests, preventing kswapd writing pages did
increase the time stalled.
As a last test, I've been running this series on my laptop since
Monday without any problem but it's rarely under serious memory
pressure. I see nr_vmscan_write is 0 and the number of pages
invalidated from the end of the LRU is only 10844 after 3 days so
it's not much of a test.
Overall, having kswapd avoiding writes does improve performance
which is not a surprise. Dave asked "do we even need IO at all from
reclaim?". On NUMA machines, the answer is "yes" unless the VM can
wake the flusher thread to clean a specific node. When kswapd never
writes, processes can stall for significant periods of time waiting on
flushers to clean the correct pages. If all writing is to be deferred
to the flushers, they must ensure that heavy writing on one node does
not starve requests to clean pages on another node.
I'm currently of the opinion that we should consider merging patches
1-7 now and discuss what is required before merging patch 8. How the
flushers can prioritise writing of pages belonging to a particular
zone before disabling all writes from reclaim can be tackled later. There
is already some work in this general area with the possibility that
series such as "writeback: moving expire targets for background/kupdate
works" could be extended to allow patch 8 to be merged later even if
the series needs work.
fs/btrfs/disk-io.c | 2 ++
fs/btrfs/inode.c | 2 ++
fs/ext4/inode.c | 6 +++++-
fs/xfs/linux-2.6/xfs_aops.c | 9 +++++----
include/linux/mmzone.h | 1 +
mm/vmscan.c | 34 +++++++++++++++++++++++++++++++---
mm/vmstat.c | 1 +
7 files changed, 47 insertions(+), 8 deletions(-)
--
1.7.3.4
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 1/8] mm: vmscan: Do not writeback filesystem pages in direct reclaim
2011-07-21 16:28 [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2 Mel Gorman
@ 2011-07-21 16:28 ` Mel Gorman
2011-07-31 15:06 ` Minchan Kim
2011-07-21 16:28 ` [PATCH 2/8] xfs: Warn if direct reclaim tries to writeback pages Mel Gorman
` (9 subsequent siblings)
10 siblings, 1 reply; 43+ messages in thread
From: Mel Gorman @ 2011-07-21 16:28 UTC (permalink / raw)
To: Linux-MM
Cc: LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, Mel Gorman
From: Mel Gorman <mel@csn.ul.ie>
When kswapd is failing to keep zones above the min watermark, a process
will enter direct reclaim in the same manner kswapd does. If a dirty
page is encountered during the scan, this page is written to backing
storage using mapping->writepage.
This causes two problems. First, it can result in very deep call
stacks, particularly if the target storage or filesystem are complex.
Some filesystems ignore write requests from direct reclaim as a result.
The second is that a single-page flush is inefficient in terms of IO.
While there is an expectation that the elevator will merge requests,
this does not always happen. Quoting Christoph Hellwig;
The elevator has a relatively small window it can operate on,
and can never fix up a bad large scale writeback pattern.
This patch prevents direct reclaim writing back filesystem pages by
checking if current is kswapd. Anonymous pages are still written to
swap as there is no equivalent of a flusher thread for anonymous
pages. If the dirty pages cannot be written back, they are placed
back on the LRU lists. There is now a direct dependency on dirty page
balancing to prevent too many pages in the system being dirtied which
would prevent reclaim making forward progress.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
include/linux/mmzone.h | 1 +
mm/vmscan.c | 9 +++++++++
mm/vmstat.c | 1 +
3 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9f7c3eb..b70a0c0 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -100,6 +100,7 @@ enum zone_stat_item {
NR_UNSTABLE_NFS, /* NFS unstable pages */
NR_BOUNCE,
NR_VMSCAN_WRITE,
+ NR_VMSCAN_WRITE_SKIP,
NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */
NR_ISOLATED_ANON, /* Temporary isolated pages from anon lru */
NR_ISOLATED_FILE, /* Temporary isolated pages from file lru */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5ed24b9..ee00c94 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -825,6 +825,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
if (PageDirty(page)) {
nr_dirty++;
+ /*
+ * Only kswapd can writeback filesystem pages to
+ * avoid risk of stack overflow
+ */
+ if (page_is_file_cache(page) && !current_is_kswapd()) {
+ inc_zone_page_state(page, NR_VMSCAN_WRITE_SKIP);
+ goto keep_locked;
+ }
+
if (references == PAGEREF_RECLAIM_CLEAN)
goto keep_locked;
if (!may_enter_fs)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 20c18b7..fd109f3 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -702,6 +702,7 @@ const char * const vmstat_text[] = {
"nr_unstable",
"nr_bounce",
"nr_vmscan_write",
+ "nr_vmscan_write_skip",
"nr_writeback_temp",
"nr_isolated_anon",
"nr_isolated_file",
--
1.7.3.4
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 2/8] xfs: Warn if direct reclaim tries to writeback pages
2011-07-21 16:28 [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2 Mel Gorman
2011-07-21 16:28 ` [PATCH 1/8] mm: vmscan: Do not writeback filesystem pages in direct reclaim Mel Gorman
@ 2011-07-21 16:28 ` Mel Gorman
2011-07-24 11:32 ` Christoph Hellwig
2011-07-21 16:28 ` [PATCH 3/8] ext4: " Mel Gorman
` (8 subsequent siblings)
10 siblings, 1 reply; 43+ messages in thread
From: Mel Gorman @ 2011-07-21 16:28 UTC (permalink / raw)
To: Linux-MM
Cc: LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, Mel Gorman
Direct reclaim should never writeback pages. For now, handle the
situation and warn about it. Ultimately, this will be a BUG_ON.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
fs/xfs/linux-2.6/xfs_aops.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index 79ce38b..c33a439 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -930,12 +930,13 @@ xfs_vm_writepage(
* random callers for direct reclaim or memcg reclaim. We explicitly
* allow reclaim from kswapd as the stack usage there is relatively low.
*
- * This should really be done by the core VM, but until that happens
- * filesystems like XFS, btrfs and ext4 have to take care of this
- * by themselves.
+ * This should never happen except in the case of a VM regression so
+ * warn about it.
*/
- if ((current->flags & (PF_MEMALLOC|PF_KSWAPD)) == PF_MEMALLOC)
+ if ((current->flags & (PF_MEMALLOC|PF_KSWAPD)) == PF_MEMALLOC) {
+ WARN_ON_ONCE(1);
goto redirty;
+ }
/*
* We need a transaction if there are delalloc or unwritten buffers
--
1.7.3.4
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 3/8] ext4: Warn if direct reclaim tries to writeback pages
2011-07-21 16:28 [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2 Mel Gorman
2011-07-21 16:28 ` [PATCH 1/8] mm: vmscan: Do not writeback filesystem pages in direct reclaim Mel Gorman
2011-07-21 16:28 ` [PATCH 2/8] xfs: Warn if direct reclaim tries to writeback pages Mel Gorman
@ 2011-07-21 16:28 ` Mel Gorman
2011-08-03 10:58 ` Johannes Weiner
2011-07-21 16:28 ` [PATCH 4/8] btrfs: " Mel Gorman
` (7 subsequent siblings)
10 siblings, 1 reply; 43+ messages in thread
From: Mel Gorman @ 2011-07-21 16:28 UTC (permalink / raw)
To: Linux-MM
Cc: LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, Mel Gorman
Direct reclaim should never writeback pages. Warn if an attempt
is made.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
fs/ext4/inode.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e3126c0..95bb179 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2663,8 +2663,12 @@ static int ext4_writepage(struct page *page,
* We don't want to do block allocation, so redirty
* the page and return. We may reach here when we do
* a journal commit via journal_submit_inode_data_buffers.
- * We can also reach here via shrink_page_list
+ * We can also reach here via shrink_page_list but it
+ * should never be for direct reclaim so warn if that
+ * happens
*/
+ WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
+ PF_MEMALLOC);
goto redirty_page;
}
if (commit_write)
--
1.7.3.4
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 4/8] btrfs: Warn if direct reclaim tries to writeback pages
2011-07-21 16:28 [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2 Mel Gorman
` (2 preceding siblings ...)
2011-07-21 16:28 ` [PATCH 3/8] ext4: " Mel Gorman
@ 2011-07-21 16:28 ` Mel Gorman
2011-08-03 11:10 ` Johannes Weiner
2011-07-21 16:28 ` [PATCH 5/8] mm: vmscan: Do not writeback filesystem pages in kswapd except in high priority Mel Gorman
` (6 subsequent siblings)
10 siblings, 1 reply; 43+ messages in thread
From: Mel Gorman @ 2011-07-21 16:28 UTC (permalink / raw)
To: Linux-MM
Cc: LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, Mel Gorman
Direct reclaim should never writeback pages. Warn if an attempt is
made. By rights, btrfs should be allowing writepage from kswapd if
it is failing to reclaim pages by any other means but it's outside
the scope of this patch.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
fs/btrfs/disk-io.c | 2 ++
fs/btrfs/inode.c | 2 ++
2 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1ac8db5d..cc9c9cf 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -829,6 +829,8 @@ static int btree_writepage(struct page *page, struct writeback_control *wbc)
tree = &BTRFS_I(page->mapping->host)->io_tree;
if (!(current->flags & PF_MEMALLOC)) {
+ WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
+ PF_MEMALLOC);
return extent_write_full_page(tree, page,
btree_get_extent, wbc);
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3601f0a..07d6c27 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6259,6 +6259,8 @@ static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
if (current->flags & PF_MEMALLOC) {
+ WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
+ PF_MEMALLOC);
redirty_page_for_writepage(wbc, page);
unlock_page(page);
return 0;
--
1.7.3.4
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 5/8] mm: vmscan: Do not writeback filesystem pages in kswapd except in high priority
2011-07-21 16:28 [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2 Mel Gorman
` (3 preceding siblings ...)
2011-07-21 16:28 ` [PATCH 4/8] btrfs: " Mel Gorman
@ 2011-07-21 16:28 ` Mel Gorman
2011-07-31 15:11 ` Minchan Kim
2011-07-21 16:28 ` [PATCH 6/8] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback Mel Gorman
` (5 subsequent siblings)
10 siblings, 1 reply; 43+ messages in thread
From: Mel Gorman @ 2011-07-21 16:28 UTC (permalink / raw)
To: Linux-MM
Cc: LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, Mel Gorman
It is preferable that no dirty pages are dispatched for cleaning from
the page reclaim path. At normal priorities, this patch prevents kswapd
writing pages.
However, page reclaim does have a requirement that pages be freed
in a particular zone. If it is failing to make sufficient progress
(reclaiming < SWAP_CLUSTER_MAX at any priority), the priority
is raised to scan more pages. A priority of DEF_PRIORITY - 3 is
considered to be the point where kswapd is getting into trouble
reclaiming pages. If this priority is reached, kswapd will dispatch
pages for writing.
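For reference, DEF_PRIORITY is 12 and the priority value counts down
towards 0 as pressure increases, so with this patch kswapd only queues
file pages for writing once the scanning priority has dropped to
DEF_PRIORITY - 3 (i.e. 9) or below.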
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/vmscan.c | 13 ++++++++-----
1 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ee00c94..cf7b501 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -719,7 +719,8 @@ static noinline_for_stack void free_page_list(struct list_head *free_pages)
*/
static unsigned long shrink_page_list(struct list_head *page_list,
struct zone *zone,
- struct scan_control *sc)
+ struct scan_control *sc,
+ int priority)
{
LIST_HEAD(ret_pages);
LIST_HEAD(free_pages);
@@ -827,9 +828,11 @@ static unsigned long shrink_page_list(struct list_head *page_list,
/*
* Only kswapd can writeback filesystem pages to
- * avoid risk of stack overflow
+ * avoid risk of stack overflow but do not writeback
+ * unless under significant pressure.
*/
- if (page_is_file_cache(page) && !current_is_kswapd()) {
+ if (page_is_file_cache(page) &&
+ (!current_is_kswapd() || priority >= DEF_PRIORITY - 2)) {
inc_zone_page_state(page, NR_VMSCAN_WRITE_SKIP);
goto keep_locked;
}
@@ -1465,12 +1468,12 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
spin_unlock_irq(&zone->lru_lock);
- nr_reclaimed = shrink_page_list(&page_list, zone, sc);
+ nr_reclaimed = shrink_page_list(&page_list, zone, sc, priority);
/* Check if we should syncronously wait for writeback */
if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
set_reclaim_mode(priority, sc, true);
- nr_reclaimed += shrink_page_list(&page_list, zone, sc);
+ nr_reclaimed += shrink_page_list(&page_list, zone, sc, priority);
}
local_irq_disable();
--
1.7.3.4
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 6/8] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback
2011-07-21 16:28 [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2 Mel Gorman
` (4 preceding siblings ...)
2011-07-21 16:28 ` [PATCH 5/8] mm: vmscan: Do not writeback filesystem pages in kswapd except in high priority Mel Gorman
@ 2011-07-21 16:28 ` Mel Gorman
2011-07-31 15:17 ` Minchan Kim
2011-08-03 11:19 ` Johannes Weiner
2011-07-21 16:28 ` [PATCH 7/8] mm: vmscan: Immediately reclaim end-of-LRU dirty pages when writeback completes Mel Gorman
` (4 subsequent siblings)
10 siblings, 2 replies; 43+ messages in thread
From: Mel Gorman @ 2011-07-21 16:28 UTC (permalink / raw)
To: Linux-MM
Cc: LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, Mel Gorman
Workloads that are allocating frequently and writing files place a
large number of dirty pages on the LRU. With use-once logic, it is
possible for them to reach the end of the LRU quickly requiring the
reclaimer to scan more to find clean pages. Ordinarily, processes that
are dirtying memory will get throttled by dirty balancing but this
is a global heuristic and does not take into account that LRUs are
maintained on a per-zone basis. This can lead to a situation whereby
reclaim is scanning heavily, skipping over a large number of pages
under writeback and recycling them around the LRU consuming CPU.
This patch checks how many of the pages isolated from the LRU were
dirty. If a percentage of them are dirty, the process will be
throttled if the backing device is congested or the zone being scanned
is marked congested. The percentage that must be dirty depends on
the priority. At default priority, all of them must be dirty. At
DEF_PRIORITY-1, 50% of them must be dirty, at DEF_PRIORITY-2, 25%,
etc., i.e. as pressure increases, the more likely it is that the
process will be throttled to allow the flusher threads to make some
progress.
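As a worked example of the check added below: if 32 pages
(SWAP_CLUSTER_MAX) were isolated, throttling triggers when at least 32
of them are dirty at DEF_PRIORITY, at least 16 at DEF_PRIORITY-1, at
least 8 at DEF_PRIORITY-2 and so on.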
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/vmscan.c | 21 ++++++++++++++++++---
1 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index cf7b501..b0060f8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -720,7 +720,8 @@ static noinline_for_stack void free_page_list(struct list_head *free_pages)
static unsigned long shrink_page_list(struct list_head *page_list,
struct zone *zone,
struct scan_control *sc,
- int priority)
+ int priority,
+ unsigned long *ret_nr_dirty)
{
LIST_HEAD(ret_pages);
LIST_HEAD(free_pages);
@@ -971,6 +972,7 @@ keep_lumpy:
list_splice(&ret_pages, page_list);
count_vm_events(PGACTIVATE, pgactivate);
+ *ret_nr_dirty += nr_dirty;
return nr_reclaimed;
}
@@ -1420,6 +1422,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
unsigned long nr_taken;
unsigned long nr_anon;
unsigned long nr_file;
+ unsigned long nr_dirty = 0;
while (unlikely(too_many_isolated(zone, file, sc))) {
congestion_wait(BLK_RW_ASYNC, HZ/10);
@@ -1468,12 +1471,14 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
spin_unlock_irq(&zone->lru_lock);
- nr_reclaimed = shrink_page_list(&page_list, zone, sc, priority);
+ nr_reclaimed = shrink_page_list(&page_list, zone, sc,
+ priority, &nr_dirty);
/* Check if we should syncronously wait for writeback */
if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
set_reclaim_mode(priority, sc, true);
- nr_reclaimed += shrink_page_list(&page_list, zone, sc, priority);
+ nr_reclaimed += shrink_page_list(&page_list, zone, sc,
+ priority, &nr_dirty);
}
local_irq_disable();
@@ -1483,6 +1488,16 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
putback_lru_pages(zone, sc, nr_anon, nr_file, &page_list);
+ /*
+ * If we have encountered a high number of dirty pages then they
+ * are reaching the end of the LRU too quickly and global limits are
+ * not enough to throttle processes due to the page distribution
+ * throughout zones. Scale the number of dirty pages that must be
+ * dirty before being throttled to priority.
+ */
+ if (nr_dirty && nr_dirty >= (nr_taken >> (DEF_PRIORITY-priority)))
+ wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);
+
trace_mm_vmscan_lru_shrink_inactive(zone->zone_pgdat->node_id,
zone_idx(zone),
nr_scanned, nr_reclaimed,
--
1.7.3.4
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 7/8] mm: vmscan: Immediately reclaim end-of-LRU dirty pages when writeback completes
2011-07-21 16:28 [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2 Mel Gorman
` (5 preceding siblings ...)
2011-07-21 16:28 ` [PATCH 6/8] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback Mel Gorman
@ 2011-07-21 16:28 ` Mel Gorman
2011-07-22 12:53 ` Peter Zijlstra
2011-08-03 11:26 ` Johannes Weiner
2011-07-21 16:28 ` [PATCH 8/8] mm: vmscan: Do not writeback filesystem pages from kswapd Mel Gorman
` (3 subsequent siblings)
10 siblings, 2 replies; 43+ messages in thread
From: Mel Gorman @ 2011-07-21 16:28 UTC (permalink / raw)
To: Linux-MM
Cc: LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, Mel Gorman
When direct reclaim encounters a dirty page, it gets recycled around
the LRU for another cycle. This patch marks the page PageReclaim
similar to deactivate_page() so that the page gets reclaimed almost
immediately after the page gets cleaned. This is to avoid reclaiming
clean pages that are younger than a dirty page encountered at the
end of the LRU that might have been something like a use-once page.
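The mechanism leaned on here is the existing PG_reclaim handling: when
writeback on a PageReclaim page completes, end_page_writeback() rotates
the page to the tail of the inactive LRU (rotate_reclaimable_page) so
the next reclaim pass finds it almost immediately.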
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
include/linux/mmzone.h | 2 +-
mm/vmscan.c | 10 +++++++++-
mm/vmstat.c | 2 +-
3 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b70a0c0..30d1dd1 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -100,7 +100,7 @@ enum zone_stat_item {
NR_UNSTABLE_NFS, /* NFS unstable pages */
NR_BOUNCE,
NR_VMSCAN_WRITE,
- NR_VMSCAN_WRITE_SKIP,
+ NR_VMSCAN_INVALIDATE,
NR_WRITEBACK_TEMP, /* Writeback using temporary buffers */
NR_ISOLATED_ANON, /* Temporary isolated pages from anon lru */
NR_ISOLATED_FILE, /* Temporary isolated pages from file lru */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b0060f8..c3d8341 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -834,7 +834,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
*/
if (page_is_file_cache(page) &&
(!current_is_kswapd() || priority >= DEF_PRIORITY - 2)) {
- inc_zone_page_state(page, NR_VMSCAN_WRITE_SKIP);
+ /*
+ * Immediately reclaim when written back.
+ * Similar in principal to deactivate_page()
+ * except we already have the page isolated
+ * and know it's dirty
+ */
+ inc_zone_page_state(page, NR_VMSCAN_INVALIDATE);
+ SetPageReclaim(page);
+
goto keep_locked;
}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index fd109f3..5bd2043 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -702,7 +702,7 @@ const char * const vmstat_text[] = {
"nr_unstable",
"nr_bounce",
"nr_vmscan_write",
- "nr_vmscan_write_skip",
+ "nr_vmscan_invalidate",
"nr_writeback_temp",
"nr_isolated_anon",
"nr_isolated_file",
--
1.7.3.4
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 8/8] mm: vmscan: Do not writeback filesystem pages from kswapd
2011-07-21 16:28 [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2 Mel Gorman
` (6 preceding siblings ...)
2011-07-21 16:28 ` [PATCH 7/8] mm: vmscan: Immediately reclaim end-of-LRU dirty pages when writeback completes Mel Gorman
@ 2011-07-21 16:28 ` Mel Gorman
2011-07-22 12:57 ` Peter Zijlstra
2011-08-03 11:37 ` Johannes Weiner
2011-07-26 11:20 ` [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2 Dave Chinner
` (2 subsequent siblings)
10 siblings, 2 replies; 43+ messages in thread
From: Mel Gorman @ 2011-07-21 16:28 UTC (permalink / raw)
To: Linux-MM
Cc: LKML, XFS, Dave Chinner, Christoph Hellwig, Johannes Weiner,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, Mel Gorman
Assuming that flusher threads will always write back dirty pages
promptly, it is always faster for reclaimers to wait for the flushers
than to issue the IO themselves. This patch prevents kswapd writing
back any filesystem pages.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
mm/vmscan.c | 15 ++++-----------
1 files changed, 4 insertions(+), 11 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c3d8341..6023494 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -720,7 +720,6 @@ static noinline_for_stack void free_page_list(struct list_head *free_pages)
static unsigned long shrink_page_list(struct list_head *page_list,
struct zone *zone,
struct scan_control *sc,
- int priority,
unsigned long *ret_nr_dirty)
{
LIST_HEAD(ret_pages);
@@ -827,13 +826,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
if (PageDirty(page)) {
nr_dirty++;
- /*
- * Only kswapd can writeback filesystem pages to
- * avoid risk of stack overflow but do not writeback
- * unless under significant pressure.
- */
- if (page_is_file_cache(page) &&
- (!current_is_kswapd() || priority >= DEF_PRIORITY - 2)) {
+ /* Flusher must clean dirty filesystem-backed pages */
+ if (page_is_file_cache(page)) {
/*
* Immediately reclaim when written back.
* Similar in principal to deactivate_page()
@@ -1479,14 +1473,13 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,
spin_unlock_irq(&zone->lru_lock);
- nr_reclaimed = shrink_page_list(&page_list, zone, sc,
- priority, &nr_dirty);
+ nr_reclaimed = shrink_page_list(&page_list, zone, sc, &nr_dirty);
/* Check if we should syncronously wait for writeback */
if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
set_reclaim_mode(priority, sc, true);
nr_reclaimed += shrink_page_list(&page_list, zone, sc,
- priority, &nr_dirty);
+ &nr_dirty);
}
local_irq_disable();
--
1.7.3.4
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH 7/8] mm: vmscan: Immediately reclaim end-of-LRU dirty pages when writeback completes
2011-07-21 16:28 ` [PATCH 7/8] mm: vmscan: Immediately reclaim end-of-LRU dirty pages when writeback completes Mel Gorman
@ 2011-07-22 12:53 ` Peter Zijlstra
2011-07-22 13:23 ` Mel Gorman
2011-08-03 11:26 ` Johannes Weiner
1 sibling, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2011-07-22 12:53 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
Minchan Kim
On Thu, 2011-07-21 at 17:28 +0100, Mel Gorman wrote:
> When direct reclaim encounters a dirty page, it gets recycled around
> the LRU for another cycle. This patch marks the page PageReclaim
> similar to deactivate_page() so that the page gets reclaimed almost
> immediately after the page gets cleaned. This is to avoid reclaiming
> clean pages that are younger than a dirty page encountered at the
> end of the LRU that might have been something like a use-once page.
>
> @@ -834,7 +834,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> */
> if (page_is_file_cache(page) &&
> (!current_is_kswapd() || priority >= DEF_PRIORITY - 2)) {
> - inc_zone_page_state(page, NR_VMSCAN_WRITE_SKIP);
> + /*
> + * Immediately reclaim when written back.
> + * Similar in principal to deactivate_page()
> + * except we already have the page isolated
> + * and know it's dirty
> + */
> + inc_zone_page_state(page, NR_VMSCAN_INVALIDATE);
> + SetPageReclaim(page);
> +
I find the invalidate name somewhat confusing. It makes me think we'll
drop the page without writeback, like invalidatepage().
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 8/8] mm: vmscan: Do not writeback filesystem pages from kswapd
2011-07-21 16:28 ` [PATCH 8/8] mm: vmscan: Do not writeback filesystem pages from kswapd Mel Gorman
@ 2011-07-22 12:57 ` Peter Zijlstra
2011-07-22 13:31 ` Mel Gorman
2011-08-03 11:37 ` Johannes Weiner
1 sibling, 1 reply; 43+ messages in thread
From: Peter Zijlstra @ 2011-07-22 12:57 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
Minchan Kim
On Thu, 2011-07-21 at 17:28 +0100, Mel Gorman wrote:
> Assuming that flusher threads will always write back dirty pages promptly
> then it is always faster for reclaimers to wait for flushers. This patch
> prevents kswapd writing back any filesystem pages.
That is a somewhat short changelog for such a big assumption ;-)
I think it can use a few extra words to explain the need to clean pages
from @zone vs writeback picks whatever fits best on disk and how that
works out wrt the assumption.
What requirements does this place on writeback and how does it meet
them.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 7/8] mm: vmscan: Immediately reclaim end-of-LRU dirty pages when writeback completes
2011-07-22 12:53 ` Peter Zijlstra
@ 2011-07-22 13:23 ` Mel Gorman
2011-07-31 15:24 ` Minchan Kim
0 siblings, 1 reply; 43+ messages in thread
From: Mel Gorman @ 2011-07-22 13:23 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
Minchan Kim
On Fri, Jul 22, 2011 at 02:53:48PM +0200, Peter Zijlstra wrote:
> On Thu, 2011-07-21 at 17:28 +0100, Mel Gorman wrote:
> > When direct reclaim encounters a dirty page, it gets recycled around
> > the LRU for another cycle. This patch marks the page PageReclaim
> > similar to deactivate_page() so that the page gets reclaimed almost
> > immediately after the page gets cleaned. This is to avoid reclaiming
> > clean pages that are younger than a dirty page encountered at the
> > end of the LRU that might have been something like a use-once page.
> >
>
> > @@ -834,7 +834,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> > */
> > if (page_is_file_cache(page) &&
> > (!current_is_kswapd() || priority >= DEF_PRIORITY - 2)) {
> > - inc_zone_page_state(page, NR_VMSCAN_WRITE_SKIP);
> > + /*
> > + * Immediately reclaim when written back.
> > + * Similar in principal to deactivate_page()
> > + * except we already have the page isolated
> > + * and know it's dirty
> > + */
> > + inc_zone_page_state(page, NR_VMSCAN_INVALIDATE);
> > + SetPageReclaim(page);
> > +
>
> I find the invalidate name somewhat confusing. It makes me think we'll
> drop the page without writeback, like invalidatepage().
I wasn't that happy with it either to be honest but didn't think of a
better one at the time. nr_reclaim_deferred?
--
Mel Gorman
SUSE Labs
* Re: [PATCH 8/8] mm: vmscan: Do not writeback filesystem pages from kswapd
2011-07-22 12:57 ` Peter Zijlstra
@ 2011-07-22 13:31 ` Mel Gorman
0 siblings, 0 replies; 43+ messages in thread
From: Mel Gorman @ 2011-07-22 13:31 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel,
Minchan Kim
On Fri, Jul 22, 2011 at 02:57:12PM +0200, Peter Zijlstra wrote:
> On Thu, 2011-07-21 at 17:28 +0100, Mel Gorman wrote:
> > Assuming that flusher threads will always write back dirty pages promptly
> > then it is always faster for reclaimers to wait for flushers. This patch
> > prevents kswapd writing back any filesystem pages.
>
> That is a somewhat short changelog for such a big assumption ;-)
>
That is an understatement but the impact of the patch is discussed in
detail in the leader. On NUMA, this patch has a negative impact so
I put no effort into the changelog. The patch is part of the series
because it was specifically asked for.
> I think it can use a few extra words to explain the need to clean pages
> from @zone vs writeback picks whatever fits best on disk and how that
> works out wrt the assumption.
>
At the time of writing the changelog, I knew that flushers were
not finding pages from the correct zones quickly enough in the NUMA
usecase. The changelog documents the assumption; testing shows it to
be false.
> What requirements does this place on writeback and how does it meet
> them.
It places a requirement on writeback to prioritise pages from zones
under memory pressure, a requirement writeback does not currently
meet. I mention in the leader that I think patch 8 should be dropped,
which is why the changelog sucks.
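(To illustrate the gap, and this is purely hypothetical rather than
anything that exists in the series or in writeback today: reclaim can
currently only make a zone-blind request such as

	/* today: ask the flushers for "some" pages, from any zone */
	wakeup_flusher_threads(nr_dirty);

whereas meeting the requirement would need something zone-aware along
the lines of a made-up

	/* does not exist: clean nr_dirty pages from this zone */
	wakeup_flusher_threads_zone(zone, nr_dirty);

which is the "flushers prioritising writing of pages belonging to a
particular zone" work referred to elsewhere in the thread.)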
--
Mel Gorman
SUSE Labs
* Re: [PATCH 2/8] xfs: Warn if direct reclaim tries to writeback pages
2011-07-21 16:28 ` [PATCH 2/8] xfs: Warn if direct reclaim tries to writeback pages Mel Gorman
@ 2011-07-24 11:32 ` Christoph Hellwig
2011-07-25 8:19 ` Mel Gorman
0 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2011-07-24 11:32 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, Rik van Riel, Jan Kara, LKML, XFS, Christoph Hellwig,
Minchan Kim, Wu Fengguang, Johannes Weiner
On Thu, Jul 21, 2011 at 05:28:44PM +0100, Mel Gorman wrote:
> --- a/fs/xfs/linux-2.6/xfs_aops.c
> +++ b/fs/xfs/linux-2.6/xfs_aops.c
> @@ -930,12 +930,13 @@ xfs_vm_writepage(
> * random callers for direct reclaim or memcg reclaim. We explicitly
> * allow reclaim from kswapd as the stack usage there is relatively low.
> *
> - * This should really be done by the core VM, but until that happens
> - * filesystems like XFS, btrfs and ext4 have to take care of this
> - * by themselves.
> + * This should never happen except in the case of a VM regression so
> + * warn about it.
> */
> - if ((current->flags & (PF_MEMALLOC|PF_KSWAPD)) == PF_MEMALLOC)
> + if ((current->flags & (PF_MEMALLOC|PF_KSWAPD)) == PF_MEMALLOC) {
> + WARN_ON_ONCE(1);
> goto redirty;
The nicer way to write this is
if (WARN_ON((current->flags & (PF_MEMALLOC|PF_KSWAPD)) == PF_MEMALLOC))
	goto redirty;
* Re: [PATCH 2/8] xfs: Warn if direct reclaim tries to writeback pages
2011-07-24 11:32 ` Christoph Hellwig
@ 2011-07-25 8:19 ` Mel Gorman
0 siblings, 0 replies; 43+ messages in thread
From: Mel Gorman @ 2011-07-25 8:19 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Linux-MM, Rik van Riel, Jan Kara, LKML, XFS, Minchan Kim,
Wu Fengguang, Johannes Weiner
On Sun, Jul 24, 2011 at 07:32:00AM -0400, Christoph Hellwig wrote:
> On Thu, Jul 21, 2011 at 05:28:44PM +0100, Mel Gorman wrote:
> > --- a/fs/xfs/linux-2.6/xfs_aops.c
> > +++ b/fs/xfs/linux-2.6/xfs_aops.c
> > @@ -930,12 +930,13 @@ xfs_vm_writepage(
> > * random callers for direct reclaim or memcg reclaim. We explicitly
> > * allow reclaim from kswapd as the stack usage there is relatively low.
> > *
> > - * This should really be done by the core VM, but until that happens
> > - * filesystems like XFS, btrfs and ext4 have to take care of this
> > - * by themselves.
> > + * This should never happen except in the case of a VM regression so
> > + * warn about it.
> > */
> > - if ((current->flags & (PF_MEMALLOC|PF_KSWAPD)) == PF_MEMALLOC)
> > + if ((current->flags & (PF_MEMALLOC|PF_KSWAPD)) == PF_MEMALLOC) {
> > + WARN_ON_ONCE(1);
> > goto redirty;
>
> The nicer way to write this is
>
> if (WARN_ON((current->flags & (PF_MEMALLOC|PF_KSWAPD)) == PF_MEMALLOC))
> 	goto redirty;
>
I wanted to avoid side effects if WARN_ON was compiled out, similar to
the care that is normally taken with BUG_ON, but that is unnecessary
and your version is far tidier. Do you really want WARN_ON used instead
of WARN_ON_ONCE()?
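For reference, the form I have in mind if WARN_ON_ONCE() is kept is an
untested sketch along the lines of

	/* warn (once) if direct or memcg reclaim ever gets here */
	if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
			 PF_MEMALLOC))
		goto redirty;

since WARN_ON_ONCE() returns the condition just like WARN_ON() does.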
--
Mel Gorman
SUSE Labs
* Re: [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2
2011-07-21 16:28 [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2 Mel Gorman
` (7 preceding siblings ...)
2011-07-21 16:28 ` [PATCH 8/8] mm: vmscan: Do not writeback filesystem pages from kswapd Mel Gorman
@ 2011-07-26 11:20 ` Dave Chinner
2011-07-27 4:32 ` Minchan Kim
2011-07-27 16:18 ` Minchan Kim
10 siblings, 0 replies; 43+ messages in thread
From: Dave Chinner @ 2011-07-26 11:20 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, XFS, Christoph Hellwig, Johannes Weiner,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim
On Thu, Jul 21, 2011 at 05:28:42PM +0100, Mel Gorman wrote:
> Warning: Long post with lots of figures. If you normally drink coffee
> and you don't have a cup, get one or you may end up with a case of
> keyboard face.
[snip]
> Overall, having kswapd avoiding writes does improve performance
> which is not a surprise. Dave asked "do we even need IO at all from
> reclaim?". On NUMA machines, the answer is "yes" unless the VM can
> wake the flusher thread to clean a specific node.
Great answer, Mel. ;)
> When kswapd never
> writes, processes can stall for significant periods of time waiting on
> flushers to clean the correct pages. If all writing is to be deferred
> to flushers, it must ensure that many writes on one node would not
> starve requests for cleaning pages on another node.
Ok, so that's a direction we need to work towards, then.
> I'm currently of the opinion that we should consider merging patches
> 1-7 and discuss what is required before merging. It can be tackled
> later how the flushers can prioritise writing of pages belonging to
> a particular zone before disabling all writes from reclaim.
Sounds reasonable to me.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2
2011-07-21 16:28 [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2 Mel Gorman
` (8 preceding siblings ...)
2011-07-26 11:20 ` [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2 Dave Chinner
@ 2011-07-27 4:32 ` Minchan Kim
2011-07-27 7:37 ` Mel Gorman
2011-07-27 16:18 ` Minchan Kim
10 siblings, 1 reply; 43+ messages in thread
From: Minchan Kim @ 2011-07-27 4:32 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel
Hi Mel,
On Fri, Jul 22, 2011 at 1:28 AM, Mel Gorman <mgorman@suse.de> wrote:
> Warning: Long post with lots of figures. If you normally drink coffee
> and you don't have a cup, get one or you may end up with a case of
> keyboard face.
>
> Changelog since v1
> o Drop prio-inode patch. There is now a dependency that the flusher
> threads find these dirty pages quickly.
> o Drop nr_vmscan_throttled counter
> o SetPageReclaim instead of deactivate_page which was wrong
> o Add warning to main filesystems if called from direct reclaim context
> o Add patch to completely disable filesystem writeback from reclaim
>
> Testing from the XFS folk revealed that there is still too much
> I/O from the end of the LRU in kswapd. Previously it was considered
> acceptable by VM people for a small number of pages to be written
> back from reclaim with testing generally showing about 0.3% of pages
> reclaimed were written back (higher if memory was low). That writing
> back a small number of pages is ok has been heavily disputed for
> quite some time and Dave Chinner explained it well;
>
> It doesn't have to be a very high number to be a problem. IO
> is orders of magnitude slower than the CPU time it takes to
> flush a page, so the cost of making a bad flush decision is
> very high. And single page writeback from the LRU is almost
> always a bad flush decision.
>
> To complicate matters, filesystems respond very differently to requests
> from reclaim according to Christoph Hellwig;
>
> xfs tries to write it back if the requester is kswapd
> ext4 ignores the request if it's a delayed allocation
> btrfs ignores the request
>
> As a result, each filesystem has different performance characteristics
> when under memory pressure and there are many pages being dirties. In
> some cases, the request is ignored entirely so the VM cannot depend
> on the IO being dispatched.
>
> The objective of this series to to reduce writing of filesystem-backed
> pages from reclaim, play nicely with writeback that is already in
> progress and throttle reclaim appropriately when dirty pages are
> encountered. The assumption is that the flushers will always write
> pages faster than if reclaim issues the IO. The new problem is that
> reclaim has very little control over how long before a page in a
> particular zone or container is cleaned which is discussed later. A
> secondary goal is to avoid the problem whereby direct reclaim splices
> two potentially deep call stacks together.
>
> Patch 1 disables writeback of filesystem pages from direct reclaim
> entirely. Anonymous pages are still written.
>
> Patches 2-4 add warnings to XFS, ext4 and btrfs if called from
> direct reclaim. With patch 1, this "never happens" and
> is intended to catch regressions in this logic in the
> future.
>
> Patch 5 disables writeback of filesystem pages from kswapd unless
> the priority is raised to the point where kswapd is considered
> to be in trouble.
>
> Patch 6 throttles reclaimers if too many dirty pages are being
> encountered and the zones or backing devices are congested.
>
> Patch 7 invalidates dirty pages found at the end of the LRU so they
> are reclaimed quickly after being written back rather than
> waiting for a reclaimer to find them
>
> Patch 8 disables writeback of filesystem pages from kswapd and
> depends entirely on the flusher threads for cleaning pages.
> This is potentially a problem if the flusher threads take a
> long time to wake or are not discovering the pages we need
> cleaned. By placing the patch last, it's more likely that
> bisection can catch if this situation occurs and can be
> easily reverted.
>
> I consider this series to be orthogonal to the writeback work but
> it is worth noting that the writeback work affects the viability of
> patch 8 in particular.
>
> I tested this on ext4 and xfs using fs_mark and a micro benchmark
> that does a streaming write to a large mapping (exercises use-once
> LRU logic) followed by streaming writes to a mix of anonymous and
> file-backed mappings. The command line for fs_mark when booted with
> 512M looked something like
>
> ./fs_mark -d /tmp/fsmark-2676 -D 100 -N 150 -n 150 -L 25 -t 1 -S0 -s 10485760
>
> The number of files was adjusted depending on the amount of available
> memory so that the files created totalled about 3xRAM. For multiple threads,
> the -d switch is specified multiple times.
>
> 3 kernels are tested.
>
> vanilla 3.0-rc6
> kswapdwb-v2r5 patches 1-7
> nokswapdwb-v2r5 patches 1-8
>
> The test machine is x86-64 with an older generation of AMD processor
> with 4 cores. The underlying storage was 4 disks configured as RAID-0
> as this was the best configuration of storage I had available. Swap
> is on a separate disk. Dirty ratio was tuned to 40% instead of the
> default of 20%.
>
> Testing was run with and without monitors to both verify that the
> patches were operating as expected and that any performance gain was
> real and not due to interference from monitors.
>
> I've posted the raw reports for each filesystem at
>
> http://www.csn.ul.ie/~mel/postings/reclaim-20110721
>
> Unfortunately, the volume of data is excessive but here is a partial
> summary of what was interesting for XFS.
Could you clarify the notation?
1P : 1 Processor?
512M: system memory size?
2X, 4X, 16X: the size of files created during the test?
>
> 512M1P-xfs Files/s mean 32.99 ( 0.00%) 35.16 ( 6.18%) 35.08 ( 5.94%)
> 512M1P-xfs Elapsed Time fsmark 122.54 115.54 115.21
> 512M1P-xfs Elapsed Time mmap-strm 105.09 104.44 106.12
> 512M-xfs Files/s mean 30.50 ( 0.00%) 33.30 ( 8.40%) 34.68 (12.06%)
> 512M-xfs Elapsed Time fsmark 136.14 124.26 120.33
> 512M-xfs Elapsed Time mmap-strm 154.68 145.91 138.83
> 512M-2X-xfs Files/s mean 28.48 ( 0.00%) 32.90 (13.45%) 32.83 (13.26%)
> 512M-2X-xfs Elapsed Time fsmark 145.64 128.67 128.67
> 512M-2X-xfs Elapsed Time mmap-strm 145.92 136.65 137.67
> 512M-4X-xfs Files/s mean 29.06 ( 0.00%) 32.82 (11.46%) 33.32 (12.81%)
> 512M-4X-xfs Elapsed Time fsmark 153.69 136.74 135.11
> 512M-4X-xfs Elapsed Time mmap-strm 159.47 128.64 132.59
> 512M-16X-xfs Files/s mean 48.80 ( 0.00%) 41.80 (-16.77%) 56.61 (13.79%)
> 512M-16X-xfs Elapsed Time fsmark 161.48 144.61 141.19
> 512M-16X-xfs Elapsed Time mmap-strm 167.04 150.62 147.83
>
--
Kind regards,
Minchan Kim
* Re: [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2
2011-07-27 4:32 ` Minchan Kim
@ 2011-07-27 7:37 ` Mel Gorman
0 siblings, 0 replies; 43+ messages in thread
From: Mel Gorman @ 2011-07-27 7:37 UTC (permalink / raw)
To: Minchan Kim
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel
On Wed, Jul 27, 2011 at 01:32:17PM +0900, Minchan Kim wrote:
> >
> > http://www.csn.ul.ie/~mel/postings/reclaim-20110721
> >
> > Unfortunately, the volume of data is excessive but here is a partial
> > summary of what was interesting for XFS.
>
> Could you clarify the notation?
> 1P : 1 Processor?
> 512M: system memory size?
> 2X , 4X, 16X: the size of files created during test
>
1P == 1 Processor
512M == 512M RAM (mem=512M)
2X == 2 x NUM_CPU fsmark threads
--
Mel Gorman
SUSE Labs
* Re: [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2
2011-07-21 16:28 [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2 Mel Gorman
` (9 preceding siblings ...)
2011-07-27 4:32 ` Minchan Kim
@ 2011-07-27 16:18 ` Minchan Kim
2011-07-28 11:38 ` Mel Gorman
10 siblings, 1 reply; 43+ messages in thread
From: Minchan Kim @ 2011-07-27 16:18 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel
On Thu, Jul 21, 2011 at 05:28:42PM +0100, Mel Gorman wrote:
> Warning: Long post with lots of figures. If you normally drink coffee
> and you don't have a cup, get one or you may end up with a case of
> keyboard face.
At last, I get a coffee.
>
> Changelog since v1
> o Drop prio-inode patch. There is now a dependency that the flusher
> threads find these dirty pages quickly.
> o Drop nr_vmscan_throttled counter
> o SetPageReclaim instead of deactivate_page which was wrong
> o Add warning to main filesystems if called from direct reclaim context
> o Add patch to completely disable filesystem writeback from reclaim
It seems to be going in a very desirable direction.
>
> Testing from the XFS folk revealed that there is still too much
> I/O from the end of the LRU in kswapd. Previously it was considered
> acceptable by VM people for a small number of pages to be written
> back from reclaim with testing generally showing about 0.3% of pages
> reclaimed were written back (higher if memory was low). That writing
> back a small number of pages is ok has been heavily disputed for
> quite some time and Dave Chinner explained it well;
>
> It doesn't have to be a very high number to be a problem. IO
> is orders of magnitude slower than the CPU time it takes to
> flush a page, so the cost of making a bad flush decision is
> very high. And single page writeback from the LRU is almost
> always a bad flush decision.
>
> To complicate matters, filesystems respond very differently to requests
> from reclaim according to Christoph Hellwig;
>
> xfs tries to write it back if the requester is kswapd
> ext4 ignores the request if it's a delayed allocation
> btrfs ignores the request
>
> As a result, each filesystem has different performance characteristics
> when under memory pressure and there are many pages being dirties. In
> some cases, the request is ignored entirely so the VM cannot depend
> on the IO being dispatched.
>
> The objective of this series to to reduce writing of filesystem-backed
> pages from reclaim, play nicely with writeback that is already in
> progress and throttle reclaim appropriately when dirty pages are
> encountered. The assumption is that the flushers will always write
> pages faster than if reclaim issues the IO. The new problem is that
> reclaim has very little control over how long before a page in a
> particular zone or container is cleaned which is discussed later. A
> secondary goal is to avoid the problem whereby direct reclaim splices
> two potentially deep call stacks together.
>
> Patch 1 disables writeback of filesystem pages from direct reclaim
> entirely. Anonymous pages are still written.
>
> Patches 2-4 add warnings to XFS, ext4 and btrfs if called from
> direct reclaim. With patch 1, this "never happens" and
> is intended to catch regressions in this logic in the
> future.
>
> Patch 5 disables writeback of filesystem pages from kswapd unless
> the priority is raised to the point where kswapd is considered
> to be in trouble.
>
> Patch 6 throttles reclaimers if too many dirty pages are being
> encountered and the zones or backing devices are congested.
>
> Patch 7 invalidates dirty pages found at the end of the LRU so they
> are reclaimed quickly after being written back rather than
> waiting for a reclaimer to find them
>
> Patch 8 disables writeback of filesystem pages from kswapd and
> depends entirely on the flusher threads for cleaning pages.
> This is potentially a problem if the flusher threads take a
> long time to wake or are not discovering the pages we need
> cleaned. By placing the patch last, it's more likely that
> bisection can catch if this situation occurs and can be
> easily reverted.
Patch ordering is good, too.
>
> I consider this series to be orthogonal to the writeback work but
> it is worth noting that the writeback work affects the viability of
> patch 8 in particular.
>
> I tested this on ext4 and xfs using fs_mark and a micro benchmark
> that does a streaming write to a large mapping (exercises use-once
> LRU logic) followed by streaming writes to a mix of anonymous and
> file-backed mappings. The command line for fs_mark when booted with
> 512M looked something like
>
> ./fs_mark -d /tmp/fsmark-2676 -D 100 -N 150 -n 150 -L 25 -t 1 -S0 -s 10485760
>
> The number of files was adjusted depending on the amount of available
> memory so that the files created totalled about 3xRAM. For multiple threads,
> the -d switch is specified multiple times.
>
> 3 kernels are tested.
>
> vanilla 3.0-rc6
> kswapdwb-v2r5 patches 1-7
> nokswapdwb-v2r5 patches 1-8
>
> The test machine is x86-64 with an older generation of AMD processor
> with 4 cores. The underlying storage was 4 disks configured as RAID-0
> as this was the best configuration of storage I had available. Swap
> is on a separate disk. Dirty ratio was tuned to 40% instead of the
> default of 20%.
>
> Testing was run with and without monitors to both verify that the
> patches were operating as expected and that any performance gain was
> real and not due to interference from monitors.
Wow, it seems it took you a long time to finish these experiments.
Thanks for sharing such good data.
>
> I've posted the raw reports for each filesystem at
>
> http://www.csn.ul.ie/~mel/postings/reclaim-20110721
>
> Unfortunately, the volume of data is excessive but here is a partial
> summary of what was interesting for XFS.
>
> 512M1P-xfs Files/s mean 32.99 ( 0.00%) 35.16 ( 6.18%) 35.08 ( 5.94%)
> 512M1P-xfs Elapsed Time fsmark 122.54 115.54 115.21
> 512M1P-xfs Elapsed Time mmap-strm 105.09 104.44 106.12
> 512M-xfs Files/s mean 30.50 ( 0.00%) 33.30 ( 8.40%) 34.68 (12.06%)
> 512M-xfs Elapsed Time fsmark 136.14 124.26 120.33
> 512M-xfs Elapsed Time mmap-strm 154.68 145.91 138.83
> 512M-2X-xfs Files/s mean 28.48 ( 0.00%) 32.90 (13.45%) 32.83 (13.26%)
> 512M-2X-xfs Elapsed Time fsmark 145.64 128.67 128.67
> 512M-2X-xfs Elapsed Time mmap-strm 145.92 136.65 137.67
> 512M-4X-xfs Files/s mean 29.06 ( 0.00%) 32.82 (11.46%) 33.32 (12.81%)
> 512M-4X-xfs Elapsed Time fsmark 153.69 136.74 135.11
> 512M-4X-xfs Elapsed Time mmap-strm 159.47 128.64 132.59
> 512M-16X-xfs Files/s mean 48.80 ( 0.00%) 41.80 (-16.77%) 56.61 (13.79%)
> 512M-16X-xfs Elapsed Time fsmark 161.48 144.61 141.19
> 512M-16X-xfs Elapsed Time mmap-strm 167.04 150.62 147.83
>
> The difference between kswapd writing and not writing for fsmark
> in many cases is marginal simply because kswapd was not reaching a
> high enough priority to enter writeback. Memory is mostly consumed
> by filesystem-backed pages so limiting the number of dirty pages
> (dirty_ratio == 40) means that kswapd always makes forward progress
> and avoids the OOM killer.
Looks promising as most of the elapsed times are lower than vanilla.
>
> For the streaming-write benchmark, it does make a small difference as
> kswapd is reaching the higher priorities there due to a large number
> of anonymous pages added to the mix. The performance difference is
> marginal though as the number of filesystem pages written is about
> 1/50th of the number of anonymous pages written so it is drowned out.
It does make sense.
>
> I was initially worried about 512M-16X-xfs but it's well within the noise
> looking at the standard deviations from
> http://www.csn.ul.ie/~mel/postings/reclaim-20110721/html-no-monitor/global-dhp-512M-16X__writeback-reclaimdirty-xfs/hydra/comparison.html
>
> Files/s min 25.00 ( 0.00%) 31.10 (19.61%) 32.00 (21.88%)
> Files/s mean 48.80 ( 0.00%) 41.80 (-16.77%) 56.61 (13.79%)
> Files/s stddev 28.65 ( 0.00%) 11.32 (-153.19%) 32.79 (12.62%)
> Files/s max 133.20 ( 0.00%) 81.60 (-63.24%) 154.00 (13.51%)
Yes. It's within the noise so let's not worry about that.
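(Roughly, the kswapdwb-v2r5 mean of 41.80 is only
(48.80 - 41.80) / 28.65 ~= 0.24 standard deviations below the vanilla
mean, so comfortably inside one sigma.)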
>
> 64 threads writing on a machine with 4 CPUs with 512M RAM has variable
> performance which is hardly surprising.
Fair enough.
>
> The streaming-write benchmarks all completed faster.
>
> The tests were also run with mem=1024M and mem=4608M with the relative
> performance improvement reduced as memory increases reflecting that
> with enough memory there are fewer writes from reclaim as the flusher
> threads have time to clean the page before it reaches the end of
> the LRU.
>
> Here is the same tests except when using ext4
>
> 512M1P-ext4 Files/s mean 37.36 ( 0.00%) 37.10 (-0.71%) 37.66 ( 0.78%)
> 512M1P-ext4 Elapsed Time fsmark 108.93 109.91 108.61
> 512M1P-ext4 Elapsed Time mmap-strm 112.15 108.93 109.10
> 512M-ext4 Files/s mean 30.83 ( 0.00%) 39.80 (22.54%) 32.74 ( 5.83%)
> 512M-ext4 Elapsed Time fsmark 368.07 322.55 328.80
> 512M-ext4 Elapsed Time mmap-strm 131.98 117.01 118.94
> 512M-2X-ext4 Files/s mean 20.27 ( 0.00%) 22.75 (10.88%) 20.80 ( 2.52%)
> 512M-2X-ext4 Elapsed Time fsmark 518.06 493.74 479.21
> 512M-2X-ext4 Elapsed Time mmap-strm 131.32 126.64 117.05
> 512M-4X-ext4 Files/s mean 17.91 ( 0.00%) 12.30 (-45.63%) 16.58 (-8.06%)
> 512M-4X-ext4 Elapsed Time fsmark 633.41 660.70 572.74
> 512M-4X-ext4 Elapsed Time mmap-strm 137.85 127.63 124.07
> 512M-16X-ext4 Files/s mean 55.86 ( 0.00%) 69.90 (20.09%) 42.66 (-30.94%)
> 512M-16X-ext4 Elapsed Time fsmark 543.21 544.43 586.16
> 512M-16X-ext4 Elapsed Time mmap-strm 141.84 146.12 144.01
>
> At first glance, the benefit for ext4 is less clear cut but this
> is due to the standard deviation being very high. Take 512M-4X-ext4
> showing a 45.63% regression for example and we see.
>
> Files/s min 5.40 ( 0.00%) 4.10 (-31.71%) 6.50 (16.92%)
> Files/s mean 17.91 ( 0.00%) 12.30 (-45.63%) 16.58 (-8.06%)
> Files/s stddev 14.34 ( 0.00%) 8.04 (-78.46%) 14.50 ( 1.04%)
> Files/s max 54.30 ( 0.00%) 37.70 (-44.03%) 77.20 (29.66%)
>
> The standard deviation is *massive* meaning that the performance
> loss is well within the noise. The main positive out of this is the
Yes.
ext4 seems to be very sensitive to the situation.
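(As a rough check of that: the apparent 45.63% regression in
512M-4X-ext4 works out as (17.91 - 12.30) / 14.34 ~= 0.39 standard
deviations, so it is also well within one sigma.)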
> streaming write benchmarks are generally better.
>
> Where it does benefit is stalls in direct reclaim. Unlike xfs, ext4
> can stall direct reclaim writing back pages. When I look at a separate
> run using ftrace to gather more information, I see;
>
> 512M-ext4 Time stalled direct reclaim fsmark 0.36 0.30 0.31
> 512M-ext4 Time stalled direct reclaim mmap-strm 36.88 7.48 36.24
This data is odd.
The elapsed times of experiments [2] and [3] are almost the same (117.01, 118.94) but the direct reclaim
stall time of [2] is much lower. Hmm??
Anyway, if we don't write out from kswapd, it seems we enter the direct reclaim path many more times.
> 512M-4X-ext4 Time stalled direct reclaim fsmark 1.06 0.40 0.43
> 512M-4X-ext4 Time stalled direct reclaim mmap-strm 102.68 33.18 23.99
> 512M-16X-ext4 Time stalled direct reclaim fsmark 0.17 0.27 0.30
> 512M-16X-ext4 Time stalled direct reclaim mmap-strm 9.80 2.62 1.28
> 512M-32X-ext4 Time stalled direct reclaim fsmark 0.00 0.00 0.00
> 512M-32X-ext4 Time stalled direct reclaim mmap-strm 2.27 0.51 1.26
>
> Time spent in direct reclaim is reduced implying that bug reports
> complaining about the system becoming jittery when copying large
> files may also be helped.
It would be very good thing.
>
> To show what effect the patches are having, this is a more detailed
> look at one of the tests running with monitoring enabled. It's booted
> with mem=512M and the number of threads running is equal to the number
> of CPU cores. The backing filesystem is XFS.
>
> FS-Mark
> fsmark-3.0.0 3.0.0-rc6 3.0.0-rc6
> rc6-vanilla kswapwb-v2r5 nokswapwb-v2r5
> Files/s min 27.30 ( 0.00%) 31.80 (14.15%) 31.40 (13.06%)
> Files/s mean 30.32 ( 0.00%) 34.34 (11.73%) 34.52 (12.18%)
> Files/s stddev 1.39 ( 0.00%) 1.06 (-31.96%) 1.20 (-16.05%)
> Files/s max 33.60 ( 0.00%) 36.00 ( 6.67%) 36.30 ( 7.44%)
> Overhead min 1393832.00 ( 0.00%) 1793141.00 (-22.27%) 1133240.00 (23.00%)
> Overhead mean 2423808.52 ( 0.00%) 2513297.40 (-3.56%) 1823398.44 (32.93%)
> Overhead stddev 445880.26 ( 0.00%) 392952.66 (13.47%) 420498.38 ( 6.04%)
> Overhead max 3359477.00 ( 0.00%) 3184889.00 ( 5.48%) 3016170.00 (11.38%)
> MMTests Statistics: duration
> User/Sys Time Running Test (seconds) 53.26 52.27 51.88
What is User/Sys?
> Total Elapsed Time (seconds) 137.65 121.95 121.11
>
> Average files per second is increased by a nice percentage that is
> outside the noise. This is also true when I look at the results
Sure.
> without monitoring although the relative performance gain is less.
>
> Time to completion is reduced which is always good news as it implies
> that IO was consistently higher and this is clearly visible at
>
> http://www.csn.ul.ie/~mel/postings/reclaim-20110721/html-run-monitor/global-dhp-512M__writeback-reclaimdirty-xfs/hydra/blockio-comparison-hydra.png
> http://www.csn.ul.ie/~mel/postings/reclaim-20110721/html-run-monitor/global-dhp-512M__writeback-reclaimdirty-xfs/hydra/blockio-comparison-smooth-hydra.png
>
> kswapd CPU usage is also interesting
>
> http://www.csn.ul.ie/~mel/postings/reclaim-20110721/html-run-monitor/global-dhp-512M__writeback-reclaimdirty-xfs/hydra/kswapdcpu-comparison-smooth-hydra.png
>
> Note how preventing kswapd reclaiming dirty pages pushes up its CPU
> usage as it scans more pages but it does not get excessive due to
> the throttling.
Good to hear.
The concern with this patchset was an early OOM kill due to too much scanning.
I can throw that concern out from now on.
>
> MMTests Statistics: vmstat
> Page Ins 1481672 1352900 1105364
> Page Outs 38397462 38337199 38366073
> Swap Ins 351918 320883 258868
> Swap Outs 132060 117715 123564
> Direct pages scanned 886587 968087 784109
> Kswapd pages scanned 18931089 18275983 18324613
> Kswapd pages reclaimed 8878200 8768648 8885482
> Direct pages reclaimed 883407 960496 781632
> Kswapd efficiency 46% 47% 48%
> Kswapd velocity 137530.614 149864.559 151305.532
> Direct efficiency 99% 99% 99%
> Direct velocity 6440.879 7938.393 6474.354
> Percentage direct scans 4% 5% 4%
> Page writes by reclaim 170014 117717 123510
> Page reclaim invalidate 0 1221396 1212857
> Page reclaim throttled 0 0 0
> Slabs scanned 23424 23680 23552
> Direct inode steals 0 0 0
> Kswapd inode steals 5560 5500 5584
> Kswapd skipped wait 20 3 5
> Compaction stalls 0 0 0
> Compaction success 0 0 0
> Compaction failures 0 0 0
> Compaction pages moved 0 0 0
> Compaction move failure 0 0 0
>
> These stats are based on information from /proc/vmstat
>
> "Kswapd efficiency" is the percentage of pages reclaimed to pages
> scanned. The higher the percentage is the better because a low
> percentage implies that kswapd is scanning uselessly. As the workload
> dirties memory heavily and is a small machine, the efficiency is low at
> 46% and marginally improves due to a reduced number of pages scanned.
> As memory increases, so does the efficiency as one might expect as
> the flushers have a chance to clean the pages in time.
>
> "Kswapd velocity" is the average number of pages scanned per
> second. The patches increase this as it's no longer getting blocked on
> page writes so it's expected but in general a higher velocity means
> that kswapd is doing more work and consuming more CPU. In this case,
> it is offset by the fact that fewer pages overall are scanned and
> the test completes faster but it explains why CPU usage is higher.
Fair enough.
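(As a sanity check, the derived figures do follow from the raw counters
quoted above for the vanilla kernel:

	Kswapd velocity   = 18931089 scanned / 137.65s elapsed
	                 ~= 137530 pages/second
	Kswapd efficiency = 8878200 reclaimed / 18931089 scanned,
	                    just under 47% (the table reports 46%)

so the two metrics really are just scanned-per-second and
reclaimed-per-scanned.)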
>
> Page writes by reclaim is what is motivating this series. It goes
> from 170014 pages to 123510 which is a big improvement and we'll see
> later that these writes are for anonymous pages.
>
> "Page reclaim invalided" is very high and implies that a large number
> of dirty pages are reaching the end of the list quickly. Unfortunately,
> this is somewhat unavoidable. Kswapd is scanning pages at a rate
> of roughly 125000 (or 488M) a second on a 512M machine. The best
> possible writing rate of the underlying storage is about 300M/second.
> With the rate of reclaim exceeding the best possible writing speed,
> the system is going to get throttled.
Just out of curiosity.
What is 'Page reclaim throttled'?
>
> FTrace Reclaim Statistics: vmscan
> fsmark-3.0.0 3.0.0-rc6 3.0.0-rc6
> rc6-vanilla kswapwb-v2r5 nokswapwb-v2r5
> Direct reclaims 16173 17605 14313
> Direct reclaim pages scanned 886587 968087 784109
> Direct reclaim pages reclaimed 883407 960496 781632
> Direct reclaim write file async I/O 0 0 0
> Direct reclaim write anon async I/O 0 0 0
> Direct reclaim write file sync I/O 0 0 0
> Direct reclaim write anon sync I/O 0 0 0
> Wake kswapd requests 20699 22048 22893
> Kswapd wakeups 24 20 25
> Kswapd pages scanned 18931089 18275983 18324613
> Kswapd pages reclaimed 8878200 8768648 8885482
> Kswapd reclaim write file async I/O 37966 0 0
> Kswapd reclaim write anon async I/O 132062 117717 123567
> Kswapd reclaim write file sync I/O 0 0 0
> Kswapd reclaim write anon sync I/O 0 0 0
> Time stalled direct reclaim (seconds) 0.08 0.09 0.08
> Time kswapd awake (seconds) 132.11 117.78 115.82
>
> Total pages scanned 19817676 19244070 19108722
> Total pages reclaimed 9761607 9729144 9667114
> %age total pages scanned/reclaimed 49.26% 50.56% 50.59%
> %age total pages scanned/written 0.86% 0.61% 0.65%
> %age file pages scanned/written 0.19% 0.00% 0.00%
> Percentage Time Spent Direct Reclaim 0.15% 0.17% 0.15%
> Percentage Time kswapd Awake 95.98% 96.58% 95.63%
>
> Despite kswapd having higher CPU usage, it spent less time awake which
> is probably a reflection of the test completing faster. File writes
Makes sense.
> from kswapd were 0 with the patches applied implying that kswapd was
> not getting to a priority high enough to start writing. The remaining
> writes correlate almost exactly to nr_vmscan_write implying that all
> writes were for anonymous pages.
>
> FTrace Reclaim Statistics: congestion_wait
> Direct number congest waited 0 0 0
> Direct time congest waited 0ms 0ms 0ms
> Direct full congest waited 0 0 0
> Direct number conditional waited 2 17 6
> Direct time conditional waited 0ms 0ms 0ms
> Direct full conditional waited 0 0 0
> KSwapd number congest waited 4 8 10
> KSwapd time congest waited 4ms 20ms 8ms
> KSwapd full congest waited 0 0 0
> KSwapd number conditional waited 0 26036 26283
> KSwapd time conditional waited 0ms 16ms 4ms
> KSwapd full conditional waited 0 0 0
What do congest and conditional mean?
Is congest trace_writeback_congestion_wait and conditional trace_writeback_wait_iff_congested?
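(My rough understanding, paraphrasing mm/backing-dev.c from memory, so
treat this as a sketch rather than the exact 3.0 code:

	long wait_iff_congested(struct zone *zone, int sync, long timeout)
	{
		/* No bdi congested anywhere, or this zone not flagged
		 * congested: do not sleep, just yield the CPU */
		if (atomic_read(&nr_bdi_congested[sync]) == 0 ||
		    !zone_is_reclaim_congested(zone)) {
			cond_resched();
			return 0;
		}

		/* Genuinely congested: sleep, much as congestion_wait()
		 * would, until the timeout or the congestion clears */
		return congestion_wait(sync, timeout);
	}

congestion_wait() always sleeps for up to the timeout, which would
explain the pattern above of many "conditional waited" events but
negligible time actually spent asleep.)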
>
> This is based on some of the writeback tracepoints. It's interesting
> to note that while kswapd got throttled about 26000 times with all
> patches applied, it spent negligible time asleep so probably just
> called cond_resched(). This implies that the zone and the backing
> device are rarely truly congested and that throttling is necessary
> simply to allow time for the pages to be written.
>
> MICRO
> MMTests Statistics: duration
> User/Sys Time Running Test (seconds) 32.57 31.18 30.52
> Total Elapsed Time (seconds) 166.29 141.94 148.23
>
> This test is in two stages. The first writes only to a file. The second
> writes to a mix of anonymous and file mappings. Time to completion
> is improved and this is still true with monitoring disabled.
Good.
>
> MMTests Statistics: vmstat
> Page Ins 11018260 10668536 10792204
> Page Outs 16632838 16468468 16449897
> Swap Ins 296167 245878 256038
> Swap Outs 221626 177922 179409
> Direct pages scanned 4129424 5172015 3686598
> Kswapd pages scanned 9152837 9000480 7909180
> Kswapd pages reclaimed 3388122 3284663 3371737
> Direct pages reclaimed 735425 765263 708713
> Kswapd efficiency 37% 36% 42%
> Kswapd velocity 55041.416 63410.455 53357.485
> Direct efficiency 17% 14% 19%
> Direct velocity 24832.666 36438.037 24870.795
> Percentage direct scans 31% 36% 31%
> Page writes by reclaim 347283 180065 179425
> Page writes skipped 0 0 0
> Page reclaim invalidate 0 864018 554666
> Write invalidated 0 0 0
> Page reclaim throttled 0 0 0
> Slabs scanned 14464 13696 13952
> Direct inode steals 470 864 934
> Kswapd inode steals 426 411 317
> Kswapd skipped wait 3255 3381 1437
> Compaction stalls 0 0 2
> Compaction success 0 0 1
> Compaction failures 0 0 1
> Compaction pages moved 0 0 0
> Compaction move failure 0 0 0
>
> Kswapd efficiency is improved slightly. kswapd is operating at roughly
> the same velocity but the number of pages scanned is far lower due
> to the test completing faster.
>
> Direct reclaim efficiency is improved slightly and scanning fewer pages
> (again due to lower time to completion).
>
> Fewer pages are being written from reclaim.
>
> FTrace Reclaim Statistics: vmscan
> micro-3.0.0 3.0.0-rc6 3.0.0-rc6
> rc6-vanilla kswapwb-v2r5 nokswapwb-v2r5
> Direct reclaims 14060 15425 13726
> Direct reclaim pages scanned 3596218 4621037 3613503
> Direct reclaim pages reclaimed 735425 765263 708713
> Direct reclaim write file async I/O 87264 0 0
> Direct reclaim write anon async I/O 10030 9127 15028
> Direct reclaim write file sync I/O 0 0 0
> Direct reclaim write anon sync I/O 0 0 0
> Wake kswapd requests 10424 10346 10786
> Kswapd wakeups 22 22 14
> Kswapd pages scanned 9041353 8889081 7895846
> Kswapd pages reclaimed 3388122 3284663 3371737
> Kswapd reclaim write file async I/O 7277 1710 0
> Kswapd reclaim write anon async I/O 184205 159178 162367
> Kswapd reclaim write file sync I/O 0 0 0
> Kswapd reclaim write anon sync I/O 0 0 0
> Time stalled direct reclaim (seconds) 54.29 5.67 14.29
> Time kswapd awake (seconds) 151.62 129.83 135.98
>
> Total pages scanned 12637571 13510118 11509349
> Total pages reclaimed 4123547 4049926 4080450
> %age total pages scanned/reclaimed 32.63% 29.98% 35.45%
> %age total pages scanned/written 2.29% 1.26% 1.54%
> %age file pages scanned/written 0.75% 0.01% 0.00%
> Percentage Time Spent Direct Reclaim 62.50% 15.39% 31.89%
> Percentage Time kswapd Awake 91.18% 91.47% 91.74%
>
> Time spent in direct reclaim is massively reduced which is surprising
Awesome!
> as this is XFS so it should not have been stalling in the writing
> files anyway. It's possible that the anon writes are completing
> faster so time spent swapping is reduced.
>
> With patches 1-7, kswapd still writes some pages due to it reaching
> higher priorities due to memory pressure but the number of pages it
> writes is significantly reduced and a small percentage of those that
> were written to swap. Patch 8 eliminates it entirely but the benefit is
> not seen in the completion times as the number of writes is so small.
Yes. It seems patch 8's effect is very small in general.
It even increased the direct reclaim time.
>
> FTrace Reclaim Statistics: congestion_wait
> Direct number congest waited 0 0 0
> Direct time congest waited 0ms 0ms 0ms
> Direct full congest waited 0 0 0
> Direct number conditional waited 12345 37713 34841
> Direct time conditional waited 12396ms 132ms 168ms
> Direct full conditional waited 53 0 0
> KSwapd number congest waited 4248 2957 2293
> KSwapd time congest waited 15320ms 10312ms 13416ms
> KSwapd full congest waited 31 1 21
> KSwapd number conditional waited 0 15989 10410
> KSwapd time conditional waited 0ms 0ms 0ms
> KSwapd full conditional waited 0 0 0
>
> Congestion is way down as direct reclaim conditional wait time is
> reduced by about 12 seconds.
>
> Overall, this looks good. Avoiding writes from kswapd improves
> overall performance as expected and eliminating them entirely seems
> to behave well.
I agree with you.
>
> Next I tested on a NUMA configuration of sorts. I don't have a real
> NUMA machine so I booted the same machine with mem=4096M numa=fake=8
> so each node is 512M. Again, the volume of information is high but
> here is a summary of sorts based on a test run with monitors enabled.
>
> 4096M8N-xfs Files/s mean 27.29 ( 0.00%) 27.35 ( 0.20%) 27.91 ( 2.22%)
> 4096M8N-xfs Elapsed Time fsmark 1402.55 1400.77 1382.92
> 4096M8N-xfs Elapsed Time mmap-strm 660.90 596.91 630.05
> 4096M8N-xfs Kswapd efficiency fsmark 72% 71% 13%
> 4096M8N-xfs Kswapd efficiency mmap-strm 39% 40% 31%
> 4096M8N-xfs stalled direct reclaim fsmark 0.00 0.00 0.00
> 4096M8N-xfs stalled direct reclaim mmap-strm 36.37 13.06 56.88
> 4096M8N-4X-xfs Files/s mean 26.80 ( 0.00%) 26.41 (-1.47%) 26.40 (-1.53%)
> 4096M8N-4X-xfs Elapsed Time fsmark 1453.95 1460.62 1470.98
> 4096M8N-4X-xfs Elapsed Time mmap-strm 683.34 663.46 690.01
> 4096M8N-4X-xfs Kswapd efficiency fsmark 68% 67% 8%
> 4096M8N-4X-xfs Kswapd efficiency mmap-strm 35% 34% 6%
> 4096M8N-4X-xfs stalled direct reclaim fsmark 0.00 0.00 0.00
> 4096M8N-4X-xfs stalled direct reclaim mmap-strm 26.45 87.57 46.87
> 4096M8N-2X-xfs Files/s mean 26.22 ( 0.00%) 26.70 ( 1.77%) 27.21 ( 3.62%)
> 4096M8N-2X-xfs Elapsed Time fsmark 1469.28 1439.30 1424.45
> 4096M8N-2X-xfs Elapsed Time mmap-strm 676.77 656.28 655.03
> 4096M8N-2X-xfs Kswapd efficiency fsmark 69% 69% 9%
> 4096M8N-2X-xfs Kswapd efficiency mmap-strm 33% 33% 7%
> 4096M8N-2X-xfs stalled direct reclaim fsmark 0.00 0.00 0.00
> 4096M8N-2X-xfs stalled direct reclaim mmap-strm 52.74 57.96 102.49
> 4096M8N-16X-xfs Files/s mean 25.78 ( 0.00%) 27.81 ( 7.32%) 48.52 (46.87%)
> 4096M8N-16X-xfs Elapsed Time fsmark 1555.95 1554.78 1542.53
> 4096M8N-16X-xfs Elapsed Time mmap-strm 770.01 763.62 844.55
> 4096M8N-16X-xfs Kswapd efficiency fsmark 62% 62% 7%
> 4096M8N-16X-xfs Kswapd efficiency mmap-strm 38% 37% 10%
> 4096M8N-16X-xfs stalled direct reclaim fsmark 0.12 0.01 0.05
> 4096M8N-16X-xfs stalled direct reclaim mmap-strm 1.07 1.09 63.32
>
> The performance differences for fsmark are marginal because the number
> of page written from reclaim is pretty low with this much memory even
> with NUMA enabled. At no point did fsmark enter direct reclaim to
> try and write a page so it's all kswapd. What is important to note is
> the "Kswapd efficiency". Once kswapd cannot write pages at all, its
> efficiency drops rapidly for fsmark as it scans about 5-8 times more
> pages waiting on flusher threads to clean a page from the correct node.
>
> Kswapd not writing pages impairs direct reclaim performance for the
> streaming writer test. Note the times stalled in direct reclaim. In
> all cases, the time stalled in direct reclaim goes way up as both
> direct reclaimers and kswapd get stalled waiting on pages to get
> cleaned from the right node.
Yes. The data is horrible.
>
> Fortunately, kswapd CPU usage does not go to 100% because of the
> throttling. From the 40968M test for example, I see
>
> KSwapd full congest waited 834 739 989
> KSwapd number conditional waited 0 68552 372275
> KSwapd time conditional waited 0ms 16ms 1684ms
> KSwapd full conditional waited 0 0 0
>
> With kswapd avoiding writes, it gets throttled lightly but when it
> writes no pages at all, it gets throttled very heavily and sleeps.
>
> ext4 tells a slightly different story
>
> 4096M8N-ext4 Files/s mean 28.63 ( 0.00%) 30.58 ( 6.37%) 31.04 ( 7.76%)
> 4096M8N-ext4 Elapsed Time fsmark 1578.51 1551.99 1532.65
> 4096M8N-ext4 Elapsed Time mmap-strm 703.66 655.25 654.86
> 4096M8N-ext4 Kswapd efficiency 62% 69% 68%
> 4096M8N-ext4 Kswapd efficiency 35% 35% 35%
> 4096M8N-ext4 stalled direct reclaim fsmark 0.00 0.00 0.00
> 4096M8N-ext4 stalled direct reclaim mmap-strm 32.64 95.72 152.62
> 4096M8N-2X-ext4 Files/s mean 30.74 ( 0.00%) 28.49 (-7.89%) 28.79 (-6.75%)
> 4096M8N-2X-ext4 Elapsed Time fsmark 1466.62 1583.12 1580.07
> 4096M8N-2X-ext4 Elapsed Time mmap-strm 705.17 705.64 693.01
> 4096M8N-2X-ext4 Kswapd efficiency 68% 68% 67%
> 4096M8N-2X-ext4 Kswapd efficiency 34% 30% 18%
> 4096M8N-2X-ext4 stalled direct reclaim fsmark 0.00 0.00 0.00
> 4096M8N-2X-ext4 stalled direct reclaim mmap-strm 106.82 24.88 27.88
> 4096M8N-4X-ext4 Files/s mean 24.15 ( 0.00%) 23.18 (-4.18%) 23.94 (-0.89%)
> 4096M8N-4X-ext4 Elapsed Time fsmark 1848.41 1971.48 1867.07
> 4096M8N-4X-ext4 Elapsed Time mmap-strm 664.87 673.66 674.46
> 4096M8N-4X-ext4 Kswapd efficiency 62% 65% 65%
> 4096M8N-4X-ext4 Kswapd efficiency 33% 37% 15%
> 4096M8N-4X-ext4 stalled direct reclaim fsmark 0.18 0.03 0.26
> 4096M8N-4X-ext4 stalled direct reclaim mmap-strm 115.71 23.05 61.12
> 4096M8N-16X-ext4 Files/s mean 5.42 ( 0.00%) 5.43 ( 0.15%) 3.83 (-41.44%)
> 4096M8N-16X-ext4 Elapsed Time fsmark 9572.85 9653.66 11245.41
> 4096M8N-16X-ext4 Elapsed Time mmap-strm 752.88 750.38 769.19
> 4096M8N-16X-ext4 Kswapd efficiency 59% 59% 61%
> 4096M8N-16X-ext4 Kswapd efficiency 34% 34% 21%
> 4096M8N-16X-ext4 stalled direct reclaim fsmark 0.26 0.65 0.26
> 4096M8N-16X-ext4 stalled direct reclaim mmap-strm 177.48 125.91 196.92
>
> 4096M8N-16X-ext4 with kswapd writing no pages collapsed in terms of
> performance. Looking at the fsmark logs, in a number of iterations,
> it was barely able to write files at all.
>
> The apparent slowdown for fsmark in 4096M8N-2X-ext4 is well within
> the noise but the reduced time spent in direct reclaim is very welcome.
But 4096M8N-ext4 increased the time and 4096M8N-2X-ext4 is within the noise
as you said. I doubt its reliability.
>
> Unlike xfs, it's less clear cut if direct reclaim performance is
> impaired but in a few tests, preventing kswapd writing pages did
> increase the time stalled.
>
> Last test is that I've been running this series on my laptop since
> Monday without any problem but it's rarely under serious memory
> pressure. I see nr_vmscan_write is 0 and the number of pages
> invalidated from the end of the LRU is only 10844 after 3 days so
> it's not much of a test.
>
> Overall, having kswapd avoiding writes does improve performance
> which is not a surprise. Dave asked "do we even need IO at all from
> reclaim?". On NUMA machines, the answer is "yes" unless the VM can
> wake the flusher thread to clean a specific node. When kswapd never
> writes, processes can stall for significant periods of time waiting on
> flushers to clean the correct pages. If all writing is to be deferred
> to flushers, it must ensure that many writes on one node would not
> starve requests for cleaning pages on another node.
It's a good answer. :)
>
> I'm currently of the opinion that we should consider merging patches
> 1-7 and discuss what is required before merging. It can be tackled
> later how the flushers can prioritise writing of pages belonging to
> a particular zone before disabling all writes from reclaim. There
> is already some work in this general area with the possibility that
> series such as "writeback: moving expire targets for background/kupdate
> works" could be extended to allow patch 8 to be merged later even if
> the series needs work.
I think you already know what we need (ie, prioritising the pages in a zone).
In the NUMA case, patches 1-7 have a problem with ext4 so we should focus on
NUMA in the remaining time. An alternative to [prioritising the pages in a
zone] might be Johannes's [mm: per-zone dirty limiting], which might mitigate
the NUMA problems.
Overall, I really welcome this approach and would like to see it merged in
mmotm as soon as possible to see the side effects in the non-NUMA case
(I will add my Reviewed-by soon). In the NUMA case, we apparently know what
the problem is so I think it could be solved before it is sent to mainline.
It was a great time reading your data and you made my coffee delicious. :)
You're a good barista.
Thanks for your great effort, Mel!
--
Kind regards,
Minchan Kim
* Re: [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2
2011-07-27 16:18 ` Minchan Kim
@ 2011-07-28 11:38 ` Mel Gorman
2011-07-29 9:48 ` Minchan Kim
0 siblings, 1 reply; 43+ messages in thread
From: Mel Gorman @ 2011-07-28 11:38 UTC (permalink / raw)
To: Minchan Kim
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel
On Thu, Jul 28, 2011 at 01:18:21AM +0900, Minchan Kim wrote:
> On Thu, Jul 21, 2011 at 05:28:42PM +0100, Mel Gorman wrote:
> > Warning: Long post with lots of figures. If you normally drink coffee
> > and you don't have a cup, get one or you may end up with a case of
> > keyboard face.
>
> At last, I get a coffee.
>
Nice one.
> > <SNIP>
> > I consider this series to be orthogonal to the writeback work but
> > it is worth noting that the writeback work affects the viability of
> > patch 8 in particular.
> >
> > I tested this on ext4 and xfs using fs_mark and a micro benchmark
> > that does a streaming write to a large mapping (exercises use-once
> > LRU logic) followed by streaming writes to a mix of anonymous and
> > file-backed mappings. The command line for fs_mark when booted with
> > 512M looked something like
> >
> > ./fs_mark -d /tmp/fsmark-2676 -D 100 -N 150 -n 150 -L 25 -t 1 -S0 -s 10485760
> >
> > The number of files was adjusted depending on the amount of available
> > memory so that the files created totalled about 3xRAM. For multiple threads,
> > the -d switch is specified multiple times.
> >
> > 3 kernels are tested.
> >
> > vanilla 3.0-rc6
> > kswapdwb-v2r5 patches 1-7
> > nokswapdwb-v2r5 patches 1-8
> >
> > The test machine is x86-64 with an older generation of AMD processor
> > with 4 cores. The underlying storage was 4 disks configured as RAID-0
> > as this was the best configuration of storage I had available. Swap
> > is on a separate disk. Dirty ratio was tuned to 40% instead of the
> > default of 20%.
> >
> > Testing was run with and without monitors to both verify that the
> > patches were operating as expected and that any performance gain was
> > real and not due to interference from monitors.
>
> Wow, it seems it took you a long time to finish these experiments.
Yes, they take a long time to run.
> > I've posted the raw reports for each filesystem at
> >
> > http://www.csn.ul.ie/~mel/postings/reclaim-20110721
> >
> > Unfortunately, the volume of data is excessive but here is a partial
> > summary of what was interesting for XFS.
> >
> > 512M1P-xfs Files/s mean 32.99 ( 0.00%) 35.16 ( 6.18%) 35.08 ( 5.94%)
> > 512M1P-xfs Elapsed Time fsmark 122.54 115.54 115.21
> > 512M1P-xfs Elapsed Time mmap-strm 105.09 104.44 106.12
> > 512M-xfs Files/s mean 30.50 ( 0.00%) 33.30 ( 8.40%) 34.68 (12.06%)
> > 512M-xfs Elapsed Time fsmark 136.14 124.26 120.33
> > 512M-xfs Elapsed Time mmap-strm 154.68 145.91 138.83
> > 512M-2X-xfs Files/s mean 28.48 ( 0.00%) 32.90 (13.45%) 32.83 (13.26%)
> > 512M-2X-xfs Elapsed Time fsmark 145.64 128.67 128.67
> > 512M-2X-xfs Elapsed Time mmap-strm 145.92 136.65 137.67
> > 512M-4X-xfs Files/s mean 29.06 ( 0.00%) 32.82 (11.46%) 33.32 (12.81%)
> > 512M-4X-xfs Elapsed Time fsmark 153.69 136.74 135.11
> > 512M-4X-xfs Elapsed Time mmap-strm 159.47 128.64 132.59
> > 512M-16X-xfs Files/s mean 48.80 ( 0.00%) 41.80 (-16.77%) 56.61 (13.79%)
> > 512M-16X-xfs Elapsed Time fsmark 161.48 144.61 141.19
> > 512M-16X-xfs Elapsed Time mmap-strm 167.04 150.62 147.83
> >
> > The difference between kswapd writing and not writing for fsmark
> > in many cases is marginal simply because kswapd was not reaching a
> > high enough priority to enter writeback. Memory is mostly consumed
> > by filesystem-backed pages so limiting the number of dirty pages
> > (dirty_ratio == 40) means that kswapd always makes forward progress
> > and avoids the OOM killer.
>
> Looks promising as most of the elapsed times are lower than vanilla.
>
Yes, although a lower elapsed time does not always mean better
performance. For example, some tests I run execute a variable number of
times trying to get a good estimate of the true mean. For these tests,
there is a fixed number of iterations, so a lower elapsed time implies
higher throughput.
> > The streaming-write benchmarks all completed faster.
> >
> > The tests were also run with mem=1024M and mem=4608M with the relative
> > performance improvement reduced as memory increases reflecting that
> > with enough memory there are fewer writes from reclaim as the flusher
> > threads have time to clean the page before it reaches the end of
> > the LRU.
> >
> > Here is the same tests except when using ext4
> >
> > 512M1P-ext4 Files/s mean 37.36 ( 0.00%) 37.10 (-0.71%) 37.66 ( 0.78%)
> > 512M1P-ext4 Elapsed Time fsmark 108.93 109.91 108.61
> > 512M1P-ext4 Elapsed Time mmap-strm 112.15 108.93 109.10
> > 512M-ext4 Files/s mean 30.83 ( 0.00%) 39.80 (22.54%) 32.74 ( 5.83%)
> > 512M-ext4 Elapsed Time fsmark 368.07 322.55 328.80
> > 512M-ext4 Elapsed Time mmap-strm 131.98 117.01 118.94
> > 512M-2X-ext4 Files/s mean 20.27 ( 0.00%) 22.75 (10.88%) 20.80 ( 2.52%)
> > 512M-2X-ext4 Elapsed Time fsmark 518.06 493.74 479.21
> > 512M-2X-ext4 Elapsed Time mmap-strm 131.32 126.64 117.05
> > 512M-4X-ext4 Files/s mean 17.91 ( 0.00%) 12.30 (-45.63%) 16.58 (-8.06%)
> > 512M-4X-ext4 Elapsed Time fsmark 633.41 660.70 572.74
> > 512M-4X-ext4 Elapsed Time mmap-strm 137.85 127.63 124.07
> > 512M-16X-ext4 Files/s mean 55.86 ( 0.00%) 69.90 (20.09%) 42.66 (-30.94%)
> > 512M-16X-ext4 Elapsed Time fsmark 543.21 544.43 586.16
> > 512M-16X-ext4 Elapsed Time mmap-strm 141.84 146.12 144.01
> >
> > At first glance, the benefit for ext4 is less clear cut but this
> > is due to the standard deviation being very high. Take 512M-4X-ext4
> > showing a 45.63% regression for example and we see.
> >
> > Files/s min 5.40 ( 0.00%) 4.10 (-31.71%) 6.50 (16.92%)
> > Files/s mean 17.91 ( 0.00%) 12.30 (-45.63%) 16.58 (-8.06%)
> > Files/s stddev 14.34 ( 0.00%) 8.04 (-78.46%) 14.50 ( 1.04%)
> > Files/s max 54.30 ( 0.00%) 37.70 (-44.03%) 77.20 (29.66%)
> >
> > The standard deviation is *massive* meaning that the performance
> > loss is well within the noise. The main positive out of this is the
>
> Yes.
> ext4 seems to be very sensitive to the situation.
>
It'd be nice to have a theory as to why it is so variable but it could
be simply down to disk layout and seeks. I wasn't running blktrace to
see if that was the case. As this is RAID, it's also possible it is a
stride problem as I didn't specify stride= to mkfs.
> > streaming write benchmarks are generally better.
> >
> > Where it does benefit is stalls in direct reclaim. Unlike xfs, ext4
> > can stall direct reclaim writing back pages. When I look at a separate
> > run using ftrace to gather more information, I see;
> >
> > 512M-ext4 Time stalled direct reclaim fsmark 0.36 0.30 0.31
> > 512M-ext4 Time stalled direct reclaim mmap-strm 36.88 7.48 36.24
>
> This data is odd.
> The elapsed times of experiments [2] and [3] are almost the same (117.01, 118.94) but the direct reclaim
> stall time of [2] is much lower. Hmm??
It could have been just luck on that particular run. These figures
don't tell us *which* process got stuck in direct reclaim for that
length of time. If it was one of the monitors recording stats for
example, it wouldn't affect the reported results. It could be figured
out from the trace data if I went back through it but it's probably
not worth the trouble.
> > 512M-4X-ext4 Time stalled direct reclaim fsmark 1.06 0.40 0.43
> > 512M-4X-ext4 Time stalled direct reclaim mmap-strm 102.68 33.18 23.99
> > 512M-16X-ext4 Time stalled direct reclaim fsmark 0.17 0.27 0.30
> > 512M-16X-ext4 Time stalled direct reclaim mmap-strm 9.80 2.62 1.28
> > 512M-32X-ext4 Time stalled direct reclaim fsmark 0.00 0.00 0.00
> > 512M-32X-ext4 Time stalled direct reclaim mmap-strm 2.27 0.51 1.26
> >
> > Time spent in direct reclaim is reduced implying that bug reports
> > complaining about the system becoming jittery when copying large
> > files may also be helped.
>
> It would be a very good thing.
>
I'm currently running the same tests on a laptop using a USB stick for
storage to see if something useful comes out.
> > To show what effect the patches are having, this is a more detailed
> > look at one of the tests running with monitoring enabled. It's booted
> > with mem=512M and the number of threads running is equal to the number
> > of CPU cores. The backing filesystem is XFS.
> >
> > FS-Mark
> > fsmark-3.0.0 3.0.0-rc6 3.0.0-rc6
> > rc6-vanilla kswapwb-v2r5 nokswapwb-v2r5
> > Files/s min 27.30 ( 0.00%) 31.80 (14.15%) 31.40 (13.06%)
> > Files/s mean 30.32 ( 0.00%) 34.34 (11.73%) 34.52 (12.18%)
> > Files/s stddev 1.39 ( 0.00%) 1.06 (-31.96%) 1.20 (-16.05%)
> > Files/s max 33.60 ( 0.00%) 36.00 ( 6.67%) 36.30 ( 7.44%)
> > Overhead min 1393832.00 ( 0.00%) 1793141.00 (-22.27%) 1133240.00 (23.00%)
> > Overhead mean 2423808.52 ( 0.00%) 2513297.40 (-3.56%) 1823398.44 (32.93%)
> > Overhead stddev 445880.26 ( 0.00%) 392952.66 (13.47%) 420498.38 ( 6.04%)
> > Overhead max 3359477.00 ( 0.00%) 3184889.00 ( 5.48%) 3016170.00 (11.38%)
> > MMTests Statistics: duration
> > User/Sys Time Running Test (seconds) 53.26 52.27 51.88
>
> What is User/Sys?
>
The sum of the CPU-seconds spent in user and sys mode. Should have used
a + there :/
> > <SNIP>
> > without monitoring although the relative performance gain is less.
> >
> > Time to completion is reduced, which is always good, and it implies
> > that IO throughput was consistently higher. This is clearly visible at
> >
> > http://www.csn.ul.ie/~mel/postings/reclaim-20110721/html-run-monitor/global-dhp-512M__writeback-reclaimdirty-xfs/hydra/blockio-comparison-hydra.png
> > http://www.csn.ul.ie/~mel/postings/reclaim-20110721/html-run-monitor/global-dhp-512M__writeback-reclaimdirty-xfs/hydra/blockio-comparison-smooth-hydra.png
> >
> > kswapd CPU usage is also interesting
> >
> > http://www.csn.ul.ie/~mel/postings/reclaim-20110721/html-run-monitor/global-dhp-512M__writeback-reclaimdirty-xfs/hydra/kswapdcpu-comparison-smooth-hydra.png
> >
> > Note how preventing kswapd reclaiming dirty pages pushes up its CPU
> > usage as it scans more pages but it does not get excessive due to
> > the throttling.
>
> Good to hear.
> The concern of this patchset was early OOM kill with too many scanning.
> I can throw such concern out from now on.
>
At least, I haven't been able to trigger a premature OOM.
> > <SNIP>
> > Page writes by reclaim is what is motivating this series. It goes
> > from 170014 pages to 123510 which is a big improvement and we'll see
> > later that these writes are for anonymous pages.
> >
> > "Page reclaim invalided" is very high and implies that a large number
> > of dirty pages are reaching the end of the list quickly. Unfortunately,
> > this is somewhat unavoidable. Kswapd is scanning pages at a rate
> > of roughly 125000 (or 488M) a second on a 512M machine. The best
> > possible writing rate of the underlying storage is about 300M/second.
> > With the rate of reclaim exceeding the best possible writing speed,
> > the system is going to get throttled.
>
> Just out of curiosity.
> What is 'Page reclaim throttled'?
>
It should have been deleted from this report. It used to be a vmstat
counter recording how many times patch 6 called wait_iff_congested(). It no
longer exists.
> > <SNIP>
> > from kswapd were 0 with the patches applied implying that kswapd was
> > not getting to a priority high enough to start writing. The remaining
> > writes correlate almost exactly to nr_vmscan_write implying that all
> > writes were for anonymous pages.
> >
> > FTrace Reclaim Statistics: congestion_wait
> > Direct number congest waited 0 0 0
> > Direct time congest waited 0ms 0ms 0ms
> > Direct full congest waited 0 0 0
> > Direct number conditional waited 2 17 6
> > Direct time conditional waited 0ms 0ms 0ms
> > Direct full conditional waited 0 0 0
> > KSwapd number congest waited 4 8 10
> > KSwapd time congest waited 4ms 20ms 8ms
> > KSwapd full congest waited 0 0 0
> > KSwapd number conditional waited 0 26036 26283
> > KSwapd time conditional waited 0ms 16ms 4ms
> > KSwapd full conditional waited 0 0 0
>
> What do congest and conditional mean?
> Is congest trace_writeback_congestion_wait and conditional trace_writeback_wait_iff_congested?
>
Yes.
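For anyone reading along, the two calls being counted differ roughly as
follows; this is an illustrative sketch from memory of how reclaim uses
them, not a quote of the reclaim code:
	/* Sketch only */
	congestion_wait(BLK_RW_ASYNC, HZ/10);	/* always sleeps for up to
						   the timeout */
	wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);
						/* sleeps only if the zone or
						   a backing device is marked
						   congested; otherwise it
						   calls cond_resched() and
						   returns immediately */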
> > <SNIP>
> > Next I tested on a NUMA configuration of sorts. I don't have a real
> > NUMA machine so I booted the same machine with mem=4096M numa=fake=8
> > so each node is 512M. Again, the volume of information is high but
> > here is a summary of sorts based on a test run with monitors enabled.
> >
> > <XFS discussion snipped>
> >
> > With kswapd avoiding writes, it gets throttled lightly but when it
> > writes no pages at all, it gets throttled very heavily and sleeps.
> >
> > ext4 tells a slightly different story
> >
> > 4096M8N-ext4 Files/s mean 28.63 ( 0.00%) 30.58 ( 6.37%) 31.04 ( 7.76%)
> > 4096M8N-ext4 Elapsed Time fsmark 1578.51 1551.99 1532.65
> > 4096M8N-ext4 Elapsed Time mmap-strm 703.66 655.25 654.86
> > 4096M8N-ext4 Kswapd efficiency 62% 69% 68%
> > 4096M8N-ext4 Kswapd efficiency 35% 35% 35%
> > 4096M8N-ext4 stalled direct reclaim fsmark 0.00 0.00 0.00
> > 4096M8N-ext4 stalled direct reclaim mmap-strm 32.64 95.72 152.62
> > 4096M8N-2X-ext4 Files/s mean 30.74 ( 0.00%) 28.49 (-7.89%) 28.79 (-6.75%)
> > 4096M8N-2X-ext4 Elapsed Time fsmark 1466.62 1583.12 1580.07
> > 4096M8N-2X-ext4 Elapsed Time mmap-strm 705.17 705.64 693.01
> > 4096M8N-2X-ext4 Kswapd efficiency 68% 68% 67%
> > 4096M8N-2X-ext4 Kswapd efficiency 34% 30% 18%
> > 4096M8N-2X-ext4 stalled direct reclaim fsmark 0.00 0.00 0.00
> > 4096M8N-2X-ext4 stalled direct reclaim mmap-strm 106.82 24.88 27.88
> > 4096M8N-4X-ext4 Files/s mean 24.15 ( 0.00%) 23.18 (-4.18%) 23.94 (-0.89%)
> > 4096M8N-4X-ext4 Elapsed Time fsmark 1848.41 1971.48 1867.07
> > 4096M8N-4X-ext4 Elapsed Time mmap-strm 664.87 673.66 674.46
> > 4096M8N-4X-ext4 Kswapd efficiency 62% 65% 65%
> > 4096M8N-4X-ext4 Kswapd efficiency 33% 37% 15%
> > 4096M8N-4X-ext4 stalled direct reclaim fsmark 0.18 0.03 0.26
> > 4096M8N-4X-ext4 stalled direct reclaim mmap-strm 115.71 23.05 61.12
> > 4096M8N-16X-ext4 Files/s mean 5.42 ( 0.00%) 5.43 ( 0.15%) 3.83 (-41.44%)
> > 4096M8N-16X-ext4 Elapsed Time fsmark 9572.85 9653.66 11245.41
> > 4096M8N-16X-ext4 Elapsed Time mmap-strm 752.88 750.38 769.19
> > 4096M8N-16X-ext4 Kswapd efficiency 59% 59% 61%
> > 4096M8N-16X-ext4 Kswapd efficiency 34% 34% 21%
> > 4096M8N-16X-ext4 stalled direct reclaim fsmark 0.26 0.65 0.26
> > 4096M8N-16X-ext4 stalled direct reclaim mmap-strm 177.48 125.91 196.92
> >
> > 4096M8N-16X-ext4 with kswapd writing no pages collapsed in terms of
> > performance. Looking at the fsmark logs, in a number of iterations,
> > it was barely able to write files at all.
> >
> > The apparent slowdown for fsmark in 4096M8N-2X-ext4 is well within
> > the noise but the reduced time spent in direct reclaim is very welcome.
>
> But 4096M8N-ext4 increased the time and 4096M8N-2X-ext4 is within the noise
> as you said. I doubt its reliability.
>
Agreed. Again, it could be figured out which process is stalling but it
wouldn't tell us very much.
> >
> > Unlike xfs, it's less clear cut if direct reclaim performance is
> > impaired but in a few tests, preventing kswapd writing pages did
> > increase the time stalled.
> >
> > Last test is that I've been running this series on my laptop since
> > Monday without any problem but it's rarely under serious memory
> > pressure. I see nr_vmscan_write is 0 and the number of pages
> > invalidated from the end of the LRU is only 10844 after 3 days so
> > it's not much of a test.
> >
> > Overall, having kswapd avoiding writes does improve performance
> > which is not a surprise. Dave asked "do we even need IO at all from
> > reclaim?". On NUMA machines, the answer is "yes" unless the VM can
> > wake the flusher thread to clean a specific node. When kswapd never
> > writes, processes can stall for significant periods of time waiting on
> > flushers to clean the correct pages. If all writing is to be deferred
> > to the flushers, they must ensure that many writes on one node do not
> > starve requests to clean pages on another node.
>
> It's a good answer. :)
>
Thanks :)
> > I'm currently of the opinion that we should consider merging patches
> > 1-7 and discuss what is required before merging. How the flushers can
> > prioritise writing of pages belonging to a particular zone can be
> > tackled later, before all writes from reclaim are disabled. There
> > is already some work in this general area with the possibility that
> > series such as "writeback: moving expire targets for background/kupdate
> > works" could be extended to allow patch 8 to be merged later even if
> > the series needs work.
>
> I think you already know what we need (ie, prioritising the pages in a zone).
> In the NUMA case, patches 1-7 have a problem with ext4 so we have to focus on NUMA in the remaining time.
>
The slowdown for ext4 was within the noise but I'll run it again and
confirm that it really is not a problem.
> An alternative to [prioritising the pages in a zone] might be Johannes's [mm: per-zone dirty limiting] series.
> It might mitigate the NUMA problems.
>
It might.
> Overall, I really welcome this approach and would like to merge it into mmotm as soon as possible
> to see the side effects on non-NUMA (I will add my reviewed-by soon).
> In the NUMA case, we apparently know the problem so I think it could be solved
> before it is sent to mainline.
>
> It was a great time to see your data and you make my coffee delicious. :)
> You're a good barista.
> Thanks for your great effort, Mel!
>
Thanks for your review.
--
Mel Gorman
SUSE Labs
* Re: [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2
2011-07-28 11:38 ` Mel Gorman
@ 2011-07-29 9:48 ` Minchan Kim
2011-07-29 9:50 ` Minchan Kim
0 siblings, 1 reply; 43+ messages in thread
From: Minchan Kim @ 2011-07-29 9:48 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel
On Thu, Jul 28, 2011 at 12:38:52PM +0100, Mel Gorman wrote:
> On Thu, Jul 28, 2011 at 01:18:21AM +0900, Minchan Kim wrote:
> > On Thu, Jul 21, 2011 at 05:28:42PM +0100, Mel Gorman wrote:
> > > Note how preventing kswapd reclaiming dirty pages pushes up its CPU
<snip>
> > > usage as it scans more pages but it does not get excessive due to
> > > the throttling.
> >
> > Good to hear.
> > The concern of this patchset was early OOM kill with too many scanning.
> > I can throw such concern out from now on.
> >
>
> At least, I haven't been able to trigger a premature OOM.
AFAIR, Andrew had a premature OOM problem[1] but I couldn't track it down at that time.
I think this patch series might solve his problem. Even if it doesn't, it should at least
not aggravate his problem.
Andrew, Could you test this patchset?
[1] https://lkml.org/lkml/2011/5/25/415
--
Kind regards,
Minchan Kim
* Re: [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2
2011-07-29 9:48 ` Minchan Kim
@ 2011-07-29 9:50 ` Minchan Kim
2011-07-29 13:41 ` Andrew Lutomirski
0 siblings, 1 reply; 43+ messages in thread
From: Minchan Kim @ 2011-07-29 9:50 UTC (permalink / raw)
To: Mel Gorman, Andrew Lutomirski
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel
Sorry for missing Ccing.
On Fri, Jul 29, 2011 at 06:48:16PM +0900, Minchan Kim wrote:
> On Thu, Jul 28, 2011 at 12:38:52PM +0100, Mel Gorman wrote:
> > On Thu, Jul 28, 2011 at 01:18:21AM +0900, Minchan Kim wrote:
> > > On Thu, Jul 21, 2011 at 05:28:42PM +0100, Mel Gorman wrote:
> > > > Note how preventing kswapd reclaiming dirty pages pushes up its CPU
>
> <snip>
>
> > > > usage as it scans more pages but it does not get excessive due to
> > > > the throttling.
> > >
> > > Good to hear.
> > > The concern of this patchset was early OOM kill with too many scanning.
> > > I can throw such concern out from now on.
> > >
> >
> > At least, I haven't been able to trigger a premature OOM.
>
> AFAIR, Andrew had a premature OOM problem[1] but I couldn't track it down at that time.
> I think this patch series might solve his problem. Even if it doesn't, it should at least
> not aggravate his problem.
>
> Andrew, Could you test this patchset?
>
> [1] https://lkml.org/lkml/2011/5/25/415
> --
> Kind regards,
> Minchan Kim
--
Kind regards,
Minchan Kim
* Re: [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2
2011-07-29 9:50 ` Minchan Kim
@ 2011-07-29 13:41 ` Andrew Lutomirski
0 siblings, 0 replies; 43+ messages in thread
From: Andrew Lutomirski @ 2011-07-29 13:41 UTC (permalink / raw)
To: Minchan Kim
Cc: Mel Gorman, Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel
On Fri, Jul 29, 2011 at 5:50 AM, Minchan Kim <minchan.kim@gmail.com> wrote:
> Sorry for missing Ccing.
>
> On Fri, Jul 29, 2011 at 06:48:16PM +0900, Minchan Kim wrote:
>> On Thu, Jul 28, 2011 at 12:38:52PM +0100, Mel Gorman wrote:
>> > On Thu, Jul 28, 2011 at 01:18:21AM +0900, Minchan Kim wrote:
>> > > On Thu, Jul 21, 2011 at 05:28:42PM +0100, Mel Gorman wrote:
>> > > > Note how preventing kswapd reclaiming dirty pages pushes up its CPU
>>
>> <snip>
>>
>> > > > usage as it scans more pages but it does not get excessive due to
>> > > > the throttling.
>> > >
>> > > Good to hear.
>> > > The concern of this patchset was early OOM kill with too many scanning.
>> > > I can throw such concern out from now on.
>> > >
>> >
>> > At least, I haven't been able to trigger a premature OOM.
>>
>> AFAIR, Andrew had a premature OOM problem[1] but I couldn't track it down at that time.
>> I think this patch series might solve his problem. Even if it doesn't, it should at least
>> not aggravate his problem.
>>
>> Andrew, Could you test this patchset?
Gladly, but not until Wednesday most likely. I'm defending my thesis
on Monday :)
--Andy
* Re: [PATCH 1/8] mm: vmscan: Do not writeback filesystem pages in direct reclaim
2011-07-21 16:28 ` [PATCH 1/8] mm: vmscan: Do not writeback filesystem pages in direct reclaim Mel Gorman
@ 2011-07-31 15:06 ` Minchan Kim
2011-08-02 11:21 ` Mel Gorman
0 siblings, 1 reply; 43+ messages in thread
From: Minchan Kim @ 2011-07-31 15:06 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel
On Thu, Jul 21, 2011 at 05:28:43PM +0100, Mel Gorman wrote:
> From: Mel Gorman <mel@csn.ul.ie>
>
> When kswapd is failing to keep zones above the min watermark, a process
> will enter direct reclaim in the same manner kswapd does. If a dirty
> page is encountered during the scan, this page is written to backing
> storage using mapping->writepage.
>
> This causes two problems. First, it can result in very deep call
> stacks, particularly if the target storage or filesystem are complex.
> Some filesystems ignore write requests from direct reclaim as a result.
> The second is that a single-page flush is inefficient in terms of IO.
> While there is an expectation that the elevator will merge requests,
> this does not always happen. Quoting Christoph Hellwig;
>
> The elevator has a relatively small window it can operate on,
> and can never fix up a bad large scale writeback pattern.
>
> This patch prevents direct reclaim writing back filesystem pages by
> checking if current is kswapd. Anonymous pages are still written to
> swap as there is not the equivalent of a flusher thread for anonymous
> pages. If the dirty pages cannot be written back, they are placed
> back on the LRU lists. There is now a direct dependency on dirty page
> balancing to prevent too many pages in the system being dirtied which
> would prevent reclaim making forward progress.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
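As a rough illustration of the check being described, it amounts to
something like the following in shrink_page_list(); this is a simplified
sketch rather than the actual diff, with the accounting left out:
	/* Only kswapd is allowed to call ->writepage() for dirty
	 * file-backed pages; a direct reclaimer keeps the page on the LRU
	 * and leaves it to the flusher threads. Anonymous pages are still
	 * written to swap as before. */
	if (PageDirty(page) && page_is_file_cache(page) &&
	    !current_is_kswapd())
		goto keep_locked;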
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Nitpick.
We can change description of should_reclaim_stall.
"Returns true if the caller should wait to clean dirty/writeback pages"
->
"Returns true if direct reclaimer should wait to clean writeback pages"
--
Kind regards,
Minchan Kim
* Re: [PATCH 5/8] mm: vmscan: Do not writeback filesystem pages in kswapd except in high priority
2011-07-21 16:28 ` [PATCH 5/8] mm: vmscan: Do not writeback filesystem pages in kswapd except in high priority Mel Gorman
@ 2011-07-31 15:11 ` Minchan Kim
0 siblings, 0 replies; 43+ messages in thread
From: Minchan Kim @ 2011-07-31 15:11 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel
On Thu, Jul 21, 2011 at 05:28:47PM +0100, Mel Gorman wrote:
> It is preferable that no dirty pages are dispatched for cleaning from
> the page reclaim path. At normal priorities, this patch prevents kswapd
> writing pages.
>
> However, page reclaim does have a requirement that pages be freed
> in a particular zone. If it is failing to make sufficient progress
> (reclaiming < SWAP_CLUSTER_MAX at any priority), the priority
> is raised to scan more pages. A priority of DEF_PRIORITY - 3 is
> considered to be the point where kswapd is getting into trouble
> reclaiming pages. If this priority is reached, kswapd will dispatch
> pages for writing.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
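In code terms, and going by the condition visible in the diff hunks quoted
later in this thread, the check amounts to something like this (a sketch,
not the patch itself):
	/* Skip writeback of file pages unless the reclaimer is kswapd
	 * *and* kswapd's scanning priority has already dropped to
	 * DEF_PRIORITY - 3 or below, i.e. it is struggling to reclaim. */
	if (page_is_file_cache(page) &&
	    (!current_is_kswapd() || priority >= DEF_PRIORITY - 2))
		goto keep_locked;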
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
--
Kind regards,
Minchan Kim
* Re: [PATCH 6/8] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback
2011-07-21 16:28 ` [PATCH 6/8] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback Mel Gorman
@ 2011-07-31 15:17 ` Minchan Kim
2011-08-03 11:19 ` Johannes Weiner
1 sibling, 0 replies; 43+ messages in thread
From: Minchan Kim @ 2011-07-31 15:17 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel
On Thu, Jul 21, 2011 at 05:28:48PM +0100, Mel Gorman wrote:
> Workloads that are allocating frequently and writing files place a
> large number of dirty pages on the LRU. With use-once logic, it is
> possible for them to reach the end of the LRU quickly requiring the
> reclaimer to scan more to find clean pages. Ordinarily, processes that
> are dirtying memory will get throttled by dirty balancing but this
> is a global heuristic and does not take into account that LRUs are
> maintained on a per-zone basis. This can lead to a situation whereby
> reclaim is scanning heavily, skipping over a large number of pages
> under writeback and recycling them around the LRU consuming CPU.
>
> This patch checks how many of the number of pages isolated from the
> LRU were dirty. If a percentage of them are dirty, the process will be
> throttled if a blocking device is congested or the zone being scanned
> is marked congested. The percentage that must be dirty depends on
> the priority. At default priority, all of them must be dirty. At
> DEF_PRIORITY-1, 50% of them must be dirty, DEF_PRIORITY-2, 25%
> etc. i.e. as pressure increases the greater the likelihood the process
> will get throttled to allow the flusher threads to make some progress.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
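The thresholds described above would amount to something like the following
check after pages are isolated from the LRU; this is an illustration of the
policy, not the patch text:
	/* Throttle if the dirty share of the isolated pages meets the
	 * priority-dependent threshold: all of them at DEF_PRIORITY,
	 * half at DEF_PRIORITY - 1, a quarter at DEF_PRIORITY - 2, etc. */
	if (nr_dirty && nr_dirty >= (nr_taken >> (DEF_PRIORITY - priority)))
		wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);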
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
--
Kind regards,
Minchan Kim
* Re: [PATCH 7/8] mm: vmscan: Immediately reclaim end-of-LRU dirty pages when writeback completes
2011-07-22 13:23 ` Mel Gorman
@ 2011-07-31 15:24 ` Minchan Kim
2011-08-02 11:25 ` Mel Gorman
0 siblings, 1 reply; 43+ messages in thread
From: Minchan Kim @ 2011-07-31 15:24 UTC (permalink / raw)
To: Mel Gorman
Cc: Peter Zijlstra, Linux-MM, LKML, XFS, Dave Chinner,
Christoph Hellwig, Johannes Weiner, Wu Fengguang, Jan Kara,
Rik van Riel
On Fri, Jul 22, 2011 at 02:23:19PM +0100, Mel Gorman wrote:
> On Fri, Jul 22, 2011 at 02:53:48PM +0200, Peter Zijlstra wrote:
> > On Thu, 2011-07-21 at 17:28 +0100, Mel Gorman wrote:
> > > When direct reclaim encounters a dirty page, it gets recycled around
> > > the LRU for another cycle. This patch marks the page PageReclaim
> > > similar to deactivate_page() so that the page gets reclaimed almost
> > > immediately after the page gets cleaned. This is to avoid reclaiming
> > > clean pages that are younger than a dirty page encountered at the
> > > end of the LRU that might have been something like a use-once page.
> > >
> >
> > > @@ -834,7 +834,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> > > */
> > > if (page_is_file_cache(page) &&
> > > (!current_is_kswapd() || priority >= DEF_PRIORITY - 2)) {
> > > - inc_zone_page_state(page, NR_VMSCAN_WRITE_SKIP);
> > > + /*
> > > + * Immediately reclaim when written back.
> > > + * Similar in principal to deactivate_page()
> > > + * except we already have the page isolated
> > > + * and know it's dirty
> > > + */
> > > + inc_zone_page_state(page, NR_VMSCAN_INVALIDATE);
> > > + SetPageReclaim(page);
> > > +
> >
> > I find the invalidate name somewhat confusing. It makes me think we'll
> > drop the page without writeback, like invalidatepage().
>
> I wasn't that happy with it either to be honest but didn't think of a
> better one at the time. nr_reclaim_deferred?
How about "NR_VMSCAN_IMMEDIATE_RECLAIM" like comment rotate_reclaimable_page?
>
> --
> Mel Gorman
> SUSE Labs
--
Kind regards,
Minchan Kim
* Re: [PATCH 1/8] mm: vmscan: Do not writeback filesystem pages in direct reclaim
2011-07-31 15:06 ` Minchan Kim
@ 2011-08-02 11:21 ` Mel Gorman
0 siblings, 0 replies; 43+ messages in thread
From: Mel Gorman @ 2011-08-02 11:21 UTC (permalink / raw)
To: Minchan Kim
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Johannes Weiner, Wu Fengguang, Jan Kara, Rik van Riel
On Mon, Aug 01, 2011 at 12:06:06AM +0900, Minchan Kim wrote:
> On Thu, Jul 21, 2011 at 05:28:43PM +0100, Mel Gorman wrote:
> > From: Mel Gorman <mel@csn.ul.ie>
> >
> > When kswapd is failing to keep zones above the min watermark, a process
> > will enter direct reclaim in the same manner kswapd does. If a dirty
> > page is encountered during the scan, this page is written to backing
> > storage using mapping->writepage.
> >
> > This causes two problems. First, it can result in very deep call
> > stacks, particularly if the target storage or filesystem are complex.
> > Some filesystems ignore write requests from direct reclaim as a result.
> > The second is that a single-page flush is inefficient in terms of IO.
> > While there is an expectation that the elevator will merge requests,
> > this does not always happen. Quoting Christoph Hellwig;
> >
> > The elevator has a relatively small window it can operate on,
> > and can never fix up a bad large scale writeback pattern.
> >
> > This patch prevents direct reclaim writing back filesystem pages by
> > checking if current is kswapd. Anonymous pages are still written to
> > swap as there is not the equivalent of a flusher thread for anonymous
> > pages. If the dirty pages cannot be written back, they are placed
> > back on the LRU lists. There is now a direct dependency on dirty page
> > balancing to prevent too many pages in the system being dirtied which
> > would prevent reclaim making forward progress.
> >
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
>
Thanks
> Nitpick.
> We can change description of should_reclaim_stall.
>
> "Returns true if the caller should wait to clean dirty/writeback pages"
> ->
> "Returns true if direct reclaimer should wait to clean writeback pages"
>
Not a nitpick. At least one check for RECLAIM_MODE_SYNC is no longer
reachable. I've added a new patch that updates the comment and has
synchronous direct reclaim wait on pages under writeback.
--
Mel Gorman
SUSE Labs
* Re: [PATCH 7/8] mm: vmscan: Immediately reclaim end-of-LRU dirty pages when writeback completes
2011-07-31 15:24 ` Minchan Kim
@ 2011-08-02 11:25 ` Mel Gorman
0 siblings, 0 replies; 43+ messages in thread
From: Mel Gorman @ 2011-08-02 11:25 UTC (permalink / raw)
To: Minchan Kim
Cc: Peter Zijlstra, Linux-MM, LKML, XFS, Dave Chinner,
Christoph Hellwig, Johannes Weiner, Wu Fengguang, Jan Kara,
Rik van Riel
On Mon, Aug 01, 2011 at 12:24:01AM +0900, Minchan Kim wrote:
> On Fri, Jul 22, 2011 at 02:23:19PM +0100, Mel Gorman wrote:
> > On Fri, Jul 22, 2011 at 02:53:48PM +0200, Peter Zijlstra wrote:
> > > On Thu, 2011-07-21 at 17:28 +0100, Mel Gorman wrote:
> > > > When direct reclaim encounters a dirty page, it gets recycled around
> > > > the LRU for another cycle. This patch marks the page PageReclaim
> > > > similar to deactivate_page() so that the page gets reclaimed almost
> > > > immediately after the page gets cleaned. This is to avoid reclaiming
> > > > clean pages that are younger than a dirty page encountered at the
> > > > end of the LRU that might have been something like a use-once page.
> > > >
> > >
> > > > @@ -834,7 +834,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> > > > */
> > > > if (page_is_file_cache(page) &&
> > > > (!current_is_kswapd() || priority >= DEF_PRIORITY - 2)) {
> > > > - inc_zone_page_state(page, NR_VMSCAN_WRITE_SKIP);
> > > > + /*
> > > > + * Immediately reclaim when written back.
> > > > + * Similar in principal to deactivate_page()
> > > > + * except we already have the page isolated
> > > > + * and know it's dirty
> > > > + */
> > > > + inc_zone_page_state(page, NR_VMSCAN_INVALIDATE);
> > > > + SetPageReclaim(page);
> > > > +
> > >
> > > I find the invalidate name somewhat confusing. It makes me think we'll
> > > drop the page without writeback, like invalidatepage().
> >
> > I wasn't that happy with it either to be honest but didn't think of a
> > better one at the time. nr_reclaim_deferred?
>
> How about "NR_VMSCAN_IMMEDIATE_RECLAIM" like comment rotate_reclaimable_page?
>
Yeah, I guess. I find it a little misleading because the reclaim does
not happen immediately at the time the counter is incremented but it's
better than "invalidate".
--
Mel Gorman
SUSE Labs
* Re: [PATCH 3/8] ext4: Warn if direct reclaim tries to writeback pages
2011-07-21 16:28 ` [PATCH 3/8] ext4: " Mel Gorman
@ 2011-08-03 10:58 ` Johannes Weiner
2011-08-03 11:06 ` Johannes Weiner
0 siblings, 1 reply; 43+ messages in thread
From: Johannes Weiner @ 2011-08-03 10:58 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim
On Thu, Jul 21, 2011 at 05:28:45PM +0100, Mel Gorman wrote:
> Direct reclaim should never writeback pages. Warn if an attempt
> is made.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <jweiner@redhat.com>
* Re: [PATCH 3/8] ext4: Warn if direct reclaim tries to writeback pages
2011-08-03 10:58 ` Johannes Weiner
@ 2011-08-03 11:06 ` Johannes Weiner
2011-08-03 13:44 ` Mel Gorman
0 siblings, 1 reply; 43+ messages in thread
From: Johannes Weiner @ 2011-08-03 11:06 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim
On Wed, Aug 03, 2011 at 12:58:19PM +0200, Johannes Weiner wrote:
> On Thu, Jul 21, 2011 at 05:28:45PM +0100, Mel Gorman wrote:
> > Direct reclaim should never writeback pages. Warn if an attempt
> > is made.
> >
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
>
> Acked-by: Johannes Weiner <jweiner@redhat.com>
Oops, too fast.
Shouldn't the WARN_ON() be at the top of the function, rather than
just warn when the write is deferred due to delalloc?
* Re: [PATCH 4/8] btrfs: Warn if direct reclaim tries to writeback pages
2011-07-21 16:28 ` [PATCH 4/8] btrfs: " Mel Gorman
@ 2011-08-03 11:10 ` Johannes Weiner
2011-08-03 13:45 ` Mel Gorman
0 siblings, 1 reply; 43+ messages in thread
From: Johannes Weiner @ 2011-08-03 11:10 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim
On Thu, Jul 21, 2011 at 05:28:46PM +0100, Mel Gorman wrote:
> Direct reclaim should never writeback pages. Warn if an attempt is
> made. By rights, btrfs should be allowing writepage from kswapd if
> it is failing to reclaim pages by any other means but it's outside
> the scope of this patch.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
> fs/btrfs/disk-io.c | 2 ++
> fs/btrfs/inode.c | 2 ++
> 2 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 1ac8db5d..cc9c9cf 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -829,6 +829,8 @@ static int btree_writepage(struct page *page, struct writeback_control *wbc)
>
> tree = &BTRFS_I(page->mapping->host)->io_tree;
> if (!(current->flags & PF_MEMALLOC)) {
> + WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
> + PF_MEMALLOC);
Since it is branch for PF_MEMALLOC being set, why not just
WARN_ON_ONCE(!(current->flags & PF_KSWAPD)) instead?
Minor nitpick, though, and I can understand if you just want to have
the conditionals be the same in every fs.
Acked-by: Johannes Weiner <jweiner@redhat.com>
* Re: [PATCH 6/8] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback
2011-07-21 16:28 ` [PATCH 6/8] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback Mel Gorman
2011-07-31 15:17 ` Minchan Kim
@ 2011-08-03 11:19 ` Johannes Weiner
2011-08-03 13:56 ` Mel Gorman
1 sibling, 1 reply; 43+ messages in thread
From: Johannes Weiner @ 2011-08-03 11:19 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim
On Thu, Jul 21, 2011 at 05:28:48PM +0100, Mel Gorman wrote:
> Workloads that are allocating frequently and writing files place a
> large number of dirty pages on the LRU. With use-once logic, it is
> possible for them to reach the end of the LRU quickly requiring the
> reclaimer to scan more to find clean pages. Ordinarily, processes that
> are dirtying memory will get throttled by dirty balancing but this
> is a global heuristic and does not take into account that LRUs are
> maintained on a per-zone basis. This can lead to a situation whereby
> reclaim is scanning heavily, skipping over a large number of pages
> under writeback and recycling them around the LRU consuming CPU.
>
> This patch checks how many of the number of pages isolated from the
> LRU were dirty. If a percentage of them are dirty, the process will be
> throttled if a blocking device is congested or the zone being scanned
> is marked congested. The percentage that must be dirty depends on
> the priority. At default priority, all of them must be dirty. At
> DEF_PRIORITY-1, 50% of them must be dirty, DEF_PRIORITY-2, 25%
> etc. i.e. as pressure increases the greater the likelihood the process
> will get throttled to allow the flusher threads to make some progress.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
> mm/vmscan.c | 21 ++++++++++++++++++---
> 1 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index cf7b501..b0060f8 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -720,7 +720,8 @@ static noinline_for_stack void free_page_list(struct list_head *free_pages)
> static unsigned long shrink_page_list(struct list_head *page_list,
> struct zone *zone,
> struct scan_control *sc,
> - int priority)
> + int priority,
> + unsigned long *ret_nr_dirty)
> {
> LIST_HEAD(ret_pages);
> LIST_HEAD(free_pages);
> @@ -971,6 +972,7 @@ keep_lumpy:
>
> list_splice(&ret_pages, page_list);
> count_vm_events(PGACTIVATE, pgactivate);
> + *ret_nr_dirty += nr_dirty;
Note that this includes anon pages, which means that swapping is
throttled as well.
I don't think it is a downside to throttle swapping during IO
congestion - waiting for pages under writeback to become reclaimable
is better than kicking off even more IO in this case as well - but the
changelog and the comments should include it, I guess.
Otherwise,
Acked-by: Johannes Weiner <jweiner@redhat.com>
* Re: [PATCH 7/8] mm: vmscan: Immediately reclaim end-of-LRU dirty pages when writeback completes
2011-07-21 16:28 ` [PATCH 7/8] mm: vmscan: Immediately reclaim end-of-LRU dirty pages when writeback completes Mel Gorman
2011-07-22 12:53 ` Peter Zijlstra
@ 2011-08-03 11:26 ` Johannes Weiner
2011-08-03 13:57 ` Mel Gorman
1 sibling, 1 reply; 43+ messages in thread
From: Johannes Weiner @ 2011-08-03 11:26 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim
On Thu, Jul 21, 2011 at 05:28:49PM +0100, Mel Gorman wrote:
> When direct reclaim encounters a dirty page, it gets recycled around
> the LRU for another cycle. This patch marks the page PageReclaim
> similar to deactivate_page() so that the page gets reclaimed almost
> immediately after the page gets cleaned. This is to avoid reclaiming
> clean pages that are younger than a dirty page encountered at the
> end of the LRU that might have been something like a use-once page.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
Apart from the naming of the counter (I like nr_reclaim_preferred),
Acked-by: Johannes Weiner <jweiner@redhat.com>
* Re: [PATCH 8/8] mm: vmscan: Do not writeback filesystem pages from kswapd
2011-07-21 16:28 ` [PATCH 8/8] mm: vmscan: Do not writeback filesystem pages from kswapd Mel Gorman
2011-07-22 12:57 ` Peter Zijlstra
@ 2011-08-03 11:37 ` Johannes Weiner
2011-08-03 13:58 ` Mel Gorman
1 sibling, 1 reply; 43+ messages in thread
From: Johannes Weiner @ 2011-08-03 11:37 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim
On Thu, Jul 21, 2011 at 05:28:50PM +0100, Mel Gorman wrote:
> Assuming that flusher threads will always write back dirty pages promptly
> then it is always faster for reclaimers to wait for flushers. This patch
> prevents kswapd writing back any filesystem pages.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
Relying on the flushers may mean that every dirty page in the system
has to be written back before the pages from the zone of interest are
clean.
De-facto we have only one mechanism to stay on top of the dirty pages
from a per-zone perspective, and that is single-page writeout from
reclaim.
While we all agree that this sucks, we can not remove it unless we
have a replacement that makes zones reclaimable in a reasonable time
frame (or keep them reclaimable in the first place, what per-zone
dirty limits attempt to do).
As such, please include
Nacked-by: Johannes Weiner <jweiner@redhat.com>
* Re: [PATCH 3/8] ext4: Warn if direct reclaim tries to writeback pages
2011-08-03 11:06 ` Johannes Weiner
@ 2011-08-03 13:44 ` Mel Gorman
2011-08-03 14:00 ` Johannes Weiner
0 siblings, 1 reply; 43+ messages in thread
From: Mel Gorman @ 2011-08-03 13:44 UTC (permalink / raw)
To: Johannes Weiner
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim
On Wed, Aug 03, 2011 at 01:06:29PM +0200, Johannes Weiner wrote:
> On Wed, Aug 03, 2011 at 12:58:19PM +0200, Johannes Weiner wrote:
> > On Thu, Jul 21, 2011 at 05:28:45PM +0100, Mel Gorman wrote:
> > > Direct reclaim should never writeback pages. Warn if an attempt
> > > is made.
> > >
> > > Signed-off-by: Mel Gorman <mgorman@suse.de>
> >
> > Acked-by: Johannes Weiner <jweiner@redhat.com>
>
> Oops, too fast.
>
> Shouldn't the WARN_ON() be at the top of the function, rather than
> just warn when the write is deferred due to delalloc?
I thought it made more sense to put the warning at the point where ext4
would normally ignore ->writepage.
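To illustrate the placement, the warning sits in the branch where ext4
defers the write rather than at the top of the function, roughly as below;
this is a paraphrase rather than the patch, and the condition name is a
stand-in for the buffer check ext4 actually performs:
static int ext4_writepage(struct page *page, struct writeback_control *wbc)
{
	/* ... */
	if (delalloc_or_unwritten_buffers) {	/* hypothetical name */
		/* This is where ext4 would normally ignore ->writepage */
		WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
			     PF_MEMALLOC);
		redirty_page_for_writepage(wbc, page);
		unlock_page(page);
		return 0;
	}
	/* ... normal writeback path ... */
}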
That said, in my current revision of the series, I've dropped these
patches altogether as page migration should be able to trigger the same
warnings but be called from paths that are of less concern for stack
overflows (or at the very least be looked at as a separate series).
--
Mel Gorman
SUSE Labs
* Re: [PATCH 4/8] btrfs: Warn if direct reclaim tries to writeback pages
2011-08-03 11:10 ` Johannes Weiner
@ 2011-08-03 13:45 ` Mel Gorman
0 siblings, 0 replies; 43+ messages in thread
From: Mel Gorman @ 2011-08-03 13:45 UTC (permalink / raw)
To: Johannes Weiner
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim
On Wed, Aug 03, 2011 at 01:10:31PM +0200, Johannes Weiner wrote:
> On Thu, Jul 21, 2011 at 05:28:46PM +0100, Mel Gorman wrote:
> > Direct reclaim should never writeback pages. Warn if an attempt is
> > made. By rights, btrfs should be allowing writepage from kswapd if
> > it is failing to reclaim pages by any other means but it's outside
> > the scope of this patch.
> >
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > ---
> > fs/btrfs/disk-io.c | 2 ++
> > fs/btrfs/inode.c | 2 ++
> > 2 files changed, 4 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index 1ac8db5d..cc9c9cf 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -829,6 +829,8 @@ static int btree_writepage(struct page *page, struct writeback_control *wbc)
> >
> > tree = &BTRFS_I(page->mapping->host)->io_tree;
> > if (!(current->flags & PF_MEMALLOC)) {
> > + WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
> > + PF_MEMALLOC);
>
> Since it is branch for PF_MEMALLOC being set, why not just
> WARN_ON_ONCE(!(current->flags & PF_KSWAPD)) instead?
>
> Minor nitpick, though, and I can understand if you just want to have
> the conditionals be the same in every fs.
>
The patch was just copying the conditionals from the other filesystems
although I admit your version would look nicer.
--
Mel Gorman
SUSE Labs
* Re: [PATCH 6/8] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback
2011-08-03 11:19 ` Johannes Weiner
@ 2011-08-03 13:56 ` Mel Gorman
0 siblings, 0 replies; 43+ messages in thread
From: Mel Gorman @ 2011-08-03 13:56 UTC (permalink / raw)
To: Johannes Weiner
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim
On Wed, Aug 03, 2011 at 01:19:40PM +0200, Johannes Weiner wrote:
> On Thu, Jul 21, 2011 at 05:28:48PM +0100, Mel Gorman wrote:
> > Workloads that are allocating frequently and writing files place a
> > large number of dirty pages on the LRU. With use-once logic, it is
> > possible for them to reach the end of the LRU quickly requiring the
> > reclaimer to scan more to find clean pages. Ordinarily, processes that
> > are dirtying memory will get throttled by dirty balancing but this
> > is a global heuristic and does not take into account that LRUs are
> > maintained on a per-zone basis. This can lead to a situation whereby
> > reclaim is scanning heavily, skipping over a large number of pages
> > under writeback and recycling them around the LRU consuming CPU.
> >
> > This patch checks how many of the number of pages isolated from the
> > LRU were dirty. If a percentage of them are dirty, the process will be
> > throttled if a blocking device is congested or the zone being scanned
> > is marked congested. The percentage that must be dirty depends on
> > the priority. At default priority, all of them must be dirty. At
> > DEF_PRIORITY-1, 50% of them must be dirty, DEF_PRIORITY-2, 25%
> > etc. i.e. as pressure increases the greater the likelihood the process
> > will get throttled to allow the flusher threads to make some progress.
> >
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > ---
> > mm/vmscan.c | 21 ++++++++++++++++++---
> > 1 files changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index cf7b501..b0060f8 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -720,7 +720,8 @@ static noinline_for_stack void free_page_list(struct list_head *free_pages)
> > static unsigned long shrink_page_list(struct list_head *page_list,
> > struct zone *zone,
> > struct scan_control *sc,
> > - int priority)
> > + int priority,
> > + unsigned long *ret_nr_dirty)
> > {
> > LIST_HEAD(ret_pages);
> > LIST_HEAD(free_pages);
> > @@ -971,6 +972,7 @@ keep_lumpy:
> >
> > list_splice(&ret_pages, page_list);
> > count_vm_events(PGACTIVATE, pgactivate);
> > + *ret_nr_dirty += nr_dirty;
>
> Note that this includes anon pages, which means that swapping is
> throttled as well.
>
Yes it does. In the current revision of the series, I'm not using
nr_dirty as it throttles too aggressively. Instead the number of pages
under writeback is counted and that is used for the throttling decision.
It still potentially includes anon pages but that is reasonable.
> I don't think it is a downside to throttle swapping during IO
> congestion - waiting for pages under writeback to become reclaimable
> is better than kicking off even more IO in this case as well - but the
> changelog and the comments should include it, I guess.
>
Fair point. I've updated the changelog accordingly. Thanks.
--
Mel Gorman
SUSE Labs
* Re: [PATCH 7/8] mm: vmscan: Immediately reclaim end-of-LRU dirty pages when writeback completes
2011-08-03 11:26 ` Johannes Weiner
@ 2011-08-03 13:57 ` Mel Gorman
0 siblings, 0 replies; 43+ messages in thread
From: Mel Gorman @ 2011-08-03 13:57 UTC (permalink / raw)
To: Johannes Weiner
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim
On Wed, Aug 03, 2011 at 01:26:30PM +0200, Johannes Weiner wrote:
> On Thu, Jul 21, 2011 at 05:28:49PM +0100, Mel Gorman wrote:
> > When direct reclaim encounters a dirty page, it gets recycled around
> > the LRU for another cycle. This patch marks the page PageReclaim
> > similar to deactivate_page() so that the page gets reclaimed almost
> > immediately after the page gets cleaned. This is to avoid reclaiming
> > clean pages that are younger than a dirty page encountered at the
> > end of the LRU that might have been something like a use-once page.
> >
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
>
> Apart from the naming of the counter (I like nr_reclaim_preferred),
>
At the moment it's NR_VMSCAN_IMMEDIATE and the name visible in
/proc/vmstat is nr_vmscan_immediate_reclaim
> Acked-by: Johannes Weiner <jweiner@redhat.com>
Thanks.
--
Mel Gorman
SUSE Labs
* Re: [PATCH 8/8] mm: vmscan: Do not writeback filesystem pages from kswapd
2011-08-03 11:37 ` Johannes Weiner
@ 2011-08-03 13:58 ` Mel Gorman
0 siblings, 0 replies; 43+ messages in thread
From: Mel Gorman @ 2011-08-03 13:58 UTC (permalink / raw)
To: Johannes Weiner
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim
On Wed, Aug 03, 2011 at 01:37:06PM +0200, Johannes Weiner wrote:
> On Thu, Jul 21, 2011 at 05:28:50PM +0100, Mel Gorman wrote:
> > Assuming that flusher threads will always write back dirty pages promptly
> > then it is always faster for reclaimers to wait for flushers. This patch
> > prevents kswapd writing back any filesystem pages.
> >
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
>
> Relying on the flushers may mean that every dirty page in the system
> has to be written back before the pages from the zone of interest are
> clean.
>
Yes.
> De-facto we have only one mechanism to stay on top of the dirty pages
> from a per-zone perspective, and that is single-page writeout from
> reclaim.
>
Yes.
> While we all agree that this sucks, we can not remove it unless we
> have a replacement that makes zones reclaimable in a reasonable time
> frame (or keep them reclaimable in the first place, what per-zone
> dirty limits attempt to do).
>
> As such, please include
>
> Nacked-by: Johannes Weiner <jweiner@redhat.com>
I've already dropped the patch. If I could, I would have signed this at
the time as
Signed-off-but-naking-it-anyway: Mel Gorman <mgorman@suse.de>
--
Mel Gorman
SUSE Labs
* Re: [PATCH 3/8] ext4: Warn if direct reclaim tries to writeback pages
2011-08-03 13:44 ` Mel Gorman
@ 2011-08-03 14:00 ` Johannes Weiner
2011-08-03 14:18 ` Christoph Hellwig
2011-08-03 14:35 ` Mel Gorman
0 siblings, 2 replies; 43+ messages in thread
From: Johannes Weiner @ 2011-08-03 14:00 UTC (permalink / raw)
To: Mel Gorman
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim
On Wed, Aug 03, 2011 at 02:44:20PM +0100, Mel Gorman wrote:
> On Wed, Aug 03, 2011 at 01:06:29PM +0200, Johannes Weiner wrote:
> > On Wed, Aug 03, 2011 at 12:58:19PM +0200, Johannes Weiner wrote:
> > > On Thu, Jul 21, 2011 at 05:28:45PM +0100, Mel Gorman wrote:
> > > > Direct reclaim should never writeback pages. Warn if an attempt
> > > > is made.
> > > >
> > > > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > >
> > > Acked-by: Johannes Weiner <jweiner@redhat.com>
> >
> > Oops, too fast.
> >
> > Shouldn't the WARN_ON() be at the top of the function, rather than
> > just warn when the write is deferred due to delalloc?
>
> I thought it made more sense to put the warning at the point where ext4
> would normally ignore ->writepage.
>
> That said, in my current revision of the series, I've dropped these
> patches altogether as page migration should be able to trigger the same
> warnings but be called from paths that are of less concern for stack
> overflows (or at the very least be looked at as a separate series).
Doesn't this only apply to btrfs, which has no .migratepage aop of its own
for file pages? The others use buffer_migrate_page.
But if you dropped them anyway, it does not matter :)
* Re: [PATCH 3/8] ext4: Warn if direct reclaim tries to writeback pages
2011-08-03 14:00 ` Johannes Weiner
@ 2011-08-03 14:18 ` Christoph Hellwig
2011-08-03 14:35 ` Mel Gorman
1 sibling, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2011-08-03 14:18 UTC (permalink / raw)
To: Johannes Weiner
Cc: Mel Gorman, Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim, josef
On Wed, Aug 03, 2011 at 04:00:19PM +0200, Johannes Weiner wrote:
> > That said, in my current revision of the series, I've dropped these
> > patches altogether as page migration should be able to trigger the same
> > warnings but be called from paths that are of less concern for stack
> > overflows (or at the very least be looked at as a separate series).
>
> Doesn't this only apply to btrfs, which has no .migratepage aop of its own
> for file pages? The others use buffer_migrate_page.
>
> But if you dropped them anyway, it does not matter :)
Note that the mid-term plan is to kill ->writepage as an address space
method. Besides the usage from reclaim and as callbacks to
write_cache_pages and write_one_page (which can be made explicit
arguments) the only remaining user is the above mentioned fallback.
Josef, any chance you could switch btrfs over to implement a proper
->migratepage?
* Re: [PATCH 3/8] ext4: Warn if direct reclaim tries to writeback pages
2011-08-03 14:00 ` Johannes Weiner
2011-08-03 14:18 ` Christoph Hellwig
@ 2011-08-03 14:35 ` Mel Gorman
1 sibling, 0 replies; 43+ messages in thread
From: Mel Gorman @ 2011-08-03 14:35 UTC (permalink / raw)
To: Johannes Weiner
Cc: Linux-MM, LKML, XFS, Dave Chinner, Christoph Hellwig,
Wu Fengguang, Jan Kara, Rik van Riel, Minchan Kim
On Wed, Aug 03, 2011 at 04:00:19PM +0200, Johannes Weiner wrote:
> On Wed, Aug 03, 2011 at 02:44:20PM +0100, Mel Gorman wrote:
> > On Wed, Aug 03, 2011 at 01:06:29PM +0200, Johannes Weiner wrote:
> > > On Wed, Aug 03, 2011 at 12:58:19PM +0200, Johannes Weiner wrote:
> > > > On Thu, Jul 21, 2011 at 05:28:45PM +0100, Mel Gorman wrote:
> > > > > Direct reclaim should never writeback pages. Warn if an attempt
> > > > > is made.
> > > > >
> > > > > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > > >
> > > > Acked-by: Johannes Weiner <jweiner@redhat.com>
> > >
> > > Oops, too fast.
> > >
> > > Shouldn't the WARN_ON() be at the top of the function, rather than
> > > just warn when the write is deferred due to delalloc?
> >
> > I thought it made more sense to put the warning at the point where ext4
> > would normally ignore ->writepage.
> >
> > That said, in my current revision of the series, I've dropped these
> > patches altogether as page migration should be able to trigger the same
> > warnings but be called from paths that are of less concern for stack
> > overflows (or at the very least be looked at as a separate series).
>
> Doesn't this only apply to btrfs, which has no .migratepage aop of its
> own for file pages? The others use buffer_migrate_page.
>
Bah, you're right. It was btrfs I was looking at when I decided to drop
the patches, and I didn't think it through. I only needed to drop the
btrfs one.
> But if you dropped them anyway, it does not matter :)
I have put the xfs and ext4 checks back in. The ext4 check is still in
the same place.
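For reference, a minimal sketch of the kind of check being discussed,
assuming direct reclaim is identified by PF_MEMALLOC being set while
current_is_kswapd() is false; the function name and the exact placement
are illustrative rather than the actual xfs or ext4 hunks.

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/sched.h>
#include <linux/swap.h>
#include <linux/writeback.h>

static int example_writepage(struct page *page, struct writeback_control *wbc)
{
        /*
         * Direct reclaim has PF_MEMALLOC set but is not kswapd; with
         * patch 1 applied it should never end up here, so warn once.
         */
        if (WARN_ON_ONCE((current->flags & PF_MEMALLOC) &&
                         !current_is_kswapd()))
                goto redirty;

        /* ... normal path: set writeback, submit the I/O, unlock ... */
        return 0;

redirty:
        redirty_page_for_writepage(wbc, page);
        unlock_page(page);
        return 0;
}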
--
Mel Gorman
SUSE Labs
end of thread, other threads: [~2011-08-03 14:35 UTC | newest]
Thread overview: 43+ messages -- the messages in this thread are listed below:
2011-07-21 16:28 [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2 Mel Gorman
2011-07-21 16:28 ` [PATCH 1/8] mm: vmscan: Do not writeback filesystem pages in direct reclaim Mel Gorman
2011-07-31 15:06 ` Minchan Kim
2011-08-02 11:21 ` Mel Gorman
2011-07-21 16:28 ` [PATCH 2/8] xfs: Warn if direct reclaim tries to writeback pages Mel Gorman
2011-07-24 11:32 ` Christoph Hellwig
2011-07-25 8:19 ` Mel Gorman
2011-07-21 16:28 ` [PATCH 3/8] ext4: " Mel Gorman
2011-08-03 10:58 ` Johannes Weiner
2011-08-03 11:06 ` Johannes Weiner
2011-08-03 13:44 ` Mel Gorman
2011-08-03 14:00 ` Johannes Weiner
2011-08-03 14:18 ` Christoph Hellwig
2011-08-03 14:35 ` Mel Gorman
2011-07-21 16:28 ` [PATCH 4/8] btrfs: " Mel Gorman
2011-08-03 11:10 ` Johannes Weiner
2011-08-03 13:45 ` Mel Gorman
2011-07-21 16:28 ` [PATCH 5/8] mm: vmscan: Do not writeback filesystem pages in kswapd except in high priority Mel Gorman
2011-07-31 15:11 ` Minchan Kim
2011-07-21 16:28 ` [PATCH 6/8] mm: vmscan: Throttle reclaim if encountering too many dirty pages under writeback Mel Gorman
2011-07-31 15:17 ` Minchan Kim
2011-08-03 11:19 ` Johannes Weiner
2011-08-03 13:56 ` Mel Gorman
2011-07-21 16:28 ` [PATCH 7/8] mm: vmscan: Immediately reclaim end-of-LRU dirty pages when writeback completes Mel Gorman
2011-07-22 12:53 ` Peter Zijlstra
2011-07-22 13:23 ` Mel Gorman
2011-07-31 15:24 ` Minchan Kim
2011-08-02 11:25 ` Mel Gorman
2011-08-03 11:26 ` Johannes Weiner
2011-08-03 13:57 ` Mel Gorman
2011-07-21 16:28 ` [PATCH 8/8] mm: vmscan: Do not writeback filesystem pages from kswapd Mel Gorman
2011-07-22 12:57 ` Peter Zijlstra
2011-07-22 13:31 ` Mel Gorman
2011-08-03 11:37 ` Johannes Weiner
2011-08-03 13:58 ` Mel Gorman
2011-07-26 11:20 ` [RFC PATCH 0/8] Reduce filesystem writeback from page reclaim v2 Dave Chinner
2011-07-27 4:32 ` Minchan Kim
2011-07-27 7:37 ` Mel Gorman
2011-07-27 16:18 ` Minchan Kim
2011-07-28 11:38 ` Mel Gorman
2011-07-29 9:48 ` Minchan Kim
2011-07-29 9:50 ` Minchan Kim
2011-07-29 13:41 ` Andrew Lutomirski