linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/6] Do not call ->writepage[s] from direct reclaim and use a_ops->writepages() where possible
@ 2010-06-08  9:02 Mel Gorman
  2010-06-08  9:02 ` [PATCH 1/6] tracing, vmscan: Add trace events for kswapd wakeup, sleeping and direct reclaim Mel Gorman
                   ` (9 more replies)
  0 siblings, 10 replies; 67+ messages in thread
From: Mel Gorman @ 2010-06-08  9:02 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, linux-mm
  Cc: Dave Chinner, Chris Mason, Nick Piggin, Rik van Riel, Mel Gorman

I finally got a chance last week to visit the topic of direct reclaim
avoiding the writing out pages. As it came up during discussions the last
time, I also had a stab at making the VM writing ranges of pages instead
of individual pages. I am not proposing for merging yet until I want to see
what people think of this general direction and if we can agree on if this
is the right one or not.

To summarise, there are two big problems with page reclaim right now. The
first is that page reclaim uses a_op->writepage to write a back back
under the page lock which is inefficient from an IO perspective due to
seeky patterns.  The second is that direct reclaim calling the filesystem
splices two potentially deep call paths together and potentially overflows
the stack on complex storage or filesystems. This series is an early draft
at tackling both of these problems and is in three stages.

The first 4 patches are a forward-port of trace points that are partly
based on trace points defined by Larry Woodman but never merged. They trace
parts of kswapd, direct reclaim, LRU page isolation and page writeback. The
tracepoints can be used to evaluate what is happening within reclaim and
whether things are getting better or worse. They do not have to be part of
the final series but might be useful during discussion.

Patch 5 writes out contiguous ranges of pages where possible using
a_ops->writepages. When writing a range, the inode is pinned and the page
lock released before submitting to writepages(). This potentially generates
a better IO pattern and it should avoid a lock inversion problem within the
filesystem that wants the same page lock held by the VM. The downside with
writing ranges is that the VM may not be generating more IO than necessary.

Patch 6 prevents direct reclaim writing out pages at all and instead dirty
pages are put back on the LRU. For lumpy reclaim, the caller will briefly
wait on dirty pages to be written out before trying to reclaim the dirty
pages a second time.

The last patch increases the responsibility of kswapd somewhat because
it's now cleaning pages on behalf of direct reclaimers but kswapd seemed
a better fit than background flushers to clean pages as it knows where the
pages needing cleaning are. As it's async IO, it should not cause kswapd to
stall (at least until the queue is congested) but the order that pages are
reclaimed on the LRU is altered. Dirty pages that would have been reclaimed
by direct reclaimers are getting another lap on the LRU. The dirty pages
could have been put on a dedicated list but this increased counter overhead
and the number of lists and it is unclear if it is necessary.

The series has survived performance and stress testing, particularly around
high-order allocations on X86, X86-64 and PPC64. The results of the tests
showed that while lumpy reclaim has a slightly lower success rate when
allocating huge pages but it was still very acceptable rates, reclaim was
a lot less disruptive and allocation latency was lower.

Comments?

 .../trace/postprocess/trace-vmscan-postprocess.pl  |  623 ++++++++++++++++++++
 include/trace/events/gfpflags.h                    |   37 ++
 include/trace/events/kmem.h                        |   38 +--
 include/trace/events/vmscan.h                      |  184 ++++++
 mm/vmscan.c                                        |  299 ++++++++--
 5 files changed, 1092 insertions(+), 89 deletions(-)
 create mode 100644 Documentation/trace/postprocess/trace-vmscan-postprocess.pl
 create mode 100644 include/trace/events/gfpflags.h
 create mode 100644 include/trace/events/vmscan.h

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2010-06-16 17:05 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-06-08  9:02 [RFC PATCH 0/6] Do not call ->writepage[s] from direct reclaim and use a_ops->writepages() where possible Mel Gorman
2010-06-08  9:02 ` [PATCH 1/6] tracing, vmscan: Add trace events for kswapd wakeup, sleeping and direct reclaim Mel Gorman
2010-06-08  9:02 ` [PATCH 2/6] tracing, vmscan: Add trace events for LRU page isolation Mel Gorman
2010-06-08  9:02 ` [PATCH 3/6] tracing, vmscan: Add trace event when a page is written Mel Gorman
2010-06-08  9:02 ` [PATCH 4/6] tracing, vmscan: Add a postprocessing script for reclaim-related ftrace events Mel Gorman
2010-06-08  9:02 ` [PATCH 5/6] vmscan: Write out ranges of pages contiguous to the inode where possible Mel Gorman
2010-06-11  6:10   ` Andrew Morton
2010-06-11 12:49     ` Mel Gorman
2010-06-11 19:07       ` Andrew Morton
2010-06-11 20:44         ` Mel Gorman
2010-06-11 21:33           ` Andrew Morton
2010-06-12  0:17             ` Mel Gorman
2010-06-11 16:27     ` Christoph Hellwig
2010-06-08  9:02 ` [PATCH 6/6] vmscan: Do not writeback pages in direct reclaim Mel Gorman
2010-06-11  6:17   ` Andrew Morton
2010-06-11 12:54     ` Mel Gorman
2010-06-11 16:25     ` Christoph Hellwig
2010-06-11 17:43       ` Andrew Morton
2010-06-11 17:49         ` Christoph Hellwig
2010-06-11 18:13           ` Mel Gorman
2010-06-08  9:08 ` [RFC PATCH 0/6] Do not call ->writepage[s] from direct reclaim and use a_ops->writepages() where possible Christoph Hellwig
2010-06-08  9:28   ` Mel Gorman
2010-06-11 16:29     ` Christoph Hellwig
2010-06-11 18:15       ` Mel Gorman
2010-06-11 19:12       ` Chris Mason
2010-06-09  2:52 ` KAMEZAWA Hiroyuki
2010-06-09  9:52   ` Mel Gorman
2010-06-10  0:38     ` KAMEZAWA Hiroyuki
2010-06-10  1:10       ` Mel Gorman
2010-06-10  1:29         ` KAMEZAWA Hiroyuki
2010-06-11  5:57 ` Andrew Morton
2010-06-11 12:33   ` Mel Gorman
2010-06-11 16:30     ` Christoph Hellwig
2010-06-11 18:17       ` Mel Gorman
2010-06-15 14:00 ` Andrea Arcangeli
2010-06-15 14:11   ` Christoph Hellwig
2010-06-15 14:22     ` Andrea Arcangeli
2010-06-15 14:43       ` Christoph Hellwig
2010-06-15 15:08         ` Andrea Arcangeli
2010-06-15 15:25           ` Christoph Hellwig
2010-06-15 15:45             ` Andrea Arcangeli
2010-06-15 16:26               ` Christoph Hellwig
2010-06-15 16:31                 ` Andrea Arcangeli
2010-06-15 16:49                 ` Rik van Riel
2010-06-15 16:54                   ` Christoph Hellwig
2010-06-15 19:13                     ` Rik van Riel
2010-06-15 19:17                       ` Christoph Hellwig
2010-06-15 19:44                         ` Chris Mason
2010-06-16  7:57                       ` Nick Piggin
2010-06-16 16:59                         ` Rik van Riel
2010-06-16 17:04                           ` Andrea Arcangeli
2010-06-15 16:54                   ` Nick Piggin
2010-06-15 15:38           ` Mel Gorman
2010-06-15 16:14             ` Andrea Arcangeli
2010-06-15 16:22               ` Christoph Hellwig
2010-06-15 16:30               ` Mel Gorman
2010-06-15 16:34                 ` Mel Gorman
2010-06-15 16:54                   ` Andrea Arcangeli
2010-06-15 16:35                 ` Christoph Hellwig
2010-06-15 16:37                 ` Andrea Arcangeli
2010-06-15 17:43                   ` Christoph Hellwig
2010-06-15 16:45               ` Christoph Hellwig
2010-06-15 14:51   ` Mel Gorman
2010-06-15 14:55     ` Rik van Riel
2010-06-15 15:08     ` Nick Piggin
2010-06-15 15:10       ` Mel Gorman
2010-06-15 16:28     ` Andrea Arcangeli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).