From: Dave Chinner <david@fromorbit.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, Chris Mason <chris.mason@oracle.com>,
Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Christoph Hellwig <hch@infradead.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [PATCH 11/12] vmscan: Write out dirty pages in batch
Date: Tue, 15 Jun 2010 16:36:43 +1000
Message-ID: <20100615063643.GS6590@dastard>
In-Reply-To: <20100614211515.dd9880dc.akpm@linux-foundation.org>
On Mon, Jun 14, 2010 at 09:15:15PM -0700, Andrew Morton wrote:
> On Tue, 15 Jun 2010 13:20:34 +1000 Dave Chinner <david@fromorbit.com> wrote:
>
> > On Mon, Jun 14, 2010 at 06:39:57PM -0700, Andrew Morton wrote:
> > > On Tue, 15 Jun 2010 10:39:43 +1000 Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > >
> > > > IOWs, IMO anywhere there is a context with significant queue of IO,
> > > > that's where we should be doing a better job of sorting before that
> > > > IO is dispatched to the lower layers. This is still no guarantee of
> > > > better IO (e.g. if the filesystem fragments the file) but it does
> > > > give the lower layers a far better chance at optimal allocation and
> > > > scheduling of IO...
> > >
> > > None of what you said had much to do with what I said.
> > >
> > > What you've described are implementation problems in the current block
> > > layer because it conflates "sorting" with "queueing". I'm saying "fix
> > > that".
> >
> > You can't sort until you've queued.
>
> Yes you can. That's exactly what you're recommending!
Umm, I suggested sorting a queue of dirty pages that was built by
reclaim before dispatching them. How does that translate to
me recommending "sort before queuing"?
> Only you're
> recommending doing it at the wrong level.
If you feed a filesystem garbage IO, you'll get garbage performance
and there's nothing that a block layer sort queue can do to fix the
damage it does to both performance and filesystem fragmentation
levels. It's not just about IO issue - delayed allocation pretty
much requires writeback to be issuing well formed IOs to reap the
benefits it can provide....
> > > And... sorting at the block layer will always be superior to sorting
> > > at the pagecache layer because the block layer sorts at the physical
> > > block level and can handle not-well-laid-out files and can sort and merge
> > > pages from different address_spaces.
> >
> > Yes, it can do that. And it still does that even if the higher
> > layers sort their I/O dispatch better.
> >
> > Filesystems try very hard to allocate adjacent logical offsets in a
> > file in adjacent physical blocks on disk - that's the whole point of
> > extent-indexed filesystems. Hence with modern filesystems there is
> > generally a direct correlation between the page {mapping,index}
> > tuple and the physical location of the mapped block.
> >
> > i.e. there is generally zero physical correlation between pages in
> > different mappings, but there is a high physical correlation
> > between the index of pages on the same mapping.
>
> Nope. Large-number-of-small-files is a pretty common case. If the fs
> doesn't handle that well (ie: by placing them nearby on disk), it's
> borked.
Filesystems already handle this case just fine as we see it from
writeback all the time. Untarring a kernel is a good example of
this...
I suggested sorting all the IO to be issued into per-mapping page
groups because:
a) it makes IO issued from reclaim look almost exactly the same
to the filesystem as if writeback is pushing out the IO.
b) it looks to be a trivial addition to the new code.
To me that's a no-brainer.
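As a rough illustration of the per-mapping grouping described above, here is a userspace sketch. It is not kernel code and not part of the patch under discussion; `DirtyPage`, `group_by_mapping` and `contiguous_runs` are hypothetical names, and a plain integer stands in for an address_space. It groups a reclaim-style queue by mapping, sorts each group by page index, and collapses adjacent indexes into runs, which is roughly the shape of IO that writeback would hand to the filesystem.

```python
from collections import defaultdict
from typing import NamedTuple


class DirtyPage(NamedTuple):
    """Hypothetical stand-in for a dirty page found by reclaim."""
    mapping: int  # assumption: an integer id standing in for an address_space
    index: int    # page offset within that mapping (file)


def group_by_mapping(pages):
    """Return {mapping: sorted unique page indexes}, mappings in first-seen order."""
    groups = defaultdict(list)
    for page in pages:
        groups[page.mapping].append(page.index)
    return {mapping: sorted(set(indexes)) for mapping, indexes in groups.items()}


def contiguous_runs(indexes):
    """Collapse sorted, unique indexes into (start, length) runs."""
    runs = []
    for idx in indexes:
        if runs and idx == runs[-1][0] + runs[-1][1]:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)  # extend the current run
        else:
            runs.append((idx, 1))           # start a new run
    return runs


if __name__ == "__main__":
    # Pages in the order reclaim might encounter them on the LRU:
    # interleaved between two files and out of index order.
    queue = [
        DirtyPage(2, 7), DirtyPage(1, 3), DirtyPage(2, 5),
        DirtyPage(1, 4), DirtyPage(2, 6), DirtyPage(1, 9),
    ]
    for mapping, indexes in group_by_mapping(queue).items():
        print(mapping, contiguous_runs(indexes))
```

The point of the sketch is only that the sort key is the {mapping,index} tuple, so the cost is one pass to group plus a sort per mapping. Whether the resulting runs end up physically contiguous still depends on how the filesystem allocated the file.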
> It would be interesting to code up a little test patch though, see if
> there's benefit to be had going down this path.
I doubt Mel's test cases will show anything - they simply didn't
show enough IO issued from reclaim to make any difference.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com