From: Andrew Morton <akpm@zip.com.au>
To: Jeremy Jackson <jerj@coplanar.net>
Cc: lkml <linux-kernel@vger.kernel.org>
Subject: Re: [prepatch] address_space-based writeback
Date: Wed, 10 Apr 2002 12:41:44 -0700
Message-ID: <3CB49578.34C34438@zip.com.au>
In-Reply-To: <3CB4203D.C3BE7298@zip.com.au> <004b01c1e0c6$01d690f0$7e0aa8c0@bridge>
Jeremy Jackson wrote:
>
> This sounds like a wonderful piece of work.
> I'm also inspired by the rmap stuff coming down
> the pipe. I wonder if there would be any interference
> between the two, or could they leverage each other?
>
Well, the theory is that rmap doesn't need to know. It just
calls writepage when it sees dirty pages. That's a minimal
approach. But I'd like the VM to aggressively use the
"write out lots of pages in the same call" APIs, rather
than the current "send lots of individual pages" approach.
For a number of reasons:
- For delalloc filesystems, and for sparse mappings
against "syncalloc" filesystems, disk allocation is
performed at ->writepage time. It's important that
the writes be clustered effectively. Otherwise
the file gets fragmented on-disk.
- There's a reasonable chance that the pages on the
LRU lists get themselves out-of-order as the aging
process proceeds. So calling ->writepage in lru_list
order has the potential to result in fragmented write
patterns, and inefficient I/O.
- If the VM is trying to free pages from, say, ZONE_NORMAL
then it will only perform writeout against pages from
ZONE_NORMAL and ZONE_DMA. But there may be writable pages
from ZONE_HIGHMEM sprinkled amongst those pages within the
file. It would be really bad if we miss out on opportunistically
slotting those other pages into the same disk request.
- The current VM writeback is an enormous code path. For each
tiny little page, we send it off into writepage, pass it
to the filesystem, give it a disk mapping, give it a buffer_head,
submit the buffer_head, attach a tiny BIO to the buffer_head,
submit the BIO, merge that onto a request structure which
contains contiguous BIOs, feed that list-of-single-page-BIOs
to the device driver.
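The fragmentation concern in the second point can be sketched with a
small user-space model. This is not kernel code: count_requests() and
sort_and_count_requests() are hypothetical helpers, and the model
assumes that a new disk request starts whenever the next page is not
file-contiguous with the previous one (it ignores any merging the
request queue might do later).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: count how many separate contiguous runs result
 * from writing pages in the given order. A new run starts whenever the
 * next page index is not exactly the previous index plus one. */
static size_t count_requests(const unsigned long *idx, size_t n)
{
    size_t requests = 0;

    for (size_t i = 0; i < n; i++)
        if (i == 0 || idx[i] != idx[i - 1] + 1)
            requests++;
    return requests;
}

/* qsort() comparator for page indices, ascending. */
static int cmp_index(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;

    return (x > y) - (x < y);
}

/* Sort the indices in place into file order, then count the runs.
 * This stands in for "walk the mapping" instead of "walk the LRU". */
static size_t sort_and_count_requests(unsigned long *idx, size_t n)
{
    qsort(idx, n, sizeof(*idx), cmp_index);
    return count_requests(idx, n);
}
```

With the made-up LRU order {3, 0, 2, 1, 5, 4}, count_requests() reports
six runs, while sort_and_count_requests() on the same six pages reports
one.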
That's rather painful, so the intent is to batch this work
up: instead of sending pages one at a time, the VM says
"write up to four megs from this page's mapping, including
this page".
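One way to picture the "up to four megs, including this page" request
is as a window over the mapping's dirty pages in file order. The
sketch below is a user-space illustration only. pick_batch(), struct
batch and the MODEL_* constants are hypothetical names, the 4096-byte
page size is an assumption, and growing the window forward before
backward is an arbitrary choice, not something the patch specifies.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_BYTES   4096UL                /* assumed page size */
#define MODEL_BUDGET_BYTES (4UL * 1024 * 1024)   /* "four megs" */
#define MODEL_MAX_PAGES    (MODEL_BUDGET_BYTES / MODEL_PAGE_BYTES)

/* A window into an array of dirty pages kept in file order. */
struct batch {
    size_t first;   /* position of the first page in the window */
    size_t count;   /* number of pages in the window */
};

/* Pick at most max_pages dirty pages around target_pos. The target
 * page is always part of the result. */
static struct batch pick_batch(size_t ndirty, size_t target_pos,
                               size_t max_pages)
{
    struct batch b = { target_pos, 1 };

    assert(target_pos < ndirty && max_pages >= 1);

    /* Extend towards higher file offsets first. */
    while (b.count < max_pages && b.first + b.count < ndirty)
        b.count++;

    /* Spend whatever budget is left on lower file offsets. */
    while (b.count < max_pages && b.first > 0) {
        b.first--;
        b.count++;
    }
    return b;
}
```

Under these assumptions the budget is 1024 pages. With 2000 dirty
pages and the target at position 1500, the window covers positions
976 through 1999. With only 10 dirty pages, it covers all of them.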
That request passes through the filesystem and we wrap 64k
or larger BIOs around the pagecache data and put those into
the request queue. For some filesystems the buffer_heads
are ignored altogether. For others, the buffer_head
can be used at "build the big BIO" time to work out how to
segment the BIOs across sub-page boundaries.
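As a rough illustration of wrapping large BIOs around pagecache data,
the following user-space model groups file-contiguous pages into
extents with a per-extent page cap. It is a sketch, not the block
layer API: struct bio_model and build_bios() are hypothetical, and
the cap of 16 pages used in the usage note corresponds to 64k only
under an assumed 4k page size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one large BIO: a run of contiguous pages. */
struct bio_model {
    unsigned long first_index;  /* file index of the first page */
    size_t nr_pages;            /* pages covered by this extent */
};

/* Walk page indices (already sorted in file order) and pack them into
 * extents of at most max_pages contiguous pages each. Returns the
 * number of extents written to out[]. If out[] fills up, the
 * remaining pages are left unprocessed. */
static size_t build_bios(const unsigned long *idx, size_t n,
                         size_t max_pages,
                         struct bio_model *out, size_t out_cap)
{
    size_t nbios = 0;

    assert(max_pages >= 1);

    for (size_t i = 0; i < n; i++) {
        struct bio_model *cur = nbios ? &out[nbios - 1] : NULL;

        /* Append to the current extent if it has room and the page
         * directly follows the extent's last page. */
        if (cur && cur->nr_pages < max_pages &&
            idx[i] == cur->first_index + cur->nr_pages) {
            cur->nr_pages++;
            continue;
        }

        if (nbios == out_cap)
            break;

        /* Otherwise start a new extent at this page. */
        out[nbios].first_index = idx[i];
        out[nbios].nr_pages = 1;
        nbios++;
    }
    return nbios;
}
```

For pages 0 through 19 followed by pages 100 and 101, with a cap of
16 pages, this produces three extents: 16 pages starting at 0, 4 pages
starting at 16, and 2 pages starting at 100. The per-page path
described earlier would have handled the same 22 pages as 22 separate
single-page submissions.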
The "including this page" requirement of the vm_writeback
method is there because the VM may be trying to free pages
from a specific zone, so it would not be useful if the
filesystem went and submitted I/O for a ton of pages which
are all from the wrong zone. This is a bit of an ugly
back-compatible placeholder to keep the VM happy before
we move on to that part of it.
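The zone argument can be sketched in user space as well. The model
below assumes the ordering DMA < NORMAL < HIGHMEM and the rule stated
earlier (reclaim targeting ZONE_NORMAL is helped by pages from
ZONE_NORMAL and ZONE_DMA only). All names here (enum zone_model,
struct page_model, helps_reclaim(), useful_pages(), batch_includes())
are hypothetical and are not the kernel's own definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical zone ordering, lowest first. */
enum zone_model { MODEL_ZONE_DMA, MODEL_ZONE_NORMAL, MODEL_ZONE_HIGHMEM };

struct page_model {
    unsigned long index;     /* position of the page within the file */
    enum zone_model zone;    /* zone the page's memory comes from */
};

/* A page helps reclaim if it lives in the target zone or a lower one. */
static int helps_reclaim(enum zone_model page_zone, enum zone_model target)
{
    return page_zone <= target;
}

/* Count the pages in a writeback batch that help the target zone. */
static size_t useful_pages(const struct page_model *batch, size_t n,
                           enum zone_model target)
{
    size_t useful = 0;

    for (size_t i = 0; i < n; i++)
        if (helps_reclaim(batch[i].zone, target))
            useful++;
    return useful;
}

/* Check the "including this page" requirement for a given file index. */
static int batch_includes(const struct page_model *batch, size_t n,
                          unsigned long index)
{
    for (size_t i = 0; i < n; i++)
        if (batch[i].index == index)
            return 1;
    return 0;
}
```

If the page the VM asked about is in an eligible zone and the batch
includes it, useful_pages() is at least one, which is the guarantee
the placeholder requirement provides. The other pages in the batch,
whatever their zone, ride along in the same I/O.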