public inbox for linux-kernel@vger.kernel.org
From: Andrew Morton <akpm@zip.com.au>
To: Jeremy Jackson <jerj@coplanar.net>
Cc: lkml <linux-kernel@vger.kernel.org>
Subject: Re: [prepatch] address_space-based writeback
Date: Wed, 10 Apr 2002 12:41:44 -0700
Message-ID: <3CB49578.34C34438@zip.com.au>
In-Reply-To: <3CB4203D.C3BE7298@zip.com.au> <004b01c1e0c6$01d690f0$7e0aa8c0@bridge>

Jeremy Jackson wrote:
> 
> This sounds like a wonderful piece of work.
> I'm also inspired by the rmap stuff coming down
> the pipe.   I wonder if there would be any interference
> between the two, or could they leverage each other?
> 

Well, the theory is that rmap doesn't need to know. It just
calls writepage when it sees dirty pages. That's a minimal
approach.  But I'd like the VM to aggressively use the
"write out lots of pages in the same call" APIs, rather
than the current "send lots of individual pages" approach.

For a number of reasons:

- For delalloc filesystems, and for sparse mappings
  against "syncalloc" filesystems, disk allocation is
  performed at ->writepage time.  It's important that
  the writes be clustered effectively.  Otherwise
  the file gets fragmented on-disk.

- There's a reasonable chance that the pages on the
  LRU lists get themselves out-of-order as the aging
  process proceeds.  So calling ->writepage in lru_list
  order has the potential to result in fragmented write
  patterns, and inefficient I/O.

- If the VM is trying to free pages from, say, ZONE_NORMAL
  then it will only perform writeout against pages from
  ZONE_NORMAL and ZONE_DMA.  But there may be writable pages
  from ZONE_HIGHMEM sprinkled amongst those pages within the
  file. It would be really bad if we miss out on opportunistically
  slotting those other pages into the same disk request.

- The current VM writeback is an enormous code path.  For each
  tiny little page, we send it off into writepage, pass it
  to the filesystem, give it a disk mapping, give it a buffer_head,
  submit the buffer_head, attach a tiny BIO to the buffer_head,
  submit the BIO, merge that onto a request structure which
  contains contiguous BIOs, feed that list-of-single-page-BIOs
  to the device driver.

  That's rather painful.   So the intent is to batch this work
  up.  Instead, the VM says "write up to four megs from this
  page's mapping, including this page".

  That request passes through the filesystem and we wrap 64k
  or larger BIOs around the pagecache data and put those into
  the request queue.  For some filesystems the buffer_heads
  are ignored altogether.  For others, the buffer_head
  can be used at "build the big BIO" time to work out how to
  segment the BIOs across sub-page boundaries.

  The "including this page" requirement of the vm_writeback
  method is there because the VM may be trying to free pages
  from a specific zone, so it would not be useful if the
  filesystem went and submitted I/O for a ton of pages which
  are all from the wrong zone.  This is a bit of an ugly
  back-compatible placeholder to keep the VM happy before
  we move on to that part of it.
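
As a rough illustration of the clustering argument above, here is a
toy model in Python. It is only a sketch, not kernel code: the names
count_requests and cluster_around are hypothetical, the 1024-page
limit assumes 4 KiB pages (so "four megs"), and the merge rule (a
page joins the previous request only if it directly follows it) is a
deliberate simplification of what the block layer really does.

```python
from typing import Iterable, List, Tuple


def count_requests(page_indices: Iterable[int]) -> int:
    """Count requests when pages are submitted one at a time in the
    given order. Toy merge rule (an assumption): a page is merged only
    if its index directly follows the previously submitted page."""
    requests = 0
    prev = None
    for idx in page_indices:
        if prev is None or idx != prev + 1:
            requests += 1
        prev = idx
    return requests


def cluster_around(dirty: Iterable[int], target: int,
                   max_pages: int) -> List[Tuple[int, int]]:
    """Pick up to max_pages dirty pages of one mapping, always
    including target, and return them as (start, length) runs in file
    order. Raises ValueError if target is not among the dirty pages."""
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    pages = sorted(set(dirty))
    i = pages.index(target)
    # Roughly centre the window on the target page; this slice always
    # contains the target because i - start <= max_pages // 2.
    start = max(0, i - max_pages // 2)
    chosen = pages[start:start + max_pages]

    runs: List[Tuple[int, int]] = []
    run_start, run_len = chosen[0], 1
    for idx in chosen[1:]:
        if idx == run_start + run_len:
            run_len += 1
        else:
            runs.append((run_start, run_len))
            run_start, run_len = idx, 1
    runs.append((run_start, run_len))
    return runs


# Dirty page indices of one file, in the order an aged LRU list might
# hand them to ->writepage (made-up numbers, purely illustrative).
lru_order = [7, 3, 4, 100, 5, 6, 101, 8]

per_page = count_requests(lru_order)            # one page per call
batched = cluster_around(lru_order, 7, 1024)    # "up to four megs"
print(per_page, len(batched), batched)
```

In this model the per-page walk in LRU order issues several small
requests, while the batched call around page 7 covers the same dirty
pages in two contiguous runs, each of which could be wrapped in one
large BIO.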
