linux-fsdevel.vger.kernel.org archive mirror
From: Bill Davidsen <davidsen@tmr.com>
To: Jens Axboe <jens.axboe@oracle.com>
Cc: linux-kernel@vger.kernel.org, chris.mason@oracle.com,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH][RFC] fast file mapping for loop
Date: Thu, 10 Jan 2008 20:25:12 -0500
Message-ID: <4786C578.3090302@tmr.com>
In-Reply-To: <20080109085231.GE6650@kernel.dk>

Jens Axboe wrote:
> Hi,
> 
> loop.c currently uses the page cache interface to do IO to file backed
> devices. This works reasonably well for simple things, like mapping an
> iso9660 file for direct mount and other read-only workloads. Writing is
> somewhat problematic, as anyone who has really used this feature can
> attest to - it tends to confuse the vm (hello kswapd) since it breaks
> dirty accounting and behaves very erratically on writeout. Did I mention
> that it's pretty slow as well, for both reads and writes?
> 
Since you are looking for comments, I'll mention a loop-related behavior 
I've been seeing and see if it gets comments or is useful, since it can 
be used to tickle bad behavior on demand.

I have a 6GB sparse file, which I mount with cryptoloop and populate as
an ext3 filesystem (more later on why). I then copy ~5.8GB of data to
the filesystem, which is then unmounted so the image can be burned to a
DVD. Before it's burned, the "dvdisaster" application is used to add
some ECC information to the end and make an image which fits on a
DVD-DL. Media will be burned and distributed to multiple locations.
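
For anyone who wants to reproduce this, a minimal sketch of the setup
follows. The file name, loop device, cipher and mount point are
assumptions for illustration, not the exact commands used here, and the
privileged steps are left as comments because they need root and a
kernel with cryptoloop.

```shell
#!/bin/sh
# make_sparse PATH SIZE: create a sparse file with apparent size SIZE.
# count=0 writes no data; seek= only extends the file, so no blocks
# are allocated until something is written through the loop device.
make_sparse() {
    dd if=/dev/zero of="$1" bs=1 count=0 seek="$2" 2>/dev/null
}

# Hypothetical follow-up steps (root required, names are placeholders):
#   make_sparse /tmp/image.bin 6G
#   losetup -e aes /dev/loop0 /tmp/image.bin   # cryptoloop; omit -e for plain loop
#   mke2fs -j /dev/loop0                       # ext3
#   mount /dev/loop0 /mnt/loop
```

Comparing "ls -l" with "du -k" on the image shows the apparent size
versus the space actually allocated.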

The problem:

When copying with rsync, the copy runs at ~25MB/s for a while, then 
falls into a pattern of bursts of 25MB/s followed by 10-15 sec of iowait 
with no disk activity. So I tried doing the copy with cpio:
   find . -depth | cpio -pdm /mnt/loop
which shows exactly the same behavior. Then, for no good reason I tried
   find . -depth | cpio -pBdm /mnt/loop
and the copy ran at 25MB/s for the whole data set.
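
To compare the two invocations with numbers rather than by eye, a small
timing wrapper is enough. This is only a sketch; time_cmd is a
hypothetical helper, not part of cpio or any standard tool.

```shell
#!/bin/sh
# time_cmd LABEL CMD [ARGS...]: run CMD, then print LABEL with the
# elapsed wall-clock time in whole seconds and the exit status.
time_cmd() {
    label=$1
    shift
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "$label: $((end - start))s (exit $status)"
    return "$status"
}
```

For example, from the source directory:
   time_cmd plain sh -c 'find . -depth | cpio -pdm /mnt/loop'
   time_cmd blocked sh -c 'find . -depth | cpio -pBdm /mnt/loop'
Running "vmstat 1" in another terminal shows the iowait stalls while
each copy runs.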

I was able to see similar results with a pure loop mount; I only mention
the crypto for accuracy, and because many of these images have been
shipped over the last two years. New loop code would only be useful in
this case if it stayed compatible, so that old data sets could still be
read.

> It also behaves differently than a real drive. For writes, completions
> are done once they hit page cache. Since loop queues bio's async and
> hands them off to a thread, you can have a huge backlog of stuff to do.
> It's hard to attempt to guarantee data safety for file systems on top of
> loop without making it even slower than it currently is.
> 
> Back when loop was only used for iso9660 mounting and other simple
> things, this didn't matter. Now it's often used in xen (and others)
> setups where we do care about performance AND writing. So the below is an
> attempt at speeding up loop and making it behave like a real device.
> It's a somewhat quick hack and is still missing one piece to be
> complete, but I'll throw it out there for people to play with and
> comment on.
> 
> So how does it work? Instead of punting IO to a thread and passing it
> through the page cache, we instead attempt to send the IO directly to the
> filesystem block that it maps to. loop maintains a prio tree of known
> extents in the file (populated lazily on demand, as needed). Advantages
> of this approach:
> 
> - It's fast, loop will basically work at device speed.
> - It's fast, and loop doesn't put a huge amount of load on the
>   system when busy. When I did comparison tests on my notebook with an
>   external drive, running a simple tiobench on the current in-kernel
>   loop with a sparse file backing rendered the notebook basically
>   unusable while the test was ongoing. The remapper version had no more
>   impact than it did when used directly on the external drive.
> - It behaves like a real block device.
> - It's easy to support IO barriers, which is needed to ensure safety
>   especially in virtualized setups.
> 
> Disadvantages:
> 
> - The file block mappings must not change while loop is using the file.
>   This means that we have to ensure exclusive access to the file and
>   this is the bit that is currently missing in the implementation. It
>   would be nice if we could just do this via open(), ideas welcome...
> - It'll tie down a bit of memory for the prio tree. This is GREATLY
>   offset by the reduced page cache footprint though.
> - It cannot be used with the loop encryption stuff. dm-crypt should be
>   used instead, on top of loop (which, I think, is even the recommended
>   way to do this today, so not a big deal).
> 
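
The remapping idea described in the quoted text can be illustrated with
a toy lookup. This is not the patch's code: the extent table below is
made up, and a linear scan stands in for the prio tree. It only shows
how a logical block in the backing file resolves to a filesystem block
once the extent covering it is known.

```shell
#!/bin/sh
# Hypothetical extent table, one extent per line:
#   logical_start  length  physical_start   (all in blocks)
EXTENTS='0 8 1000
8 4 5000
16 16 2000'

# map_block N: print the physical block backing logical block N,
# or "unmapped" if no known extent covers it.
map_block() {
    printf '%s\n' "$EXTENTS" | awk -v n="$1" '
        n >= $1 && n < $1 + $2 { print $3 + (n - $1); found = 1; exit }
        END { if (!found) print "unmapped" }'
}
```

With this table, "map_block 9" falls in the second extent, one block
past its start, and "map_block 12" lands in the gap between extents.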

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
