linux-ext4.vger.kernel.org archive mirror
From: Eric Sandeen <sandeen@redhat.com>
To: ext4 development <linux-ext4@vger.kernel.org>
Subject: xfs preallocation writeup, for comparison
Date: Tue, 28 Nov 2006 15:04:18 -0600
Message-ID: <456CA452.2040302@redhat.com>

As promised, here is a writeup of xfs preallocation routines.

I don't hold these up as the perfect or best way to do this task, but it
is worth looking at what has been done before, to get ideas, find better
ways, and avoid pitfalls for ext4.

XFS preallocation interfaces.
=============================

The xfs preallocation interfaces are described in the xfsctl(3) manpage.
It's not the best doc, so I'll summarize:

XFS has these ioctl calls for space management of files:

       XFS_IOC_ALLOCSP
       XFS_IOC_FREESP
       XFS_IOC_RESVSP
       XFS_IOC_UNRESVSP

All of these interfaces take an flock-style argument, and you use it to
specify the range of bytes in the file which should be preallocated,
essentially with an offset and a length.

The real work for all of this is done in xfs_change_file_space() in
xfs_vnodeops.c.
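
To make the "offset and a length" argument concrete, here is a minimal
sketch of filling in a flock-style range.  It uses the standard struct
flock from <fcntl.h> purely as a stand-in so that it builds without the
XFS headers; per xfsctl(3) the XFS ioctls take an xfs_flock64 structure
with the same l_whence, l_start and l_len fields.  The helper name
make_range() is hypothetical.

```c
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>      /* struct flock */
#include <string.h>     /* memset */
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* SEEK_SET */

/*
 * Hypothetical helper: describe the byte range [start, start + len)
 * measured from the beginning of the file.  struct flock is only a
 * stand-in here; it is not the type the XFS ioctls accept.
 */
static struct flock make_range(off_t start, off_t len)
{
    struct flock range;

    memset(&range, 0, sizeof(range));
    range.l_whence = SEEK_SET;  /* l_start is relative to file start */
    range.l_start  = start;     /* offset of the first byte */
    range.l_len    = len;       /* number of bytes in the range */
    return range;
}
```

For the 10g reservation used in the examples that follow, the range
would be make_range(0, (off_t)10 << 30).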

The main difference between resvsp and allocsp is that resvsp marks the
blocks as "unwritten" meaning that they are allocated but not yet
written to, and if they are read, they will return zeros.  allocsp
actually writes zeros into the allocated blocks.  We can use the xfs_io
tool to demonstrate.

resvsp example:
==============

[root@magnesium test]# touch resvsp
[root@magnesium test]# xfs_io resvsp
xfs_io> resvsp 0 10g
xfs_io> bmap -vp
resvsp:
 EXT: FILE-OFFSET           BLOCK-RANGE        AG AG-OFFSET             TOTAL FLAGS
   0: [0..16657327]:        16657456..33314783  1 (64..16657391)     16657328 10000
   1: [16657328..20971519]: 96..4314287         0 (96..4314287)       4314192 10000

So we got 2 extents for this 10g file; those are actual filesystem
blocks allocated.  The file is 0 length, but is using 10g of blocks:

[root@magnesium test]# ls -lh resvsp
-rw-r--r--  1 root root 0 Nov 28 14:11 resvsp
[root@magnesium test]# du -hc resvsp
10G     resvsp
10G     total
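
As a cross-check, the two extent sizes in the bmap output above add up
to exactly 10g.  The sketch below redoes that arithmetic, assuming that
bmap reports offsets in 512-byte units (an assumption, though the
numbers agree with it).  The function names are hypothetical.

```c
#include <stdint.h>

#define BMAP_UNIT 512  /* bytes per bmap unit; assumed, see above */

/* Number of bmap units covered by the inclusive range [first..last]. */
static uint64_t extent_units(uint64_t first, uint64_t last)
{
    return last - first + 1;
}

/* Total bytes covered by the two extents in the resvsp listing. */
static uint64_t resvsp_bytes(void)
{
    uint64_t units = extent_units(0, 16657327)          /* extent 0 */
                   + extent_units(16657328, 20971519);  /* extent 1 */

    return units * BMAP_UNIT;
}
```

The two calls to extent_units() give 16657328 and 4314192, matching the
TOTAL column, and resvsp_bytes() comes to 10 * 2^30 bytes.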

The extents are simply flagged as unwritten (0x10000 above), so very
little IO occurs and the space reservation is fast.

allocsp example:
===============
(note there's a bit of a buglet in xfs_io, hence the swapped arguments)

[root@magnesium test]# touch allocsp
[root@magnesium test]# xfs_io allocsp
xfs_io> allocsp 10g 0
<wait for IO...>
xfs_io> bmap -vp
allocsp:
 EXT: FILE-OFFSET           BLOCK-RANGE        AG AG-OFFSET             TOTAL
   0: [0..16657327]:        33314848..49972175  2 (64..16657391)     16657328
   1: [16657328..20971519]: 4314288..8628479    0 (4314288..8628479)  4314192

We also got 2 extents here, but they are not flagged as unwritten -
those filesystem blocks were all actually filled with zeros.

[root@magnesium test]# ls -lh allocsp
-rw-r--r--  1 root root 10G Nov 28 14:19 allocsp
[root@magnesium test]# du -hc allocsp
10G     allocsp
10G     total

It would be very nice to see posix_fallocate hooked up to the underlying
filesystem, so that it can make smart decisions about how to efficiently
reserve space...
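
For reference, this is roughly what the caller's side of
posix_fallocate looks like; a minimal sketch, with hypothetical helper
names reserve_space() and apparent_size().  Note that POSIX requires
the file size to grow to cover the requested range, so the result
looks like the allocsp case above (10G in ls) rather than the resvsp
case (0 length).  How the space actually gets reserved is up to the C
library and filesystem; without filesystem support it may have to fall
back to writing into each block.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* posix_fallocate */
#include <sys/stat.h>   /* fstat */
#include <sys/types.h>  /* off_t */

/*
 * Hypothetical helper: reserve len bytes starting at offset 0.
 * Returns 0 on success or an error number; posix_fallocate reports
 * errors through its return value and does not set errno.
 */
static int reserve_space(int fd, off_t len)
{
    return posix_fallocate(fd, 0, len);
}

/* Apparent file size (what ls shows), or -1 if fstat fails. */
static off_t apparent_size(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return st.st_size;
}
```

After a successful reserve_space(fd, len) on an empty file,
apparent_size(fd) returns len.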

-Eric
