public inbox for linux-kernel@vger.kernel.org
From: Andy Isaacson <adi@hexapodia.org>
To: Hua Zhong <hzhong@cisco.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Is there a "make hole" (truncate in middle) syscall?
Date: Thu, 11 Dec 2003 12:58:06 -0600
Message-ID: <20031211125806.B2422@hexapodia.org>
In-Reply-To: <011e01c3bfa5$8fb5a0e0$d43147ab@amer.cisco.com>; from hzhong@cisco.com on Wed, Dec 10, 2003 at 09:13:49PM -0800

On Wed, Dec 10, 2003 at 09:13:49PM -0800, Hua Zhong wrote:
> This would be a tremendous enhancement to Linux filesystems, and one of
> my current projects actually needs this capability badly.
> 
> The project is a lightweight user-space library which implements a
> file-based database. Each database has several files. The files are all
> block-based, and each block is always a multiple of 512 bytes (and we
> could make it a multiple of 4K if this feature existed).
> 
> Blocks are organized as a B+ tree, so we have a root block, which points
> to its child blocks, and in turn they point to the next level. There is
> a free block list too.
> 
> The problem is that with a lot of adds/deletes, there are a lot of free
> blocks inside the files. So essentially we'd have to manually shrink these
> files when they grow too big and eat up too much space. If we could just "dig
> a hole", it would be trivial to return those blocks to the filesystem
> without doing an expensive defragmentation.

The abstract interface for make_hole() is simple, but it turns into a
pretty expensive filesystem operation, I think.  After many cycles of
free/allocate, your file would be badly fragmented across the
filesystem.  You'll probably get better overall performance by keeping
track of how "sparse" your file is (you could compare st_blocks versus
how many blocks you have allocated in your tree structure) and re-write
it when you're wasting more than, say, 20% of the allocated space.
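
A minimal sketch of that sparseness check, in C. The names
wasted_fraction() and should_rewrite() are made up for illustration, and
live_bytes stands for whatever your library already knows about the
blocks reachable from the tree. It assumes st_blocks is counted in
512-byte units, which is the case on Linux but is not guaranteed
everywhere.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Fraction of the on-disk space that is not holding live data. */
static double wasted_fraction(uint64_t disk_bytes, uint64_t live_bytes)
{
    if (disk_bytes == 0 || live_bytes >= disk_bytes)
        return 0.0;
    return (double)(disk_bytes - live_bytes) / (double)disk_bytes;
}

/*
 * Returns 1 if the file wastes more than `threshold` (e.g. 0.20) of its
 * allocated space, 0 if not, and -1 if stat() fails.
 * `live_bytes` is supplied by the caller (hypothetical: the database's
 * own count of bytes in blocks that are still in use).
 */
static int should_rewrite(const char *path, uint64_t live_bytes,
                          double threshold)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    /* Assumption: st_blocks is in 512-byte units, as on Linux. */
    uint64_t disk_bytes = (uint64_t)st.st_blocks * 512;

    return wasted_fraction(disk_bytes, live_bytes) > threshold;
}
```

Calling should_rewrite(path, live_bytes, 0.20) after a batch of deletes
would give you the 20% trigger; the right threshold depends on how
expensive the re-write is for you.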

It turns into an interesting problem if you don't want to double your
space requirements during the re-write process.  You could write the
new file "backwards", one MB at a time, truncating the previous file at
each step to free up the blocks.  You'd end up with contiguous 1MB
chunks, which given your tree organization is probably good enough.  If
you wanted really good streaming performance you'd want to do bigger
chunks (or just write the file from the beginning, or use the
pre-allocation APIs that I think XFS provides).
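
Roughly, the backwards copy could look like the sketch below.
copy_backwards() and CHUNK are made-up names, and this is only the
plain byte-for-byte version: it moves the tail chunk of the old file to
the same offset in the new file and then truncates the old file, so the
two files together never hold much more than one extra chunk. A real
compaction would also skip free blocks and fix up the tree pointers,
which is not shown. It is also not crash-safe as written; you would
want to sync the new file before each truncate if that matters.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

#define CHUNK (1024 * 1024)  /* 1MB per step, as suggested above */

/*
 * Copy old_fd into new_fd from the end towards the beginning,
 * truncating old_fd after each chunk so its blocks are freed.
 * Returns 0 on success, -1 on error (old_fd may be partly truncated).
 */
static int copy_backwards(int old_fd, int new_fd)
{
    struct stat st;
    char *buf;
    off_t end;

    if (fstat(old_fd, &st) != 0)
        return -1;
    buf = malloc(CHUNK);
    if (buf == NULL)
        return -1;

    end = st.st_size;
    while (end > 0) {
        /* Start of the last (possibly partial) chunk. */
        off_t start = ((end - 1) / CHUNK) * CHUNK;
        size_t len = (size_t)(end - start);
        size_t done;

        for (done = 0; done < len; ) {
            ssize_t n = pread(old_fd, buf + done, len - done,
                              start + (off_t)done);
            if (n <= 0)
                goto fail;
            done += (size_t)n;
        }
        for (done = 0; done < len; ) {
            ssize_t n = pwrite(new_fd, buf + done, len - done,
                               start + (off_t)done);
            if (n <= 0)
                goto fail;
            done += (size_t)n;
        }
        /* Drop the chunk we just copied from the old file. */
        if (ftruncate(old_fd, start) != 0)
            goto fail;
        end = start;
    }
    free(buf);
    return 0;

fail:
    free(buf);
    return -1;
}
```

Because the new file is written from high offsets down, it starts out
sparse and fills in towards offset 0, which is where the contiguous
1MB chunks come from.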

-andy
