From: Bill Davidsen <davidsen@tmr.com>
To: Thomas Glanzmann <thomas@glanzmann.de>,
tytso@thunk.org, LKML <linux-kernel@vger.kernel.org>,
linux-ext4@vger.kernel.org
Subject: Re: zero out blocks of freed user data for operation a virtual machine environment
Date: Mon, 25 May 2009 17:19:06 -0400
Message-ID: <4A1B0B4A.8050706@tmr.com>
In-Reply-To: <20090524170045.GC24753@cip.informatik.uni-erlangen.de>
Thomas Glanzmann wrote:
> Hello Ted,
> I would like to know if there is already a mount option or feature in
> ext3/ext4 that automatically overwrites freed blocks with zeros? If this
> is not the case I would like to know if you would consider a patch for
> upstream? I'm asking this because I currently do some research work on
> data deduplication in virtual machine environments and corresponding
> backups. It would be a huge space saver if there were such a feature
> because today's and tomorrow's backup tools for virtual machine
> environments work on the block layer (VMware Consolidated Backup, VMware
> Data Recovery, and NetApp Snapshots). This is true not only for backup
> tools but also for running virtual machines. The case that this feature
> addresses is the following: A huge file is downloaded and later deleted.
> The backup and data deduplication operating on the block level
> can't identify the blocks as unused. This results in backing up the
> amount of data that was previously allocated by the file and as such
> introduces a performance overhead. If you're interested in real-life
> data, I'm able to provide it.
>
> If you don't intend to have such an optional feature in ext3/ext4 I
> would like to know if you know a tool that makes it possible to zero out
> unused blocks?
>
Treating blocks as unused because of their content seems like a bad idea. If
you want them to be unused, look for references to TRIM; if you want this for
security, look at shred. And if you are interested in backing up sparse files,
I believe the tar "-S" option will do what you want, or will provide code you
can use as a starting point for writing what you want.

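To make the sparse-file idea concrete, here is a minimal sketch of a copy that
leaves holes where the source holds only zeros, in the spirit of what the tar
"-S" option is for. It is not how tar itself is implemented; the names
copy_sparse and BLOCK_SIZE and the 4096-byte block size are assumptions made
for this illustration.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def copy_sparse(src_path, dst_path, block_size=BLOCK_SIZE):
    """Copy src to dst, seeking over all-zero blocks instead of writing them.

    Returns the number of blocks that were actually written.
    """
    zero_block = bytes(block_size)
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            if chunk == zero_block[:len(chunk)]:
                # Leave a hole: advance the offset without writing any data.
                dst.seek(len(chunk), os.SEEK_CUR)
            else:
                dst.write(chunk)
                written += 1
        # A trailing hole only becomes part of the file once the length is set.
        dst.truncate(dst.tell())
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.img")
        dst = os.path.join(tmp, "dst.img")
        with open(src, "wb") as f:
            f.write(b"A" * BLOCK_SIZE + bytes(6 * BLOCK_SIZE) + b"B" * BLOCK_SIZE)
        print(copy_sparse(src, dst), os.path.getsize(dst))
```

The copy has the same length and content as the source; whether the skipped
ranges really occupy no disk space depends on the destination filesystem.
Note that the source is still read in full, which is the cost discussed
further down.
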
I don't think this is a good solution to the problem that unused space is not
accounted for as you wish it were. Most filesystems already have a bitmap to
track this; a handle on that would be more generally useful.

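As a rough sketch of what "a handle on the bitmap" could buy a backup tool,
the toy example below copies only the blocks an allocation bitmap marks as in
use. The bitmap layout (one bit per block, least significant bit first, 1
meaning allocated) and the function names are assumptions of this sketch, not
the on-disk format or interface of any particular filesystem.

```python
def allocated_blocks(bitmap, total_blocks):
    """Yield the numbers of blocks whose bit is set (1 = in use)."""
    for block in range(total_blocks):
        if bitmap[block // 8] & (1 << (block % 8)):
            yield block


def backup_allocated(image, bitmap, block_size):
    """Return {block_number: data} for allocated blocks only."""
    total = len(image) // block_size
    return {
        b: image[b * block_size:(b + 1) * block_size]
        for b in allocated_blocks(bitmap, total)
    }


if __name__ == "__main__":
    # Eight 4-byte blocks. Block 2 still holds stale data from a deleted
    # file, but its bit is clear, so it is skipped without being zeroed.
    blocks = [b"boot", bytes(4), b"OLD!", bytes(4),
              bytes(4), b"data", bytes(4), bytes(4)]
    image = b"".join(blocks)
    bitmap = bytes([0b00100001])  # blocks 0 and 5 are in use
    saved = backup_allocated(image, bitmap, block_size=4)
    print(sorted(saved))  # [0, 5]
```

The point of the design is that freed blocks never need to be rewritten or
even read: the allocation state alone decides what gets backed up.
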
Deleting files is slow enough already. Identifying unused storage by content is
1950s thinking; it also ignores the fact that new drives often don't come
zeroed, so the scheme would behave badly unless you manually zeroed the unused
portions first.

I doubt this is the optimal solution, since you would have to read the zeros to
see whether they were present, which makes the backup smaller but no faster
than just doing a copy.
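
A small sketch of that cost, under the assumption of a hypothetical
read_block callback that returns one block of the device per call: a
content-based scan has to issue one read per block no matter how many of them
turn out to be zero.

```python
def scan_for_nonzero(read_block, total_blocks, block_size):
    """Classify blocks by content; returns (nonzero_block_numbers, reads)."""
    zero = bytes(block_size)
    reads = 0
    keep = []
    for b in range(total_blocks):
        data = read_block(b)  # every block is read, zero or not
        reads += 1
        if data != zero:
            keep.append(b)
    return keep, reads


if __name__ == "__main__":
    block_size = 16
    image = bytearray(16 * block_size)      # 16 blocks, all zero
    image[3 * block_size] = 1               # block 3 holds data
    image[9 * block_size + 5] = 7           # block 9 holds data

    def read_block(b):
        return bytes(image[b * block_size:(b + 1) * block_size])

    keep, reads = scan_for_nonzero(read_block, 16, block_size)
    print(keep, reads)  # [3, 9] 16
```

Only two blocks need to be stored, but all sixteen were read to find that
out, so the backup shrinks without getting any faster.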
--
Bill Davidsen <davidsen@tmr.com>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot
Thread overview: 18+ messages
2009-05-24 17:00 zero out blocks of freed user data for operation a virtual machine environment Thomas Glanzmann
2009-05-24 17:15 ` Arjan van de Ven
2009-05-24 17:39 ` Thomas Glanzmann
2009-05-25 12:03 ` Theodore Tso
2009-05-25 12:34 ` Thomas Glanzmann
2009-05-25 13:14 ` Goswin von Brederlow
2009-05-25 14:01 ` Thomas Glanzmann
[not found] ` <f3177b9e0905251023n762b815akace1ae34e643458e@mail.gmail.com>
2009-05-25 17:26 ` Chris Worley
2009-05-26 10:22 ` Goswin von Brederlow
2009-05-26 16:52 ` Chris Worley
2009-05-28 19:27 ` Goswin von Brederlow
2009-05-25 3:29 ` David Newall
2009-05-25 5:26 ` Thomas Glanzmann
2009-05-25 7:48 ` Ron Yorston
2009-05-25 10:50 ` Thomas Glanzmann
2009-05-25 12:06 ` Theodore Tso
2009-05-25 21:19 ` Bill Davidsen [this message]
2009-05-26 4:45 ` Thomas Glanzmann