linux-btrfs.vger.kernel.org archive mirror
From: David Brown <david@westcontrol.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: Content based storage
Date: Wed, 17 Mar 2010 09:21:50 +0100
Message-ID: <hnq3fa$700$1@dough.gmane.org>
In-Reply-To: <4BA00A06.9020208@ilbello.com>

On 16/03/2010 23:45, Fabio wrote:
> Some years ago I was searching for that kind of functionality and found
> an experimental ext3 patch to allow the so-called COW-links:
> http://lwn.net/Articles/76616/
>

I'd read about the COW patches for ext3 before.  While there is 
certainly some similarity here, there are a fair number of differences.
One is that those patches were aimed only at copying - there was no
way to merge files later.  Another is that it was (as far as I can see) 
just an experimental hack to try out the concept.  Since it didn't take 
off, I think it is worth learning from, but not building on.

> There was a discussion later on LWN http://lwn.net/Articles/77972/
> an approach like COW-links would break POSIX standards.
>

I think a lot of the problems there concerned inode numbers.  As
far as I understand it, when you made an ext3-cow copy, the copy and the
original had different inode numbers.  That meant that userspace programs
saw them as different files, and you could have different owners,
attributes, etc., while keeping the data linked.  But that broke a
common optimisation when doing large diffs - thus some people wanted to
have the same inode for each file, and that /definitely/ broke POSIX.

With btrfs, the file copies would each have their own inode - it would,
I think, be POSIX compliant as it is transparent to user programs.  The
diff optimisation discussed in the articles you cited would not work -
but if btrfs becomes the standard Linux file system, then user
applications like diff can be extended with btrfs-specific optimisations
if necessary.
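
To make the inode point concrete, here is a minimal sketch of the kind
of shortcut in question, written in Python.  It is only an illustration
of the idea, not a description of what diff actually does internally;
the function name same_inode is made up for this example.

```python
import os


def same_inode(path_a, path_b):
    """Return True if both paths refer to the same inode on the same device.

    A comparison tool can skip reading the data when this is True, because
    the two names are then guaranteed to show the same contents.
    """
    a = os.stat(path_a)
    b = os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

Two hard links to one file pass this test.  Two copies that merely
share data blocks, each with its own inode, do not, so a tool relying
on this check alone would fall back to reading and comparing both
files.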

> I am not very technical and don't know if it's feasible in btrfs.

Nor am I very knowledgeable in this area (most of my programming is on
8-bit processors), but I believe btrfs is already designed to support
larger checksums (32-bit CRCs are not enough to say that data is
identical), and "cp --reflink" shows how the underlying link is made.
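
A rough way to see why 32 bits is too few is the usual birthday
approximation.  The sketch below assumes checksums are spread uniformly
over all possible values, which is an idealisation (a CRC is not
designed for this), and the function name collision_probability is
invented for the example.

```python
import math


def collision_probability(num_blocks, checksum_bits):
    """Approximate chance that at least two of num_blocks distinct blocks
    end up with the same checksum (birthday bound, uniform checksums assumed).
    """
    pairs = num_blocks * (num_blocks - 1) / 2
    # 1 - exp(-pairs / 2**bits), computed with expm1 so tiny values survive.
    return -math.expm1(-pairs / 2.0 ** checksum_bits)
```

Under that assumption, 100,000 distinct blocks with a 32-bit checksum
already give better than even odds of at least one false match, while
a 256-bit checksum makes the same figure vanishingly small.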

> I think most likely you'll have to run an userspace tool to find and
> merge identical files based on checksums (which already sounds good to me).

This sounds right to me.  In fact, it would be possible to do today, 
entirely from within user space - but files would need to be compared 
long-hand before merging.  With larger checksums, the userspace daemon 
would be much more efficient.
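
As a sketch of what such a userspace tool might do (my own
illustration, assuming SHA-256 as the "larger checksum"; the names
file_digest and find_duplicates are hypothetical): group files by size,
then by digest, and only then compare the survivors byte for byte.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return a sorted list of groups of paths under root with identical contents."""
    # Cheap first pass: files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        # Second pass: a strong checksum narrows the candidates.
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        for candidates in by_hash.values():
            if len(candidates) < 2:
                continue
            # Final "long-hand" check: compare contents byte for byte.
            first = candidates[0]
            confirmed = [first] + [
                p for p in candidates[1:]
                if filecmp.cmp(first, p, shallow=False)
            ]
            if len(confirmed) > 1:
                groups.append(sorted(confirmed))
    return sorted(groups)
```

The merge step itself is deliberately left out; on btrfs it would go
through the same mechanism that "cp --reflink" uses.  If the file
system exposed its own strong checksums, the hashing pass could in
principle be skipped, which is where the efficiency gain would come
from.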

> The only thing we can ask the developers at the moment is if something
> like that would be possible without changes to the on-disk format.
>

I guess that's partly why I made these posts!

>
> PS. Another great scenario is shared hosting web/file servers: ten of
> thousand website with mostly the same tiny PHP Joomla files.
> If you can get the benefits of: compression + "content based"/cowlinks +
> FS Cache... That would really make Btrfs FLY on Hard Disk and make SSD
> devices possible for storage (because of the space efficiency).
>

That's a good point.

People often think that hard disk space is cheap these days - but being
space efficient means you can use an SSD instead of a hard disk.  And
for on-disk backups, it means you can use a small number of disks even
though the users think "I've got a huge hard disk, I can make lots of
copies of these files"!

