From: "Ondřej Bílka" <neleai@seznam.cz>
To: Gordan Bobic <gordan@bobich.net>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Offline Deduplication for Btrfs
Date: Thu, 6 Jan 2011 16:37:57 +0100
Message-ID: <20110106153757.GA14070@domone>
In-Reply-To: <4D25D498.4050709@bobich.net>

On Thu, Jan 06, 2011 at 02:41:28PM +0000, Gordan Bobic wrote:
> Ondřej Bílka wrote:
>
> >>>Then again, for a lot of use-cases there are perhaps better ways to
> >>>achieve the targeted goal than deduping on FS level, e.g. snapshotting or
> >>>something like fl-cow:
> >>>http://www.xmailserver.org/flcow.html
> >>>
> >As far as VMs are concerned, fl-cow is a poor replacement for deduping.
>
> Depends on your VM. If your VM uses monolithic images, then you're
> right. For a better solution, take a look at vserver's hashify
> feature for something that does this very well in its own context.
>
> >Upgrading packages? The first VM upgrades and copies the changed files.
> >After a while the second upgrades and copies the files too. More and more
> >becomes duplicated again.
>
> So you want online dedupe, then. :)
>
> >If you host multiple distributions you need to work out
> >that /usr/share/bin/foo in foonux is /usr/bin/bar in barux.
>
> The chances of the binaries being the same between distros are
> between slim and none. In the context of VMs where you have access
> to raw files, as I said, look at vserver's hashify feature. It
> doesn't care about file names, it will COW hard-link all files with
> identical content. This doesn't even require an exhaustive check of
> all the files' contents - you can start with file sizes. Files that
> have different sizes can't have the same contents, so you can
> discard most of the comparing before you even open the file, most of
> the work gets done based on metadata alone.
>
Yes, I wrote this as a quick example. On second thought, files shared
between distros are typically write-only (like manpages).
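
To make the size-first filtering quoted above concrete, here is a minimal
sketch in Python. It is not vserver's hashify implementation, only an
illustration of the idea: group files by size using metadata alone, then
hash just the files that share a size with at least one other file. The
names find_duplicates and sha256_of are hypothetical, and the sketch only
reports duplicate groups instead of hard-linking them. It also assumes that
equal SHA-256 digests mean equal content; a careful tool would do a final
byte-by-byte comparison before linking anything.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return sorted groups of paths under root that have identical content."""
    # Pass 1: metadata only. A file whose size is unique cannot have a
    # duplicate, so it is never opened.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the candidates that share a size.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Calling find_duplicates("/srv/vservers") (a made-up path) would return a
list of path groups; each group is a candidate set for COW hard-linking.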

>
> >And the primary reason to dedupe is not to reduce space usage but to
> >improve caching. Why should machine A read a file if machine B read it
> >five minutes ago?
>
> Couldn't agree more. This is what I was trying to explain earlier.
> Even if deduping did cause more fragmentation (and I don't think
> that is the case to any significant extent), the improved caching
> efficiency would more than offset this.
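
The caching argument can be shown with a toy model. The sketch below is
only an illustration under strong assumptions: the cache is unbounded, and
it is keyed by content hash when storage is deduped and by (vm, path) when
each VM holds its own copy. That is not how the kernel page cache is
actually keyed; the function name count_disk_reads and the request tuples
are made up for the example.

```python
import hashlib


def count_disk_reads(requests, dedupe):
    """Count cache misses for a sequence of (vm, path, content) reads."""
    cache = set()
    disk_reads = 0
    for vm, path, content in requests:
        if dedupe:
            # Deduped storage: identical content is one shared object,
            # so any VM's read warms the cache for every other VM.
            key = hashlib.sha256(content).hexdigest()
        else:
            # Separate copies: each VM's file is a distinct object.
            key = (vm, path)
        if key not in cache:
            disk_reads += 1
            cache.add(key)
    return disk_reads


requests = [
    ("A", "/lib/libc.so", b"same bytes"),
    ("B", "/lib/libc.so", b"same bytes"),
    ("A", "/lib/libc.so", b"same bytes"),
]
without_dedupe = count_disk_reads(requests, dedupe=False)
with_dedupe = count_disk_reads(requests, dedupe=True)
```

In this model machine B's read is a miss without dedupe and a hit with it,
which is the effect described in the quoted paragraph.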
>
> Gordan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 

Program load too heavy for processor to lift.

