* [linux-lvm] Data deduplication in LVM?
@ 2009-06-10 18:41 Roy Sigurd Karlsbakk
2009-06-10 18:48 ` Ray Van Dolson
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Roy Sigurd Karlsbakk @ 2009-06-10 18:41 UTC
To: linux-lvm
Hi all
I've been reading up a little about data deduplication, and have been
in search for an OSS filesystem with dedup without much luck. While
testing snapshots and so on in LVM, I started wondering if dedup would
be better off in LVM than in the filesystem. Would it be possible/
efficient to add dedup to the LVM layer, or perhaps a layer above LVM?
This could make dedup work for all or most filesystems. Make a hash
table with 4k (or whatever) blocks, make virtual blocks pointing to
the physical blocks and run a remapping/deduping job at night. If
written to, copy-on-write could be used to increase speed.
Is this nonsense, or might it be an idea?
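To make the proposal concrete, here is a minimal in-memory sketch of the scheme described above: virtual blocks that point to reference-counted physical blocks, an offline pass standing in for the nightly job, and copy-on-write when a shared block is written. The class and method names (DedupVolume, dedup_pass) are hypothetical, and nothing here reflects how LVM or device-mapper actually works; it only illustrates the bookkeeping involved.

```python
import hashlib

BLOCK_SIZE = 4096  # 4k blocks, as suggested in the post


class DedupVolume:
    """Toy block device: virtual blocks map to refcounted physical blocks."""

    def __init__(self, num_blocks):
        zero = bytes(BLOCK_SIZE)
        # Initially every virtual block owns its own physical block.
        self.physical = {i: zero for i in range(num_blocks)}
        self.refcount = {i: 1 for i in range(num_blocks)}
        self.vmap = list(range(num_blocks))  # virtual -> physical
        self.next_phys = num_blocks

    def read(self, vblock):
        return self.physical[self.vmap[vblock]]

    def write(self, vblock, data):
        assert len(data) == BLOCK_SIZE
        phys = self.vmap[vblock]
        if self.refcount[phys] > 1:
            # Shared block: copy-on-write into a fresh physical block.
            self.refcount[phys] -= 1
            phys = self.next_phys
            self.next_phys += 1
            self.refcount[phys] = 1
            self.vmap[vblock] = phys
        self.physical[phys] = data

    def dedup_pass(self):
        """The 'nightly job': remap identical blocks to one physical copy.

        Returns the number of physical blocks still in use.
        """
        seen = {}  # digest -> physical block id
        for vblock, phys in enumerate(self.vmap):
            digest = hashlib.sha256(self.physical[phys]).digest()
            keep = seen.setdefault(digest, phys)
            # Compare contents too, so a hash collision cannot corrupt data.
            if keep != phys and self.physical[keep] == self.physical[phys]:
                self.vmap[vblock] = keep
                self.refcount[keep] += 1
                self.refcount[phys] -= 1
                if self.refcount[phys] == 0:
                    del self.physical[phys], self.refcount[phys]
        return len(self.physical)
```

Writes stay cheap because they never consult the hash table; only the offline pass hashes blocks. The cost is that duplicates occupy space until the next pass runs.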
roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum is presented
intelligibly. It is an elementary imperative for all pedagogues to
avoid excessive use of idioms of foreign origin. In most cases,
adequate and relevant synonyms exist in Norwegian.
* Re: [linux-lvm] Data deduplication in LVM?
2009-06-10 18:41 [linux-lvm] Data deduplication in LVM? Roy Sigurd Karlsbakk
@ 2009-06-10 18:48 ` Ray Van Dolson
2009-06-10 18:54 ` Les Mikesell
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Ray Van Dolson @ 2009-06-10 18:48 UTC
To: linux-lvm
On Wed, Jun 10, 2009 at 11:41:52AM -0700, Roy Sigurd Karlsbakk wrote:
> Hi all
>
> I've been reading up a little about data deduplication, and have been
> in search for an OSS filesystem with dedup without much luck. While
> testing snapshots and so on in LVM, I started wondering if dedup would
> be better off in LVM than in the filesystem. Would it be possible/
> efficient to add dedup to the LVM layer, or perhaps a layer above LVM?
> This could make dedup work for all or most of filesystems. Make a hash
> table with 4k (or whatever) blocks, make virtual blocks pointing to
> the physical blocks and run a remapping/deduping job at night. If
> written to, copy-on-write could be used to increase speed.
>
> Is this nonsense, or might it be an idea?
>
I like the idea. :-) Maybe it could be done at the LV layer.
Ray
* Re: [linux-lvm] Data deduplication in LVM?
2009-06-10 18:41 [linux-lvm] Data deduplication in LVM? Roy Sigurd Karlsbakk
2009-06-10 18:48 ` Ray Van Dolson
@ 2009-06-10 18:54 ` Les Mikesell
2009-06-10 18:59 ` Roy Sigurd Karlsbakk
2009-06-10 19:04 ` Roy Sigurd Karlsbakk
2009-06-10 22:30 ` Stuart D. Gathman
3 siblings, 1 reply; 12+ messages in thread
From: Les Mikesell @ 2009-06-10 18:54 UTC
To: LVM general discussion and development
Roy Sigurd Karlsbakk wrote:
> Hi all
>
> I've been reading up a little about data deduplication, and have been in
> search for an OSS filesystem with dedup without much luck. While testing
> snapshots and so on in LVM, I started wondering if dedup would be better
> off in LVM than in the filesystem. Would it be possible/efficient to add
> dedup to the LVM layer, or perhaps a layer above LVM? This could make
> dedup work for all or most of filesystems. Make a hash table with 4k (or
> whatever) blocks, make virtual blocks pointing to the physical blocks
> and run a remapping/deduping job at night. If written to, copy-on-write
> could be used to increase speed.
>
> Is this nonsense, or might it be an idea?
This is "supposed" to be coming in the next OpenSolaris/ZFS release (per
the roadmap with the just-released 2009.06 version).
--
Les Mikesell
lesmikesell@gmail.com
* Re: [linux-lvm] Data deduplication in LVM?
2009-06-10 18:54 ` Les Mikesell
@ 2009-06-10 18:59 ` Roy Sigurd Karlsbakk
2009-06-10 19:30 ` Les Mikesell
0 siblings, 1 reply; 12+ messages in thread
From: Roy Sigurd Karlsbakk @ 2009-06-10 18:59 UTC
To: LVM general discussion and development
On 10. juni. 2009, at 20.54, Les Mikesell wrote:
> Roy Sigurd Karlsbakk wrote:
>> Hi all
>> I've been reading up a little about data deduplication, and have
>> been in search for an OSS filesystem with dedup without much luck.
>> While testing snapshots and so on in LVM, I started wondering if
>> dedup would be better off in LVM than in the filesystem. Would it
>> be possible/efficient to add dedup to the LVM layer, or perhaps a
>> layer above LVM? This could make dedup work for all or most of
>> filesystems. Make a hash table with 4k (or whatever) blocks, make
>> virtual blocks pointing to the physical blocks and run a remapping/
>> deduping job at night. If written to, copy-on-write could be used
>> to increase speed.
>> Is this nonsense, or might it be an idea?
>
> This is "supposed" to be coming in the next OpenSolaris/ZFS release
> (per the roadmap with the just-released 2009.06 version).
What about Linux/LVM? Or did I misunderstand you?
roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy@karlsbakk.net
http://blogg.karlsbakk.net/
* Re: [linux-lvm] Data deduplication in LVM?
2009-06-10 18:41 [linux-lvm] Data deduplication in LVM? Roy Sigurd Karlsbakk
2009-06-10 18:48 ` Ray Van Dolson
2009-06-10 18:54 ` Les Mikesell
@ 2009-06-10 19:04 ` Roy Sigurd Karlsbakk
2009-06-10 22:30 ` Stuart D. Gathman
3 siblings, 0 replies; 12+ messages in thread
From: Roy Sigurd Karlsbakk @ 2009-06-10 19:04 UTC
To: LVM general discussion and development
On 10. juni. 2009, at 20.41, Roy Sigurd Karlsbakk wrote:
> Hi all
>
> I've been reading up a little about data deduplication, and have
> been in search for an OSS filesystem with dedup without much luck.
> While testing snapshots and so on in LVM, I started wondering if
> dedup would be better off in LVM than in the filesystem. Would it be
> possible/efficient to add dedup to the LVM layer, or perhaps a layer
> above LVM? This could make dedup work for all or most of
> filesystems. Make a hash table with 4k (or whatever) blocks, make
> virtual blocks pointing to the physical blocks and run a remapping/
> deduping job at night. If written to, copy-on-write could be used to
> increase speed.
Answering myself, it seems there can be a problem with this without a
rather large change in the APIs. If I understand it correctly, if
metadata is deduplicated, it may impose a rather large performance
impact on writes, and from the block layer, how do you know what's
metadata and what's not?
roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy@karlsbakk.net
http://blogg.karlsbakk.net/
* Re: [linux-lvm] Data deduplication in LVM?
2009-06-10 18:59 ` Roy Sigurd Karlsbakk
@ 2009-06-10 19:30 ` Les Mikesell
2009-06-10 19:33 ` Brian J. Murrell
2009-06-10 19:34 ` Ray Van Dolson
0 siblings, 2 replies; 12+ messages in thread
From: Les Mikesell @ 2009-06-10 19:30 UTC
To: LVM general discussion and development
Roy Sigurd Karlsbakk wrote:
>
>> Roy Sigurd Karlsbakk wrote:
>>> Hi all
>>> I've been reading up a little about data deduplication, and have been
>>> in search for an OSS filesystem with dedup without much luck. While
>>> testing snapshots and so on in LVM, I started wondering if dedup
>>> would be better off in LVM than in the filesystem. Would it be
>>> possible/efficient to add dedup to the LVM layer, or perhaps a layer
>>> above LVM? This could make dedup work for all or most of filesystems.
>>> Make a hash table with 4k (or whatever) blocks, make virtual blocks
>>> pointing to the physical blocks and run a remapping/deduping job at
>>> night. If written to, copy-on-write could be used to increase speed.
>>> Is this nonsense, or might it be an idea?
>>
>> This is "supposed" to be coming in the next OpenSolaris/ZFS release
>> (per the roadmap with the just-released 2009.06 version).
>
>
> What about Linux/LVM? Or did I misunderstand you?
I thought the question was about OSS... I wouldn't hold my breath
waiting for a Linux/LVM version - and for that matter I'll believe the
ZFS release when I see it, but at least it is being planned and could be
less than a year away.
--
Les Mikesell
lesmikesell@gmail.com
* Re: [linux-lvm] Data deduplication in LVM?
2009-06-10 19:30 ` Les Mikesell
@ 2009-06-10 19:33 ` Brian J. Murrell
2009-06-10 19:34 ` Ray Van Dolson
1 sibling, 0 replies; 12+ messages in thread
From: Brian J. Murrell @ 2009-06-10 19:33 UTC
To: LVM general discussion and development
On Wed, 2009-06-10 at 14:30 -0500, Les Mikesell wrote:
>
> I thought the question was about OSS...
FWIW, ZFS is OSS. Its licence is not compatible with Linux and the GPL
(sadly), but that doesn't mean that it's not Open Source Software.
b.
* Re: [linux-lvm] Data deduplication in LVM?
2009-06-10 19:30 ` Les Mikesell
2009-06-10 19:33 ` Brian J. Murrell
@ 2009-06-10 19:34 ` Ray Van Dolson
1 sibling, 0 replies; 12+ messages in thread
From: Ray Van Dolson @ 2009-06-10 19:34 UTC
To: linux-lvm
On Wed, Jun 10, 2009 at 12:30:25PM -0700, Les Mikesell wrote:
> Roy Sigurd Karlsbakk wrote:
> >
> >> Roy Sigurd Karlsbakk wrote:
> >>> Hi all
> >>> I've been reading up a little about data deduplication, and have been
> >>> in search for an OSS filesystem with dedup without much luck. While
> >>> testing snapshots and so on in LVM, I started wondering if dedup
> >>> would be better off in LVM than in the filesystem. Would it be
> >>> possible/efficient to add dedup to the LVM layer, or perhaps a layer
> >>> above LVM? This could make dedup work for all or most of filesystems.
> >>> Make a hash table with 4k (or whatever) blocks, make virtual blocks
> >>> pointing to the physical blocks and run a remapping/deduping job at
> >>> night. If written to, copy-on-write could be used to increase speed.
> >>> Is this nonsense, or might it be an idea?
> >>
> >> This is "supposed" to be coming in the next OpenSolaris/ZFS release
> >> (per the roadmap with the just-released 2009.06 version).
> >
> >
> > What about Linux/LVM? Or did I misunderstand you?
>
> I thought the question was about OSS... I wouldn't hold my breath
> waiting for a Linux/LVM version - and for that matter I'll believe the
> ZFS release when I see it, but at least it is being planned and could be
> less than a year away.
>
Also, both btrfs and tux3 are planning on adding dedup support.
Ray
* Re: [linux-lvm] Data deduplication in LVM?
2009-06-10 18:41 [linux-lvm] Data deduplication in LVM? Roy Sigurd Karlsbakk
` (2 preceding siblings ...)
2009-06-10 19:04 ` Roy Sigurd Karlsbakk
@ 2009-06-10 22:30 ` Stuart D. Gathman
2009-06-11 10:19 ` Roy Sigurd Karlsbakk
3 siblings, 1 reply; 12+ messages in thread
From: Stuart D. Gathman @ 2009-06-10 22:30 UTC
To: LVM general discussion and development
On Wed, 10 Jun 2009, Roy Sigurd Karlsbakk wrote:
> Is this nonsense, or might it be an idea?
It's an idea. With loosely coupled distributed computing, deduplication on
the nodes is not all that helpful, since each node needs its own copy anyway.
However, it is very helpful for backup. One OSS backup product that does
deduplication is BackupPC (written in Perl). In the backup server, every file
gets hard linked to a name in a special directory that is its md5 checksum
(plus some fiddly logic to handle metadata). Handling the metadata separately
also lets the backup repository run as an ordinary user, yet reuse the
OS filesystem to store the files.
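The hard-link pooling described above can be sketched roughly as follows. This is not BackupPC's code (which is Perl); pool_file and the flat pool layout are hypothetical, and the sketch assumes the pool sits on the same filesystem as the files. It also ignores checksum collisions and file metadata, which a real tool has to deal with.

```python
import hashlib
import os


def pool_file(path, pool_dir):
    """Hard-link `path` into a pool keyed by the MD5 of its contents.

    Returns True if `path` was replaced by a link to an existing pool entry.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    pooled = os.path.join(pool_dir, md5.hexdigest())

    os.makedirs(pool_dir, exist_ok=True)
    if not os.path.exists(pooled):
        # First time this content is seen: the file becomes the pool entry.
        os.link(path, pooled)
        return False
    if os.path.samefile(path, pooled):
        return False  # already linked to the pool
    # Content already pooled: swap the file for a link to the pool entry.
    tmp = path + ".dedup-tmp"
    os.link(pooled, tmp)
    os.replace(tmp, path)
    return True
```

Because deduplicated files are ordinary hard links, the underlying filesystem needs no special support, but the archive ends up with a very large number of links.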
Another product that implements its own datastore is Box Backup (written in C).
--
Stuart D. Gathman <stuart@bmsi.com>
Business Management Systems Inc. Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.
* Re: [linux-lvm] Data deduplication in LVM?
2009-06-10 22:30 ` Stuart D. Gathman
@ 2009-06-11 10:19 ` Roy Sigurd Karlsbakk
2009-06-11 12:30 ` Les Mikesell
0 siblings, 1 reply; 12+ messages in thread
From: Roy Sigurd Karlsbakk @ 2009-06-11 10:19 UTC
To: LVM general discussion and development
On 11. juni. 2009, at 00.30, Stuart D. Gathman wrote:
> One OSS backup product that does
> deduplication is BackupPC (written in Perl). In the backup server,
> every file
> gets hard linked to a name in a special directory that is its md5
> checksum
> (plus some fiddly logic to handle metadata)
This sounds like file-level deduplication. Most storage systems using
dedup use block-level dedup. NetApp is one example; they dedup
everything with 4k blocks, doing the actual deduplication at night.
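To illustrate the difference, here is a small sketch that computes how many bytes would have to be stored for a set of files under each approach. The function names are hypothetical and the "files" are just byte strings; it says nothing about how NetApp or BackupPC are implemented.

```python
import hashlib

BLOCK_SIZE = 4096  # 4k blocks, as in the NetApp example


def stored_bytes_file_level(files):
    """Bytes kept when only byte-identical whole files are shared."""
    unique = {hashlib.sha256(data).digest(): len(data) for data in files}
    return sum(unique.values())


def stored_bytes_block_level(files):
    """Bytes kept when identical 4k blocks are shared across all files."""
    unique = {}
    for data in files:
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())
```

For two 100-block files that differ in a single block, file-level dedup keeps both files whole (200 blocks), while block-level dedup keeps 101 blocks.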
roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy@karlsbakk.net
http://blogg.karlsbakk.net/
* Re: [linux-lvm] Data deduplication in LVM?
2009-06-11 10:19 ` Roy Sigurd Karlsbakk
@ 2009-06-11 12:30 ` Les Mikesell
2009-06-11 15:35 ` Les Mikesell
0 siblings, 1 reply; 12+ messages in thread
From: Les Mikesell @ 2009-06-11 12:30 UTC
To: LVM general discussion and development
Roy Sigurd Karlsbakk wrote:
> On 11. juni. 2009, at 00.30, Stuart D. Gathman wrote:
>
>> One OSS backup product that does
>> deduplication is BackupPC (written in Perl). In the backup server,
>> every file
>> gets hard linked to a name in a special directory that is its md5
>> checksum
>> (plus some fiddly logic to handle metadata)
>
>
> This sounds like file-level deduplication. Most storage systems using
> dedup use block-level dedup. NetApp is one example; they dedup
> everything with 4k blocks, doing the actual deduplication at night.
Yes, it is a different concept. However it does work very well when you
are storing your backups on a filesystem without block-level dedup. And
that is probably the place where you have the most redundancy - or if
you don't already, you'll be able to store a much longer history.
--
Les Mikesell
lesmikesell@gmail.com
* Re: [linux-lvm] Data deduplication in LVM?
2009-06-11 12:30 ` Les Mikesell
@ 2009-06-11 15:35 ` Les Mikesell
0 siblings, 0 replies; 12+ messages in thread
From: Les Mikesell @ 2009-06-11 15:35 UTC
To: LVM general discussion and development
Les Mikesell wrote:
> Roy Sigurd Karlsbakk wrote:
>> On 11. juni. 2009, at 00.30, Stuart D. Gathman wrote:
>>
>>> One OSS backup product that does
>>> deduplication is BackupPC (written in Perl). In the backup server,
>>> every file
>>> gets hard linked to a name in a special directory that is its md5
>>> checksum
>>> (plus some fiddly logic to handle metadata)
>>
>>
>> This sounds like file-level deduplication. Most storage systems using
>> dedup use block-level dedup. NetApp is one example; they dedup
>> everything with 4k blocks, doing the actual deduplication at night.
>
> Yes, it is a different concept. However it does work very well when you
> are storing your backups on a filesystem without block-level dedup. And
> that is probably the place where you have the most redundancy - or if
> you don't already, you'll be able to store a much longer history.
Apologies for following up my own post, but this does remind me of a
slightly related problem that someone here might have solved. The
BackupPC archive ends up containing such a large number of directory
entries and hard links that it is typically impractical to copy by any
file-oriented means or even rsync. A recurring topic on the BackupPC
mailing list is how to make a copy for offsite storage.
Personally I use a RAID1 created with 3 mirror members and periodically
swap one out and resync, but that's not very elegant. Is there a better
way, or one that could be incrementally updated across a WAN? Does LVM
have a mechanism like ZFS's incremental snapshot send/receive? (Not sure
if that would work either, but it sounds promising.) Is there any other
way to do a block-oriented remote copy? Would LVM mirroring work as
well as or better than md-device RAID? The partition can stay mounted
while the RAID rebuilds, but realistically not much else can be happening
because of the performance impact, and I unmount momentarily while
removing the member to get a clean filesystem.
Are there tricks with DRBD or perhaps RAID over iSCSI that would let a
periodic sync work incrementally - well enough to use over a WAN?
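For illustration only, this is roughly what a block-oriented incremental copy could look like: the remote side reports one checksum per block, and only blocks whose checksum differs are sent. The function names are hypothetical, the images are plain in-memory byte strings assumed to be the same size, and this is not a description of anything LVM, md, or DRBD provides.

```python
import hashlib

BLOCK_SIZE = 4096


def block_digests(image):
    """One checksum per block; this list is what the remote side would send."""
    return [
        hashlib.sha256(image[off:off + BLOCK_SIZE]).digest()
        for off in range(0, len(image), BLOCK_SIZE)
    ]


def changed_blocks(source, remote_digests):
    """Yield (index, data) for each source block that differs from the remote."""
    for idx, off in enumerate(range(0, len(source), BLOCK_SIZE)):
        block = source[off:off + BLOCK_SIZE]
        if (idx >= len(remote_digests)
                or hashlib.sha256(block).digest() != remote_digests[idx]):
            yield idx, block


def apply_blocks(replica, updates):
    """Write the received blocks into the replica (a bytearray) in place."""
    for idx, block in updates:
        off = idx * BLOCK_SIZE
        replica[off:off + len(block)] = block
```

Only the changed blocks cross the WAN, but both sides still read their whole device to compute checksums, so this trades network traffic for local I/O.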
--
Les Mikesell
lesmikesell@gmail.com