Date: Thu, 11 Jun 2009 10:35:09 -0500
From: Les Mikesell <lesmikesell@gmail.com>
To: LVM general discussion and development
Subject: Re: [linux-lvm] Data deduplication in LVM?
Message-ID: <4A31242D.5090501@gmail.com>
In-Reply-To: <4A30F8E3.5090508@gmail.com>
References: <4855BFEA-C772-4B98-A18E-C406FD5737DD@karlsbakk.net>
 <0CCC4321-943D-406A-A1BB-48E2BD6B0857@karlsbakk.net>
 <4A30F8E3.5090508@gmail.com>

Les Mikesell wrote:
> Roy Sigurd Karlsbakk wrote:
>> On 11 June 2009, at 00:30, Stuart D. Gathman wrote:
>>
>>> One OSS backup product that does deduplication is BackupPC (written
>>> in Perl). In the backup server, every file gets hard linked to a
>>> name in a special directory that is its md5 checksum (plus some
>>> fiddly logic to handle metadata).
>>
>> This sounds like file-level deduplication. Most storage systems that
>> do dedup use block-level dedup. NetApp is one example; they dedup
>> everything in 4k blocks, doing the actual deduplication at night.
>
> Yes, it is a different concept. However, it does work very well when
> you are storing your backups on a filesystem without block-level
> dedup. And that is probably the place where you have the most
> redundancy - or, if you don't already, you'll be able to store a much
> longer history.

Apologies for following up my own post, but this reminds me of a
slightly related problem that someone here might have solved.

The BackupPC archive ends up containing such a large number of
directory entries and hard links that it is typically impractical to
copy by any file-oriented means, even rsync. A recurring topic on the
BackupPC mailing list is how to make a copy for offsite storage.
Personally, I use a RAID1 created with three mirror members and
periodically swap one out and resync it, but that's not very elegant.
Is there a better way, or one that could be updated incrementally
across a WAN?

Does LVM have a mechanism like ZFS's incremental snapshot
send/receive? (I'm not sure that would work either, but it sounds
promising.) Is there any other way to do a block-oriented remote copy?
Would LVM mirroring work as well as, or better than, an md-device
RAID? The partition can stay mounted while the RAID rebuilds, but
realistically not much else can be happening because of the
performance impact, and I unmount momentarily while removing the
member to get a clean filesystem. Are there tricks with DRBD, or
perhaps RAID over iSCSI, that would let a periodic sync work
incrementally - well enough to use over a WAN?

--
Les Mikesell
 lesmikesell@gmail.com
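
P.S. To make the file-level scheme Stuart described a little more
concrete, here is a rough sketch of the idea in Python - not
BackupPC's actual code (which is Perl and also deals with compression,
hash collisions and file attributes), just an illustration of pooling
files by content hash and hard linking. The pool path below is made
up:

import hashlib
import os

POOL = "/var/lib/backuppc/pool"  # made-up location, not BackupPC's real layout

def file_md5(path, bufsize=1 << 20):
    """Return the md5 hex digest of a file's contents."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()

def pool_file(path):
    """Hard link path into the pool so identical contents share one inode."""
    digest = file_md5(path)
    pooled = os.path.join(POOL, digest[:2], digest)  # fan out by hash prefix
    os.makedirs(os.path.dirname(pooled), exist_ok=True)
    if os.path.exists(pooled):
        # Already have this content: replace the new copy with a link to it.
        os.unlink(path)
        os.link(pooled, path)
    else:
        # First time this content shows up: it becomes the pool copy.
        os.link(path, pooled)

Every backed-up file with the same contents ends up as another hard
link to the same pool entry, which is why it works so well on an
ordinary filesystem - and also why the archive ends up with so many
links that it is painful to copy.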
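
And on the block-oriented copy question, what I am picturing (whether
it ends up being DRBD, something ZFS-like, or hand-rolled) amounts to
checksumming fixed-size blocks of the device image and rewriting only
the blocks that changed since the last sync. A toy sketch, again in
Python, with an arbitrary block size and no attempt to snapshot or
quiesce the volume first - in the real thing the checksum pass would
have to run on the far end so only differing blocks cross the WAN:

import hashlib

BLOCK = 4 * 1024 * 1024  # 4 MB blocks; size is arbitrary

def block_sums(path):
    """Checksum every fixed-size block of a device or image file."""
    sums = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(BLOCK)
            if not chunk:
                break
            sums.append(hashlib.sha1(chunk).hexdigest())
    return sums

def sync_blocks(src, dst):
    """Overwrite only the blocks of dst whose checksums differ from src.

    Assumes dst already exists, e.g. from an earlier full copy.
    """
    old = block_sums(dst)
    changed = 0
    with open(src, "rb") as s, open(dst, "r+b") as d:
        index = 0
        while True:
            chunk = s.read(BLOCK)
            if not chunk:
                break
            if index >= len(old) or hashlib.sha1(chunk).hexdigest() != old[index]:
                d.seek(index * BLOCK)
                d.write(chunk)
                changed += 1
            index += 1
    return changed

I realize that is basically what rsync does for individual files; the
question is whether anything already packaged does it at the device
level, incrementally, well enough to run over a slow link.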