From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mx3.redhat.com (mx3.redhat.com [172.16.48.32])
	by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id n5OJRUP7003850
	for ; Wed, 24 Jun 2009 15:27:31 -0400
Received: from mail.ilcampo.com (mail.ilcampo.com [193.172.126.47])
	by mx3.redhat.com (8.13.8/8.13.8) with ESMTP id n5OJR86e023483
	for ; Wed, 24 Jun 2009 15:27:09 -0400
Received: from unknown (HELO [172.16.2.132]) (mark.ruijter_siennax@[193.172.126.230])
	(envelope-sender )
	by mail.ilcampo.com (qmail-ldap-1.03) with SMTP
	for ; 24 Jun 2009 19:27:08 -0000
Message-ID: <4A427DC4.7050805@gmail.com>
Date: Wed, 24 Jun 2009 21:25:56 +0200
From: Mark Ruijter
MIME-Version: 1.0
Subject: Re: [linux-lvm] Data deduplication for Linux : lessfs
References: <4A42424B.1080208@gmail.com>
Content-Type: multipart/alternative; boundary="------------090604030903040506090309"
Reply-To: LVM general discussion and development
List-Id: LVM general discussion and development
To: LVM general discussion and development

This is a multi-part message in MIME format.
--------------090604030903040506090309
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Hi Roy,

> It's a good idea, but given the current traffic on the lessfs mailing
> list, I'm not sure if much work is done. I have been a member of that
> list since June 1 and haven't received more than one message, which
> was the one I wrote myself.

Almost all the traffic is on the forum - open discussion.
Only one person posted to the mailing list. ;-)

> If done smartly, this may perhaps be possible, but the problem is the
> filesystem's metadata. Is this going to be dedup'ed? How much will
> this take? A simple backup will update atime on all the files backed
> up, and although atime isn't always wanted or needed, the problem
> occurs elsewhere.
Typically the metadata on production systems is approximately 10% to 20% of the deduplicated stored data.
On my systems, the stored data is 40x smaller than the data written to the filesystem.

For example, from a real-life backup server making dozens of backups each day:
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/cciss/c0d0p3     9.7G  2.4G  6.9G  26% /
/dev/cciss/c0d0p1      99M   23M   72M  24% /boot
tmpfs                 7.9G     0  7.9G   0% /dev/shm
/dev/cciss/c0d0p4     246G  6.0G  241G   3% /meta
/dev/cciss/c0d1p1     274G   73G  202G  27% /blockdata
/dev/cciss/c1d0p1     4.1T  1.5T  2.7T  35% /data
lessfs                4.1T  1.5T  2.7T  35% /pooldata
[root@lessfssrv pooldata]# du . -s -h
31T     .
[root@lessfssrv pooldata]# ls -alh /data/current/
total 314G
drwxr-xr-x 2 root root   26 Jun  1 00:12 .
drwxr-xr-x 6 root root   59 Jun  1 00:12 ..
-rw-r--r-- 1 root root 314G Jun 22 14:26 blockdata.tch
[root@lessfssrv pooldata]# ls -alh /meta/current/
total 1.4G
drwxr-xr-x 2 root root   63 Jun  1 00:12 .
drwxr-xr-x 6 root root   59 Jun  1 00:12 ..
-rw-r--r-- 1 root root 1.3G Jun 22 14:52 blockusage.tch
-rw-r--r-- 1 root root  89M Jun 22 14:45 dirent.tcb
-rw-r--r-- 1 root root  89M Jun 22 14:52 metadata.tcb
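
The figures in this output can be turned into ratios with a little arithmetic. The sketch below is only a rough check: the variable and function names are made up for illustration, the sizes are the rounded values printed by df, du and ls, and the output does not say which of these figures lie behind the 10% to 20% and 40x statements, so the pairings chosen here are assumptions.

```python
# Rough arithmetic on the rounded sizes printed above, all expressed in GiB.
# Which figures to compare is an assumption; the names are illustrative only.
GIB_PER_TIB = 1024

logical_gib = 31 * GIB_PER_TIB         # du -s -h in /pooldata: 31T
data_used_gib = 1.5 * GIB_PER_TIB      # df -h "Used" on /data: 1.5T
blockdata_gib = 314                    # ls -alh: blockdata.tch
meta_files_gib = 1.3 + 2 * 89 / 1024   # blockusage.tch + dirent.tcb + metadata.tcb


def reduction_factor(logical, stored):
    """How many times smaller the stored data is than the logical data."""
    return logical / stored


def percent_of(part, whole):
    """Size of `part` expressed as a percentage of `whole`."""
    return 100.0 * part / whole


if __name__ == "__main__":
    print(f"du total vs df used on /data: "
          f"{reduction_factor(logical_gib, data_used_gib):.1f}x")
    print(f"du total vs blockdata.tch:    "
          f"{reduction_factor(logical_gib, blockdata_gib):.1f}x")
    print(f"/meta/current files as % of blockdata.tch: "
          f"{percent_of(meta_files_gib, blockdata_gib):.2f}%")
```

Because df and du round to two or three significant digits, the results are only as precise as the printed sizes.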


Mark.


> roy
> --
> Roy Sigurd Karlsbakk
> (+47) 97542685
> roy@karlsbakk.net
> http://blogg.karlsbakk.net/
> --
> In all pedagogy it is essential that the curriculum be presented
> intelligibly. It is an elementary imperative for all pedagogues to
> avoid excessive use of idioms of foreign origin. In most cases,
> adequate and relevant synonyms exist in Norwegian.



--------------090604030903040506090309--