* [OFFTOPIC] Hardlink utility - reclaim drive space
From: William Stearns @ 2001-03-02 19:03 UTC
To: ML-linux-kernel; +Cc: William Stearns

Good day, all,
	Sorry for the offtopic post; I sincerely believe this will be
useful to developers with multiple copies of, say, the linux kernel tree
on their drives.  I'll be brief.  Please followup to private mail -
thanks.
	Freedups scans the directories you give it for identical files and
hardlinks them together to save drive space.  Please see
ftp://ftp.stearns.org/pub/freedups .  V0.2.1 is up there; it has received
some testing, but may yet contain bugs.
	I was able to recover ~676M by running it against 8 different
2.4.x kernel trees with different patches that originally contained ~948M
of files.  YMMV.
	I do understand there are better ways to handle this problem (cp
-av --link, cvs? Bitkeeper, deleting unneeded trees, tarring up trees,
etc.).  See the readme for a little discussion on this.  This is just one
approach that may be useful in some situations.
	Cheers,
	- Bill
---------------------------------------------------------------------------
"Software is largely a service industry operating under the persistent
but unfounded delusion that it is a manufacturing industry."
	-- Eric Raymond
---------------------------------------------------------------------------
William Stearns (wstearns@pobox.com).  Mason, Buildkernel, named2hosts,
and ipfwadm2ipchains are at: http://www.pobox.com/~wstearns
LinuxMonth; articles for Linux Enthusiasts!  http://www.linuxmonth.com
---------------------------------------------------------------------------
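[Editor's note: freedups itself lives at the FTP URL above; what follows is only a rough sketch of the idea it implements - checksum files, then hardlink byte-identical ones together. The function name `dedup_dir` and the md5sum-based grouping are illustrative choices, not the actual freedups code; it assumes GNU coreutils and findutils.]

```shell
#!/bin/sh
# Sketch of the freedups idea (not the actual tool): checksum every
# regular file under a directory, sort so duplicates are adjacent,
# verify with cmp, then replace each duplicate with a hardlink.
dedup_dir() {
    find "$1" -type f -print0 | xargs -0 -r md5sum | sort |
    {
        prev_sum= ; prev_file=
        while read -r sum file; do
            if [ "$sum" = "$prev_sum" ] && cmp -s "$prev_file" "$file"; then
                ln -f "$prev_file" "$file"   # duplicate -> hardlink
            else
                prev_sum=$sum
                prev_file=$file
            fi
        done
    }
}
```

[This breaks on filenames containing newlines, and like any hardlink merge it silently couples future in-place edits of the merged files - the very caveat raised in the replies below.]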
* Re: [OFFTOPIC] Hardlink utility - reclaim drive space
From: Padraig Brady @ 2001-03-05 19:17 UTC
To: William Stearns; +Cc: ML-linux-kernel

Hmm.. useful until you actually want to modify a linked file,
but then you're modifying the file in all "merged" trees.
Wouldn't it be cool to have an extended attribute for files
called "Copy on Write"?  Then you could hardlink all duplicate
files together, but when a file is modified a copy is
transparently created.  Actually, should it be called
"Copy On Modify", since if you copied a file there would be no
need to make an actual copy until the file was modified?

The only problem I see with this is if you didn't have enough
space to store a copy of the file - what would you do in that
case, just return an error on write?

Is there any way this could be extended across filesystems?
I suppose you could add it on top of existing DFS'?

I could see many uses for this, like backup systems, but perhaps
a block-level system is more appropriate in that case
(like the just-announced SnapFS).

Is there any filesystem that supports this at present?

Padraig.

William Stearns wrote:
> 	Freedups scans the directories you give it for identical files and
> hardlinks them together to save drive space. [...]
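[Editor's note: the problem Padraig describes can be shown in a couple of lines - hardlinked names share one inode, so an in-place write through either name is visible via both. File names here are illustrative.]

```shell
# Two hardlinked names, one inode: an in-place write through
# one name changes what the other name sees.
tmp=$(mktemp -d)
printf 'original\n' > "$tmp/tree1_file"
ln "$tmp/tree1_file" "$tmp/tree2_file"    # second name, same inode
printf 'modified\n' > "$tmp/tree1_file"   # truncate+write in place
cat "$tmp/tree2_file"                     # prints "modified"
```

[A hypothetical "Copy On Modify" attribute would intercept that second write and give `tree1_file` a private inode instead.]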
* Re: [OFFTOPIC] Hardlink utility - reclaim drive space
From: Jeremy Jackson @ 2001-03-05 19:19 UTC
To: Padraig Brady; +Cc: William Stearns, ML-linux-kernel

Padraig Brady wrote:
> Wouldn't it be cool to have an extended attribute
> for files called "Copy on Write", so then you could
> hardlink all duplicate files together, but when a file is
> modified a copy is transparently created. [...]

SnapFS might handle this - versioning, copy-on-write disk file
clones... even finer grained: only modified blocks of a file are
duplicated, not the entire file, and it does this in real time.

In the case of the kernel, why not get the whole repository?
CVS stores versions as diffs internally, saving space.
* Re: [OFFTOPIC] Hardlink utility - reclaim drive space
From: Padraig Brady @ 2001-03-06 13:57 UTC
To: Jeremy Jackson; +Cc: William Stearns, ML-linux-kernel

Jeremy Jackson wrote:
> SnapFS might handle this - versioning, copy-on-write disk file
> clones... even finer grained: only modified blocks of a file are
> duplicated, not the entire file, and it does this in real time.

Yes, I mentioned SnapFS above, and a block-level system would be a
win for large files that are quite similar.  However, in my
experience this is usually not the case - large files are usually
not similar - so a simple file-level system would be more
appropriate in my opinion.

Also, I don't think user-space programs should be relied on to
manage hardlinked files by (effectively) doing:

	cp orig temp; mv temp orig

You could use file permissions (chmod -w orig) to remind you to do
this, but that's just a kludge, and it's also messy with every
user-space program doing something different.  Also, the cp above
breaks the link, which you wouldn't want to do until the file is
actually modified.  So if you implemented the "Copy On Modify"
extended attribute, you could set cp to cp -l by default.

I'm talking about something more general here than working with a
few similar trees.  This is a general way to never have duplicate
files on a filesystem.  Doing this at the block level would be more
fine-grained, but at the cost of much more complexity and processing
time, especially if you want to analyse an existing filesystem.  If
you do it at the file level, you can just scan for duplicate files,
merge them using hardlinks, and set the "Copy On Modify" bit.  This
can be cleared as appropriate, where you want the original hardlink
behaviour.

> in the case of kernel, why not get the whole repository?
> CVS stores versions as diffs internally, saving space.

Yep, good for the kernel where there are no binaries, but not good
in general.

Padraig.
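[Editor's note: the "cp orig temp; mv temp orig" idiom from the message above, wrapped in a helper for illustration. Copying creates a fresh inode, and the rename points the edited name at it, so the other tree's copy stays on the old inode. The function name `break_link` is an illustrative choice.]

```shell
# Break a hardlink before modifying a file: after this, writes
# through $1 no longer affect the other names of the old inode.
break_link() {
    cp -p "$1" "$1.tmp" && mv "$1.tmp" "$1"
}
```

[This is exactly the user-space workaround Padraig argues should not be necessary - a kernel-side "Copy On Modify" bit would do it transparently, and only when a write actually happens.]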
* Re: [OFFTOPIC] Hardlink utility - reclaim drive space
From: David Schleef @ 2001-03-05 22:08 UTC
To: Padraig Brady; +Cc: William Stearns, ML-linux-kernel

On Mon, Mar 05, 2001 at 07:17:18PM +0000, Padraig Brady wrote:
> Hmm.. useful until you actually want to modify a linked file,
> but then you're modifying the file in all "merged" trees.

Use emacs, because you can configure it to do something appropriate
with linked files.  But for those of us addicted to vi, the attached
wrapper script is pretty cool, too.

dave...

[-- Attachment #2: cow-wrapper --]

#!/bin/bash
#
# copy-on-write wrapper for hard linked files
# Copyright 2000 David A. Schleef <ds@schleef.org>
#
# Please send me any improvements you make to this script.  I just
# wrote it as a quick and dirty hack.

linkedfiles=
for each in "$@"
do
  case $each in
  -*)
    # ignore options
    ;;
  *)
    if [ -f "$each" ]; then
      # number of hardlinks to this file (GNU stat)
      nlinks=$(stat -c %h "$each")
      if [ "$nlinks" -gt 1 ]; then
        #echo unlinking $each
        linkedfiles="$linkedfiles $each"
        mv "$each" "$each.orig"
        cp "$each.orig" "$each"
      fi
    fi
    ;;
  esac
done

/usr/bin/vim "$@"

for each in $linkedfiles
do
  # file unchanged after editing: restore the original hardlink
  if cmp "$each" "$each.orig" &>/dev/null
  then
    #echo relinking $each
    rm "$each"
    mv "$each.orig" "$each"
  fi
done
* Re: [OFFTOPIC] Hardlink utility - reclaim drive space
From: David Woodhouse @ 2001-03-07 14:52 UTC
To: Padraig Brady; +Cc: William Stearns, ML-linux-kernel

Padraig@AnteFacto.com said:
> Wouldn't it be cool to have an extended attribute for files called
> "Copy on Write", so then you could hardlink all duplicate files
> together, but when a file is modified a copy is transparently created.
> The only problem I see with this is that you wouldn't have enough
> space to store a copy of a file, what would you do in this case, just
> return an error on write?

Yep.  write(2) is allowed to return -ENOSPC, even when you're not
extending the file you're writing to.  Think about holes and
log-structured filesystems.

--
dwmw2
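[Editor's note: the "holes" David mentions can be seen with a sparse file - a write inside the existing file size can still require allocating disk blocks, which is why it can fail with -ENOSPC. Sketch assumes GNU coreutils on Linux; exact block counts vary by filesystem.]

```shell
# A sparse file: 1 MiB of logical size, no data blocks allocated
# until something is actually written into the hole.
tmp=$(mktemp -d)
truncate -s 1M "$tmp/sparse"                 # all hole, no data blocks
stat -c 'size=%s blocks=%b' "$tmp/sparse"
# write one byte into the middle of the hole, without truncating
printf 'x' | dd of="$tmp/sparse" bs=1 seek=524288 conv=notrunc 2>/dev/null
stat -c 'size=%s blocks=%b' "$tmp/sparse"    # size unchanged, blocks > 0
```

[That one-byte write did not extend the file, yet it consumed disk space - so a full disk can make it fail even though the file "already existed" at that size.]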