* Git as a filesystem @ 2007-09-21 10:48 Peter Stahlir 2007-09-21 11:28 ` Muhammad Tayyab 0 siblings, 1 reply; 27+ messages in thread From: Peter Stahlir @ 2007-09-21 10:48 UTC (permalink / raw) To: linux-kernel Hi! Is it possible/feasible to use git as a filesystem? Like having git on top of ext3. This way I could do a gitfs-gc and there is only one pack file sitting on the disk which is a compressed version of the whole system. I am not interested in a version controlled filesystem, only in the space saving aspects. Thanks, Peter ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 10:48 Git as a filesystem Peter Stahlir @ 2007-09-21 11:28 ` Muhammad Tayyab 2007-09-21 11:37 ` Peter Stahlir 0 siblings, 1 reply; 27+ messages in thread From: Muhammad Tayyab @ 2007-09-21 11:28 UTC (permalink / raw) To: Peter Stahlir; +Cc: linux-kernel Hi, I think it would be a bad idea to make Git part of the filesystem. If someone wants to install it and use it, it's his choice, but if we make it part of the filesystem and just use it for compression, this will reduce performance. For compression, I think it would be preferable to write a patch for ext3 that implements compression, like the compression patch for ext2. On the other hand, I like the idea of a Git-based filesystem with special (non-standard) features. Why not provide repositories and other Git features in a normal filesystem? Thanks, -- Tayyab Peter Stahlir wrote: > Hi! > > Is it possible/feasible to use git as a filesystem? > Like having git on top of ext3. > > This way I could do a gitfs-gc and there is only one > pack file sitting on the disk which is a compressed > version of the whole system. > I am not interested in a version controlled filesystem, > only in the space saving aspects. > > Thanks, > > Peter ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 11:28 ` Muhammad Tayyab @ 2007-09-21 11:37 ` Peter Stahlir 2007-09-21 11:46 ` Peter Stahlir 2007-09-21 15:04 ` Adrian Bunk 0 siblings, 2 replies; 27+ messages in thread From: Peter Stahlir @ 2007-09-21 11:37 UTC (permalink / raw) To: Muhammad Tayyab; +Cc: linux-kernel > Peter Stahlir wrote: > > Hi! > > > > Is it possible/feasible to use git as a filesystem? > > Like having git on top of ext3. > > > > This way I could do a gitfs-gc and there is only one > > pack file sitting on the disk which is a compressed > > version of the whole system. > > I am not interested in a version controlled filesystem, > > only in the space saving aspects. 2007/9/21, Muhammad Tayyab <mail.tayyab@gmail.com>: > Hi, > I think it would be a bad idea to use Git as a part of filesystem. If > someone wants to install it and use it, its his choice, but if we make > it the part of Filesystem, and just use it for compression, this will > reduce the performance. > For compression, i think more preferable is to make a patch for ext3 > that implements the compression, like compression patch for ext2. As I understand it, the compression patches for ext2 only compress each file on its own. I think gitfs would compress much better because it deltifies between all files. So if you care about space efficiency rather than performance, a gitfs would be nice. Peter P.S.: I was told that there exists a fuse based gitfs at http://www.sfgoth.com/~mitch/linux/gitfs/ But I think our goals differ. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 11:37 ` Peter Stahlir @ 2007-09-21 11:46 ` Peter Stahlir 2007-09-21 12:51 ` Jan Engelhardt 2007-09-21 15:04 ` Adrian Bunk 1 sibling, 1 reply; 27+ messages in thread From: Peter Stahlir @ 2007-09-21 11:46 UTC (permalink / raw) To: Muhammad Tayyab; +Cc: linux-kernel For example, imagine running a complete Debian mirror on top of a Debian system with gitfs. How big would the packfile be for this 252GB beast? Peter ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 11:46 ` Peter Stahlir @ 2007-09-21 12:51 ` Jan Engelhardt 2007-09-21 13:30 ` Peter Stahlir 0 siblings, 1 reply; 27+ messages in thread From: Jan Engelhardt @ 2007-09-21 12:51 UTC (permalink / raw) To: Peter Stahlir; +Cc: Muhammad Tayyab, linux-kernel On Sep 21 2007 13:46, Peter Stahlir wrote: > >For example, imagine running a complete Debian mirror on top of a >Debian system with gitfs. How big would the packfile be for this 252GB >beast? Probably 252 GB. Lots of the packages are already compressed, and each time a minimal change is made, the compressed bytestream changes, so long story short, deltifying between two compressed streams is likely to work horribly. ^ permalink raw reply [flat|nested] 27+ messages in thread
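The point about already-compressed packages can be checked on a small scale. The sketch below is only an illustration under stated assumptions: the payload is made-up text standing in for a package's contents, and zlib stands in for whatever compressor a package format actually uses.

```python
import random
import zlib

# Hypothetical stand-in for the uncompressed contents of a package.
rng = random.Random(0)
words = [b"alpha", b"beta", b"gamma", b"delta", b"epsilon", b"zeta"]
payload = b" ".join(rng.choice(words) for _ in range(20000))

# First pass: roughly what a package format already does to its contents.
compressed = zlib.compress(payload, 9)

# Second pass: what a compressing store would attempt on top of that.
recompressed = zlib.compress(compressed, 9)

print("original:    ", len(payload))
print("compressed:  ", len(compressed))
print("recompressed:", len(recompressed))
```

The first pass shrinks the payload considerably; the second pass has almost nothing left to remove, which is the effect described for a mirror full of compressed packages.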
* Re: Git as a filesystem 2007-09-21 12:51 ` Jan Engelhardt @ 2007-09-21 13:30 ` Peter Stahlir 2007-09-21 13:53 ` Jan Engelhardt 0 siblings, 1 reply; 27+ messages in thread From: Peter Stahlir @ 2007-09-21 13:30 UTC (permalink / raw) To: Jan Engelhardt; +Cc: Muhammad Tayyab, linux-kernel > >For example, imagine running a complete Debian mirror on top of a > >Debian system with gitfs. How big would the packfile be for this 252GB > >beast? > > Probably 252 GB. Lots of the packages are already compressed, and > each time a minimal change is done, the bytestream changes, so long > story short, deltifying between to compressed streams is likely to > deltify horribly. What about adding deb or tar support to git? Then git doesn't store deb archives but the contents of archives. This way redundancy across architectures can be deltified. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 13:30 ` Peter Stahlir @ 2007-09-21 13:53 ` Jan Engelhardt 0 siblings, 0 replies; 27+ messages in thread From: Jan Engelhardt @ 2007-09-21 13:53 UTC (permalink / raw) To: Peter Stahlir; +Cc: Muhammad Tayyab, linux-kernel On Sep 21 2007 15:30, Peter Stahlir wrote: > >> >For example, imagine running a complete Debian mirror on top of a >> >Debian system with gitfs. How big would the packfile be for this 252GB >> >beast? >> >> Probably 252 GB. Lots of the packages are already compressed, and >> each time a minimal change is done, the bytestream changes, so long >> story short, deltifying between to compressed streams is likely to >> deltify horribly. > >What about adding deb or tar support to git? Blatant layering violation. >Then git doesn't store deb archives but the contents of archives. And what about metadata (e.g. rpm Vendor: tag)? tar does not store that. >This way redundancy across architectures can be deltified. Not at all. Different instruction sets, different code ==> delta savings -> 0. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 11:37 ` Peter Stahlir 2007-09-21 11:46 ` Peter Stahlir @ 2007-09-21 15:04 ` Adrian Bunk 1 sibling, 0 replies; 27+ messages in thread From: Adrian Bunk @ 2007-09-21 15:04 UTC (permalink / raw) To: Peter Stahlir; +Cc: Muhammad Tayyab, linux-kernel On Fri, Sep 21, 2007 at 01:37:33PM +0200, Peter Stahlir wrote: > > Peter Stahlir wrote: > > > Hi! > > > > > > Is it possible/feasible to use git as a filesystem? > > > Like having git on top of ext3. > > > > > > This way I could do a gitfs-gc and there is only one > > > pack file sitting on the disk which is a compressed > > > version of the whole system. > > > I am not interested in a version controlled filesystem, > > > only in the space saving aspects. > > > 2007/9/21, Muhammad Tayyab <mail.tayyab@gmail.com>: > > Hi, > > I think it would be a bad idea to use Git as a part of filesystem. If > > someone wants to install it and use it, its his choice, but if we make > > it the part of Filesystem, and just use it for compression, this will > > reduce the performance. > > For compression, i think more preferable is to make a patch for ext3 > > that implements the compression, like compression patch for ext2. > > As I understand it the compression patches for ext2 only compress > a single file. I think gitfs would compress much better because it > deltifies between all files. So if you are not interested in > performance but space > efficiency a gitfs would be nice. git is kewl, so it must be the solution for all problems? git keeps information of all versions of a file. This can by definition not be smaller than only storing the current version. And git doesn't perform any magic, it uses the zlib library that is neither unusual nor the best compression method available. > Peter >... cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed ^ permalink raw reply [flat|nested] 27+ messages in thread
* Git as a filesystem @ 2007-09-21 10:51 Peter Stahlir 2007-09-21 11:11 ` Johannes Schindelin 0 siblings, 1 reply; 27+ messages in thread From: Peter Stahlir @ 2007-09-21 10:51 UTC (permalink / raw) To: git Hi! Is it possible/feasible to use git as a filesystem? Like having git on top of ext3. This way I could do a gitfs-gc and there is only one pack file sitting on the disk which is a compressed version of the whole system. I am not interested in a version controlled filesystem, only in the space saving aspects. Thanks, Peter ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 10:51 Peter Stahlir @ 2007-09-21 11:11 ` Johannes Schindelin 2007-09-21 11:41 ` Peter Stahlir 2007-09-21 14:22 ` Miklos Vajna 0 siblings, 2 replies; 27+ messages in thread From: Johannes Schindelin @ 2007-09-21 11:11 UTC (permalink / raw) To: Peter Stahlir; +Cc: git Hi, On Fri, 21 Sep 2007, Peter Stahlir wrote: > Is it possible/feasible to use git as a filesystem? > Like having git on top of ext3. I haven't looked at it closely, but there is a GitFS: http://git.or.cz/gitwiki/InterfacesFrontendsAndTools#head-f354b40618742b976c13700fe1fea28387ad5c89 (I am pointing you to the Git Wiki, so that you can find more pointers should you not be happy with this one.) Ciao, Dscho ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 11:11 ` Johannes Schindelin @ 2007-09-21 11:41 ` Peter Stahlir 2007-09-21 12:53 ` Karl Hasselström 2007-09-21 13:22 ` Nicolas Pitre 2007-09-21 14:22 ` Miklos Vajna 1 sibling, 2 replies; 27+ messages in thread From: Peter Stahlir @ 2007-09-21 11:41 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git 2007/9/21, Johannes Schindelin <Johannes.Schindelin@gmx.de>: > On Fri, 21 Sep 2007, Peter Stahlir wrote: > > > Is it possible/feasible to use git as a filesystem? > > Like having git on top of ext3. > > I haven't looked at it closely, but there is a GitFS: > > http://git.or.cz/gitwiki/InterfacesFrontendsAndTools#head-f354b40618742b976c13700fe1fea28387ad5c89 > > (I am pointing you to the Git Wiki, so that you can find more pointers > should you not be happy with this one.) Thank you. This is what I was looking for. My motivation is to find out whether it is possible to run a system, for example Debian, on top of gitfs and then keep a huge mirror on it, for example a complete 252GB Debian mirror, as space-efficiently as possible. I wonder how big a deltified Debian mirror in one pack file would be. :) Peter ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 11:41 ` Peter Stahlir @ 2007-09-21 12:53 ` Karl Hasselström 2007-09-21 13:28 ` Peter Stahlir 2007-09-21 13:22 ` Nicolas Pitre 1 sibling, 1 reply; 27+ messages in thread From: Karl Hasselström @ 2007-09-21 12:53 UTC (permalink / raw) To: Peter Stahlir; +Cc: Johannes Schindelin, git On 2007-09-21 13:41:07 +0200, Peter Stahlir wrote: > My motivation is whether it is possible to run a system, for example > Debian on a computer on top of gitfs, and then have a huge mirror on > it, for example a complete 252GB Debian mirror as space efficient as > possible. > > I wonder how big a deltified Debian mirror in one pack file would > be. :) Very, very close to 252 GB, since .deb files are already compressed. If it's just the gzip compression you want, surely there must be real filesystems that can do that. -- Karl Hasselström, kha@treskal.com www.treskal.com/kalle ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 12:53 ` Karl Hasselström @ 2007-09-21 13:28 ` Peter Stahlir 2007-09-21 13:41 ` Michael Poole ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: Peter Stahlir @ 2007-09-21 13:28 UTC (permalink / raw) To: Karl Hasselström; +Cc: Johannes Schindelin, git > > I wonder how big a deltified Debian mirror in one pack file would > > be. :) > > Very, very close to 252 GB, since .deb files are already compressed. Yes, but if there were deb and tar support in git (to automatically unpack archives and store the contents), together with the best available binary diffs, I think the repository could be significantly smaller, because files common to all architectures could be deltified. I did a quick check with 100MB of deb archives; the result was nearly 100MB, as you said. I also did a quick check with all .so files in my /usr/lib directory; they shrank from 50MB to 20MB, and tar + bz2 achieves the same. But the thing is, I think there is a lot of redundancy in a) a Debian mirror or b) your disk at home. Telling git to handle, for example, deb archives and storing everything in a pack file would take advantage of redundancy across _all_ files. So the /usr/share/doc of all architectures could be compressed. Right? ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 13:28 ` Peter Stahlir @ 2007-09-21 13:41 ` Michael Poole 2007-09-21 14:38 ` jlh 2007-09-21 17:29 ` Dmitry Potapov 2 siblings, 0 replies; 27+ messages in thread From: Michael Poole @ 2007-09-21 13:41 UTC (permalink / raw) To: Peter Stahlir; +Cc: Karl Hasselström, Johannes Schindelin, git Peter Stahlir writes: > Telling git to handle -for example- deb archives and storing > everything in a pack file would take advantage of redundancy across > _all_ files. > So the /usr/share/doc of all architectures could be compressed. > > Right? You're proposing to trade off lots of CPU time in fetching many files from a pack and making the package file -- paid every time someone requests a package -- for at most 250 GB of space (cf Amdahl's law). How long are your users willing to wait in exchange for 250 GB of saved space? How much CPU are you willing to spend for it? Compare those to the cost of a 300 GB hard drive (roughly $65). There's also the cost to make git support the package format, and to maintain that code going forward. Those costs are also large. Michael Poole ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 13:28 ` Peter Stahlir 2007-09-21 13:41 ` Michael Poole @ 2007-09-21 14:38 ` jlh 2007-09-21 17:29 ` Dmitry Potapov 2 siblings, 0 replies; 27+ messages in thread From: jlh @ 2007-09-21 14:38 UTC (permalink / raw) To: Peter Stahlir; +Cc: git Peter Stahlir wrote: > But the thing is, I think there is a lot of redundancy in > a) a Debian mirror or Yes, surely. Your idea suggests that you want any file to be reconstructed on-the-fly whenever it's being requested. Isn't there the danger of killing performance, the CPU being the bottleneck? I imagine such a debian mirror has quite some traffic. > b) your disk at home. I doubt so. There sure is lots of redundancy within each file and that's what compressed file systems are good for. But what you talk about is redundancy across (unversioned) files, and I don't feel there is a lot of it. Yes, I might have a few copies of the file COPYING on my disk, and maybe some of my sources share a few functions, but this won't save me tons of space. All my binaries, libraries, MP3s, videos, config files, etc don't really have any redundancy across file boundaries. And even if there is, finding that redundancy is an O(whatever-but-not-n) operation that would be rather slow. I definitely see gitfs (or similar ideas) as potentially being useful in some cases (maybe debian mirrors could be one), but not for my disk at home, which I generally would prefer to be faster than more compressed. jlh ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 13:28 ` Peter Stahlir 2007-09-21 13:41 ` Michael Poole 2007-09-21 14:38 ` jlh @ 2007-09-21 17:29 ` Dmitry Potapov 2007-09-21 23:56 ` Martin Langhoff 2 siblings, 1 reply; 27+ messages in thread From: Dmitry Potapov @ 2007-09-21 17:29 UTC (permalink / raw) To: Peter Stahlir; +Cc: Karl Hasselström, Johannes Schindelin, git On Fri, Sep 21, 2007 at 03:28:20PM +0200, Peter Stahlir wrote: > Yes, but if there were deb and tar support in git (to automatically unpack > archives and store the contents), together with the best available > binary diffs I think the repository could be significantly smaller because > files common to all architectures could be deltified, You can unpack the contents of gzipped or bzipped files and deltify them, but you cannot restore exactly the same gzip or bzip file from its contents unless you use exactly the same version of the compressor that was used to create the original file. So, if you put any .deb file into such a system, you will get back a different .deb file (with a different SHA1). So, aside from the high CPU and memory requirements, this system cannot work in principle unless all users have exactly the same version of the compressor. Dmitry ^ permalink raw reply [flat|nested] 27+ messages in thread
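The reproducibility problem can be shown without two compressor versions: different settings of the same compressor are already enough. This is a minimal sketch, assuming Python's zlib module as the compressor and invented file content; it does not touch real .deb files.

```python
import hashlib
import zlib

# Invented content standing in for a file inside a package.
content = b"".join(b"line %d: some package file content\n" % i for i in range(5000))

# Same input, same library, two different compression levels.
fast = zlib.compress(content, 1)
best = zlib.compress(content, 9)

# Both streams decompress to identical content...
same_content = zlib.decompress(fast) == zlib.decompress(best) == content

# ...but the compressed bytes, and therefore their SHA-1 checksums, differ.
same_checksum = hashlib.sha1(fast).hexdigest() == hashlib.sha1(best).hexdigest()

print("same content: ", same_content)
print("same checksum:", same_checksum)
```

A store that keeps only the unpacked content would therefore also have to record how the original stream was produced in order to hand back a file with the original checksum.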
* Re: Git as a filesystem 2007-09-21 17:29 ` Dmitry Potapov @ 2007-09-21 23:56 ` Martin Langhoff 2007-09-22 3:09 ` Sam Vilain 0 siblings, 1 reply; 27+ messages in thread From: Martin Langhoff @ 2007-09-21 23:56 UTC (permalink / raw) To: Dmitry Potapov Cc: Peter Stahlir, Karl Hasselström, Johannes Schindelin, git On 9/22/07, Dmitry Potapov <dpotapov@gmail.com> wrote: > used to create the original file. So, if you put any .deb file in such > a system, you will get back a different .deb file (with a different SHA1). > So, aside high CPU and memory requirements, this system cannot work in > principle unless all users have exactly the same version of a compressor. Was thinking the same - compression machinery, ordering of the files, everything. It'd be a nightmare to ensure you get back the same .deb, without a single different bit. Debian packaging toolchain could be reworked to use a more GIT-like approach - off the top of my head, at least - signing/validating the "tree" of the package rather than the completed package could allow the savings in distribution you mention, decouple the signing from the compression, and simplify things like debdiff - git or git-like strategies for source packages cheers, m ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 23:56 ` Martin Langhoff @ 2007-09-22 3:09 ` Sam Vilain 0 siblings, 0 replies; 27+ messages in thread From: Sam Vilain @ 2007-09-22 3:09 UTC (permalink / raw) To: Martin Langhoff Cc: Dmitry Potapov, Peter Stahlir, Karl Hasselström, Johannes Schindelin, git Martin Langhoff wrote: > On 9/22/07, Dmitry Potapov <dpotapov@gmail.com> wrote: > >> used to create the original file. So, if you put any .deb file in such >> a system, you will get back a different .deb file (with a different SHA1). >> So, aside high CPU and memory requirements, this system cannot work in >> principle unless all users have exactly the same version of a compressor. >> > > Was thinking the same - compression machinery, ordering of the files, > everything. It'd be a nightmare to ensure you get back the same .deb, > without a single different bit. > > Debian packaging toolchain could be reworked to use a more GIT-like > approach - off the top of my head, at least > > - signing/validating the "tree" of the package rather than the > completed package could allow the savings in distribution you mention, > decouple the signing from the compression, and simplify things like > debdiff > > - git or git-like strategies for source packages > Nightmare indeed. I actually wrote a proof of concept for this idea for gzip. http://git.catalyst.net.nz/gw?p=git.git;a=shortlog;h=archive-blobs (see also http://planet.catalyst.net.nz/blog/2006/07/17/samv/xteddy_caught_consuming_rampant_amounts_of_disk_space) I usually warn people that this undertaking is "slightly insane". My implementation was designed to be called like "git-hash-object". What it did was look at the input stream, and detect quickly whether it looked like a gzip stream. If it was, it would decompress it and then try to compress the first few blocks using different compression libraries and settings to determine what settings were used. 
If it could find the right settings for the first meg or so, then it would bank on the rest being identical as well, record which compressor and what settings were used and write the uncompressed object, as well as the information needed to reconstruct the gzip header, to a new type of object called an "archive" object. If the stream could not be reproduced then it would save the raw stream instead. For something like a Debian archive, it is very likely that all compressed streams will be reproducible, because they will almost all be compressed using the same implementation of gzip. For tar and .ar files, this can be slightly more deterministic of course. It doesn't even need to be particularly savvy of what all the fields are - just locate the files in the .tar, write out a tree, and then write a TOC that lists tree entries and contains any extra data (ie headers, etc). In hindsight, making a new object type was probably a mistake. If I were to re-undertake this I would not go down that path, though I'd certainly consider using tag objects for the extra data, and throwing them in the tree like submodules. It would also be essential in a "real" solution to bundle reference copies of the zlib and gzip compressors (yes, their output streams differ with longer inputs and even some short ones). Sam. ^ permalink raw reply [flat|nested] 27+ messages in thread
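The detection step described above can be sketched roughly as follows. This is not the proof-of-concept code from the linked branch: the function names and the record layout are hypothetical, Python's zlib container is used in place of gzip, only the compression level is searched, and the whole stream is compared rather than just the first blocks.

```python
import zlib
from typing import Optional, Tuple

Record = Tuple[str, bytes, Optional[int]]

def find_level(stream: bytes) -> Optional[int]:
    """Return a zlib level that reproduces `stream` byte for byte, or None."""
    try:
        content = zlib.decompress(stream)
    except zlib.error:
        return None  # not a zlib stream at all
    for level in range(10):
        if zlib.compress(content, level) == stream:
            return level
    return None

def store(stream: bytes) -> Record:
    """Keep the uncompressed content plus settings if reproducible, else the raw stream."""
    level = find_level(stream)
    if level is None:
        return ("raw", stream, None)
    return ("archive", zlib.decompress(stream), level)

def restore(record: Record) -> bytes:
    """Rebuild the exact original stream from a stored record."""
    kind, data, level = record
    if kind == "raw":
        return data
    return zlib.compress(data, level)
```

Usage would look like `restore(store(original)) == original`. The fallback to the raw stream is what keeps the scheme lossless even when no tried setting matches, which is also why reference copies of the compressors matter: the stored level only means something for one particular implementation.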
* Re: Git as a filesystem 2007-09-21 11:41 ` Peter Stahlir 2007-09-21 12:53 ` Karl Hasselström @ 2007-09-21 13:22 ` Nicolas Pitre 2007-09-21 13:35 ` Peter Stahlir 2007-09-21 23:33 ` Eric Wong 1 sibling, 2 replies; 27+ messages in thread From: Nicolas Pitre @ 2007-09-21 13:22 UTC (permalink / raw) To: Peter Stahlir; +Cc: Johannes Schindelin, git On Fri, 21 Sep 2007, Peter Stahlir wrote: > This is was I was looking for. My motivation is whether it is possible > to run a system, for example Debian on a computer on top of gitfs, > and then have a huge mirror on it, for example a complete 252GB > Debian mirror as space efficient as possible. > > I wonder how big a deltified Debian mirror in one pack file would be. :) It would be just as big as the non gitified storage on disk. The space saving with git comes from efficient delta storage of _versioned_ files, i.e. multiple nearly identical versions of the same file where the stored delta is only the small difference between the first full version and subsequent versions. Unless you plan on storing many different Debian versions together, you won't benefit from any delta at all. And since Debian packages are already compressed, git won't be able to compress them further. So don't waste your time. Nicolas ^ permalink raw reply [flat|nested] 27+ messages in thread
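The difference between versioned and unrelated files can be approximated with a small experiment. The sketch below is an assumption-laden stand-in: it uses zlib's preset dictionary feature to mimic "storing a file relative to another one", which is not git's actual pack delta format, and the file contents are random bytes generated for the example.

```python
import random
import zlib

def packed_size(data: bytes, base: bytes = b"") -> int:
    """Deflated size of `data`, optionally primed with `base` as a preset dictionary."""
    if base:
        comp = zlib.compressobj(level=9, zdict=base)
    else:
        comp = zlib.compressobj(level=9)
    return len(comp.compress(data) + comp.flush())

rng = random.Random(1)

def noise(n: int) -> bytes:
    """Incompressible filler, standing in for already-compressed file content."""
    return bytes(rng.getrandbits(8) for _ in range(n))

v1 = noise(8000)                              # first version of a file
v2 = v1[:4000] + b"small edit" + v1[4000:]    # nearly identical second version
other = noise(8000)                           # an unrelated file

print("v2 alone:       ", packed_size(v2))
print("v2 given v1:    ", packed_size(v2, v1))
print("other given v1: ", packed_size(other, v1))
```

Only the nearly identical version benefits from having the first one available; the unrelated file costs about as much as storing it by itself, which matches the argument that a single snapshot of a mirror gains little.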
* Re: Git as a filesystem 2007-09-21 13:22 ` Nicolas Pitre @ 2007-09-21 13:35 ` Peter Stahlir 2007-09-21 13:45 ` Nicolas Pitre 2007-09-21 15:46 ` Christian von Kietzell 2007-09-21 23:33 ` Eric Wong 1 sibling, 2 replies; 27+ messages in thread From: Peter Stahlir @ 2007-09-21 13:35 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Johannes Schindelin, git > > I wonder how big a deltified Debian mirror in one pack file would be. :) > > It would be just as big as the non gitified storage on disk. > > The space saving with git comes from efficient delta storage of > _versioned_ files, i.e. multiple nearly identical versions of the same > file where the stored delta is only the small difference between the > first full version and subsequent versions. Unless you plan on storing > many different Debian versions together, you won't benefit from any > delta at all. And since Debian packages are already compressed, git > won't be able to compress them further. > > So don't waste your time. The 252GB stem from the fact that there are more than 10 architectures. I guess the /usr/share/doc of all architectures could be deltified (as could be all files that are architecture-independent) Right? ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 13:35 ` Peter Stahlir @ 2007-09-21 13:45 ` Nicolas Pitre 2007-09-21 15:46 ` Christian von Kietzell 1 sibling, 0 replies; 27+ messages in thread From: Nicolas Pitre @ 2007-09-21 13:45 UTC (permalink / raw) To: Peter Stahlir; +Cc: Johannes Schindelin, git On Fri, 21 Sep 2007, Peter Stahlir wrote: > > > I wonder how big a deltified Debian mirror in one pack file would be. :) > > > > It would be just as big as the non gitified storage on disk. > > > > The space saving with git comes from efficient delta storage of > > _versioned_ files, i.e. multiple nearly identical versions of the same > > file where the stored delta is only the small difference between the > > first full version and subsequent versions. Unless you plan on storing > > many different Debian versions together, you won't benefit from any > > delta at all. And since Debian packages are already compressed, git > > won't be able to compress them further. > > > > So don't waste your time. > > The 252GB stem from the fact that there are more than 10 architectures. > I guess the /usr/share/doc of all architectures could be deltified (as could > be all files that are architecture-independent) > > Right? Indeed. But how much does this represent, once compressed, compared to the rest? I doubt it is significant enough to be worth the trouble. Nicolas ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 13:35 ` Peter Stahlir 2007-09-21 13:45 ` Nicolas Pitre @ 2007-09-21 15:46 ` Christian von Kietzell 1 sibling, 0 replies; 27+ messages in thread From: Christian von Kietzell @ 2007-09-21 15:46 UTC (permalink / raw) To: Peter Stahlir; +Cc: Nicolas Pitre, Johannes Schindelin, git Am Freitag, den 21.09.2007, 15:35 +0200 schrieb Peter Stahlir: > > > I wonder how big a deltified Debian mirror in one pack file would be. :) > > > > It would be just as big as the non gitified storage on disk. > > > > The space saving with git comes from efficient delta storage of > > _versioned_ files, i.e. multiple nearly identical versions of the same > > file where the stored delta is only the small difference between the > > first full version and subsequent versions. Unless you plan on storing > > many different Debian versions together, you won't benefit from any > > delta at all. And since Debian packages are already compressed, git > > won't be able to compress them further. > > > > So don't waste your time. > > The 252GB stem from the fact that there are more than 10 architectures. > I guess the /usr/share/doc of all architectures could be deltified (as could > be all files that are architecture-independent) > > Right? I don't think so. Architecture-independent files are usually separated out into separate packages (think of the -doc and -data packages) that get architecture "all" and land in the Debian archive only once. So you probably won't save too much there. Chris ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 13:22 ` Nicolas Pitre 2007-09-21 13:35 ` Peter Stahlir @ 2007-09-21 23:33 ` Eric Wong 2007-09-21 23:42 ` Johannes Schindelin 1 sibling, 1 reply; 27+ messages in thread From: Eric Wong @ 2007-09-21 23:33 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Peter Stahlir, Johannes Schindelin, git Nicolas Pitre <nico@cam.org> wrote: > On Fri, 21 Sep 2007, Peter Stahlir wrote: > > > This is was I was looking for. My motivation is whether it is possible > > to run a system, for example Debian on a computer on top of gitfs, > > and then have a huge mirror on it, for example a complete 252GB > > Debian mirror as space efficient as possible. > > > > I wonder how big a deltified Debian mirror in one pack file would be. :) > > It would be just as big as the non gitified storage on disk. > > The space saving with git comes from efficient delta storage of > _versioned_ files, i.e. multiple nearly identical versions of the same > file where the stored delta is only the small difference between the > first full version and subsequent versions. Unless you plan on storing > many different Debian versions together, you won't benefit from any > delta at all. And since Debian packages are already compressed, git > won't be able to compress them further. > > So don't waste your time. On a similar note, has anybody experimented with using git to store maildirs or news spools? I'd imagine the quoted portions of most message threads could be delta-compressed quite efficiently. -- Eric Wong ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 23:33 ` Eric Wong @ 2007-09-21 23:42 ` Johannes Schindelin 2007-09-22 2:06 ` Eric Wong 0 siblings, 1 reply; 27+ messages in thread From: Johannes Schindelin @ 2007-09-21 23:42 UTC (permalink / raw) To: Eric Wong; +Cc: Nicolas Pitre, Peter Stahlir, git Hi, On Fri, 21 Sep 2007, Eric Wong wrote: > Nicolas Pitre <nico@cam.org> wrote: > > On Fri, 21 Sep 2007, Peter Stahlir wrote: > > > > > This is was I was looking for. My motivation is whether it is possible > > > to run a system, for example Debian on a computer on top of gitfs, > > > and then have a huge mirror on it, for example a complete 252GB > > > Debian mirror as space efficient as possible. > > > > > > I wonder how big a deltified Debian mirror in one pack file would be. :) > > > > It would be just as big as the non gitified storage on disk. > > > > The space saving with git comes from efficient delta storage of > > _versioned_ files, i.e. multiple nearly identical versions of the same > > file where the stored delta is only the small difference between the > > first full version and subsequent versions. Unless you plan on storing > > many different Debian versions together, you won't benefit from any > > delta at all. And since Debian packages are already compressed, git > > won't be able to compress them further. > > > > So don't waste your time. > > On a similar note, has anybody experimented with using git to > store maildirs or news spools? I'd imagine the quoted portions of > most message threads could be delta-compressed quite efficiently. I store all my mail in a git repository. Works beautifully. Except that the buffers on my laptop are constantly full :-( So a simple commit takes some waiting. Should be no issue on normal (desktop) machines. Ciao, Dscho ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: Git as a filesystem 2007-09-21 23:42 ` Johannes Schindelin @ 2007-09-22 2:06 ` Eric Wong 2007-09-22 12:06 ` Johannes Schindelin 0 siblings, 1 reply; 27+ messages in thread From: Eric Wong @ 2007-09-22 2:06 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Nicolas Pitre, Peter Stahlir, git Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > Hi, Hi, > On Fri, 21 Sep 2007, Eric Wong wrote: > > > > On a similar note, has anybody experimented with using git to > > store maildirs or news spools? I'd imagine the quoted portions of > > most message threads could be delta-compressed quite efficiently. > > I store all my mail in a git repository. Works beautifully. Except that > the buffers on my laptop are constantly full :-( So a simple commit takes > some waiting. > > Should be no issue on normal (desktop) machines. D'oh. I already have maildir performance problems on my laptop. I wonder how well only having an index and no commits (no versioning), and manual packing with pack-objects would work. Packing could be optimized to order objects based on the Message-Id, References, and In-Reply-To headers, too. -- Eric Wong ^ permalink raw reply [flat|nested] 27+ messages in thread
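One way to picture the header-based ordering is a sort key that places messages of the same thread next to each other before they are handed to a packer. This is a rough sketch only: `thread_key` and `packing_order` are hypothetical helper names, the thread root is guessed from the first References entry, and nothing here calls pack-objects.

```python
from email import message_from_string
from typing import List

def thread_key(raw: str) -> str:
    """Guess a thread identifier from References, In-Reply-To, or Message-Id."""
    msg = message_from_string(raw)
    references = (msg.get("References") or "").split()
    if references:
        return references[0]          # first entry is usually the thread root
    in_reply_to = (msg.get("In-Reply-To") or "").strip()
    if in_reply_to:
        return in_reply_to            # fall back to the direct parent
    return (msg.get("Message-Id") or "").strip()

def packing_order(raw_messages: List[str]) -> List[str]:
    """Order messages so that members of one thread sit next to each other."""
    # sorted() is stable, so messages within a thread keep their input order.
    return sorted(raw_messages, key=thread_key)
```

Whether adjacency in such a list actually improves delta selection depends on how the packer picks delta candidates, so this only illustrates the grouping, not the resulting pack size.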
* Re: Git as a filesystem From: Johannes Schindelin @ 2007-09-22 12:06 UTC To: Eric Wong; +Cc: Nicolas Pitre, Peter Stahlir, git Hi, On Fri, 21 Sep 2007, Eric Wong wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > > On Fri, 21 Sep 2007, Eric Wong wrote: > > > > > > On a similar note, has anybody experimented with using git to store > > > maildirs or news spools? I'd imagine the quoted portions of most > > > message threads could be delta-compressed quite efficiently. > > > > I store all my mail in a git repository. Works beautifully. Except > > that the buffers on my laptop are constantly full :-( So a simple > > commit takes some waiting. > > > > Should be no issue on normal (desktop) machines. > > D'oh. I already have maildir performance problems on my laptop. Umm. Regular operation is not affected, since I (add and) commit only when I weeded out all those spams and other unwanted mail. > I wonder how well only having an index and no commits (no versioning), > and manual packing with pack-objects would work. Packing could be > optimized to order objects based on the Message-Id, References, and > In-Reply-To headers, too. The most efficient way would be to have a mailer backend accessing the database, and then not have a working directory, methinks (especially with these amounts of mail I am juggling ATM). Time forbids working on this, though. Ciao, Dscho
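As a rough idea of what a backend without a working directory would do at the lowest level, the sketch below writes and reads a message as a loose blob directly in an object directory. It assumes a SHA-1 repository and covers only blobs: a real backend would also need trees, refs and packing. The function names `blob_id`, `store_blob` and `load_blob` are invented for this example and are not part of git.

```python
import hashlib
import os
import tempfile
import zlib


def blob_id(data: bytes) -> str:
    """Object name git assigns to a blob with this content (SHA-1 repositories)."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def store_blob(objects_dir: str, data: bytes) -> str:
    """Write data as a loose blob under objects_dir and return its object name."""
    name = blob_id(data)
    path = os.path.join(objects_dir, name[:2], name[2:])
    if not os.path.exists(path):  # identical content is stored only once
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as fh:
            fh.write(zlib.compress(b"blob %d\x00" % len(data) + data))
    return name


def load_blob(objects_dir: str, name: str) -> bytes:
    """Read a loose blob back and strip the 'blob <size>' header."""
    with open(os.path.join(objects_dir, name[:2], name[2:]), "rb") as fh:
        raw = zlib.decompress(fh.read())
    _header, _sep, body = raw.partition(b"\x00")
    return body


# Round trip in a throwaway directory standing in for .git/objects.
with tempfile.TemporaryDirectory() as objects_dir:
    mail = b"Message-Id: <a@example>\n\nhello\n"
    name = store_blob(objects_dir, mail)
    restored = load_blob(objects_dir, name)
    print(name, restored == mail)
```

Because the object name depends only on the content, a message that is delivered twice would occupy a single object.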
* Re: Git as a filesystem From: Miklos Vajna @ 2007-09-21 14:22 UTC To: Johannes Schindelin; +Cc: Peter Stahlir, git On Fri, Sep 21, 2007 at 12:11:41PM +0100, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > I haven't looked at it closely, but there is a GitFS: > > http://git.or.cz/gitwiki/InterfacesFrontendsAndTools#head-f354b40618742b976c13700fe1fea28387ad5c89 > > (I am pointing you to the Git Wiki, so that you can find more pointers > should you not be happy with this one.) FYI, the last time I had a look at it, it did not compile with git 1.5.2.x. Thanks, - VMiklos