* many files, simple history
From: Ali Tofigh @ 2010-05-14 2:57 UTC
To: git

short version: will git handle large number of files efficiently if
the history is simple and linear, i.e., without merges?

longer version: i'm considering using git to keep track of my
installed user programs/files on my linux machine, mainly because i
want to be able to uninstall software cleanly and completely (i almost
always build from source code and install in non-standard locations).
so i would want to use git to keep track of every program i install or
uninstall. this way, i could go back and uninstall a program simply by
finding the commit when it was installed, get the list of files that
were added as a result, and remove them (and of course, commit the
removals into git so i can always undo the uninstall too!)

so the history stored in git will be linear and consist of file
additions and removals. but this will potentially involve many files.
will git be able to handle this (as yet hypothetical) situation
efficiently?

/ali
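For illustration only, the uninstall workflow described above could be sketched roughly as follows. The function names list_added_files and uninstall_commit are hypothetical, and the sketch assumes one commit per installed program and file names without unusual characters (git quotes such paths in its output by default).

```shell
#!/bin/sh
# Sketch only: find the files a given commit added, remove them, and
# record the removal as a new commit so the uninstall can be undone.
# list_added_files and uninstall_commit are hypothetical helper names.

# Print the paths that the given commit added (status "A"), one per line.
# --root makes this work for the very first commit as well.
list_added_files() {
    git diff-tree --root --no-commit-id --name-only --diff-filter=A -r "$1"
}

# Remove every file that the given commit added, then commit the removal.
uninstall_commit() {
    commit=$1
    list_added_files "$commit" | while IFS= read -r path; do
        # --ignore-unmatch: skip files that a later commit already removed
        git rm --quiet --ignore-unmatch -- "$path"
    done
    git commit --quiet -m "uninstall: remove files added by $commit"
}
```

Called as `uninstall_commit <commit>` from inside the repository. When the install commit only adds files and nothing has touched them since, `git revert --no-edit <commit>` should have much the same effect.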
* Re: many files, simple history
From: Michael Witten @ 2010-05-14 3:27 UTC
To: Ali Tofigh; +Cc: git

On Thu, May 13, 2010 at 21:57, Ali Tofigh <alix.tofigh@gmail.com> wrote:
> (i almost
> always build from source code and install in non-standard locations).

It would make a lot more sense to use a package manager.

You might find that Arch Linux's `pacman' provides everything that you
need; it's really easy to define your own packages and build from
source, allowing pacman to manage dependencies, locate which package
owns which file, and upgrade/remove packages.

I bet gentoo's `emerge' would also be viable.
* Re: many files, simple history
From: Ali Tofigh @ 2010-05-14 16:25 UTC
To: Michael Witten; +Cc: git

On Thu, May 13, 2010 at 23:27, Michael Witten <mfwitten@gmail.com> wrote:
> On Thu, May 13, 2010 at 21:57, Ali Tofigh <alix.tofigh@gmail.com> wrote:
>> (i almost
>> always build from source code and install in non-standard locations).
>
> It would make a lot more sense to use a package manager.

you're probably right, but if i want to keep up with all my everyday
obligations, then i have to choose my battles carefully. for many
tasks, there are specialized tools that can be used, but these always
come at the expense of having to learn them. and if those tasks aren't
performed on a daily basis, then i don't stand a chance at remembering
what i've learned and i have to relearn them over and over again.

/ali

>
> You might find that Arch Linux's `pacman' provides everything that you
> need; it's really easy to define your own packages and build from
> source, allowing pacman to manage dependencies, locate which package
> owns which file, and upgrade/remove packages.
>
> I bet gentoo's `emerge' would also be viable.
>
* Re: many files, simple history
From: Jeff King @ 2010-05-14 4:05 UTC
To: Ali Tofigh; +Cc: git

On Thu, May 13, 2010 at 10:57:22PM -0400, Ali Tofigh wrote:

> short version: will git handle large number of files efficiently if
> the history is simple and linear, i.e., without merges?

Short answer: large number of files, yes, large files, not really. The
shape of history is largely irrelevant.

Longer answer:

Git separates the conceptual structure of history (the digraph of
commits, and the pointers of commits to trees to blobs) from the actual
storage of objects representing that history. Problems with large files
are usually storage issues. Copying them around in packfiles is
expensive, storing an extra copy in the repo is expensive, trying deltas
and diffs is expensive. None of those things has to do with the shape of
your history. So I would expect git to handle such a load with a linear
history about as well as a complex history with merges.

For large numbers of files, git generally does a good job, especially if
those files are distributed throughout a directory hierarchy. But keep
in mind that the git repo will store another copy of every file. They
will be delta-compressed between versions, and zlib compressed overall,
but you may potentially be doubling the amount of disk space required if
you have a lot of uncompressible binary files.

For large files, git expects to be able to pull each file into memory.
Sometimes two versions if you are doing a diff. And it will copy those
files around when repacking (which you will want to do for the sake of
the smaller files). So files on the order of a few megabytes are not a
problem. If you have files in the hundreds of megabytes or gigabytes,
expect some operations to be slow (like repacking).

Really, I would start by just "git add"-ing your whole filesystem, doing
a "git repack -ad", and seeing how long it takes, and what the resulting
size is.

-Peff
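The experiment suggested above (add everything, repack, then look at the time and the size) might be scripted along these lines. This is only a sketch under some assumptions: measure_repo is a hypothetical name, the intermediate commit is an addition that is not part of the suggestion, and timing uses whole seconds from date(1) for portability.

```shell
#!/bin/sh
# Sketch only: time "git add" + "git repack -a -d" on a directory tree
# and report how much space the resulting repository takes.
# measure_repo is a hypothetical helper name.
# The body runs in a subshell so the caller's working directory is unchanged.
measure_repo() (
    cd "$1" || exit 1
    git init --quiet
    start=$(date +%s)
    git add .                                  # hash and stage every file
    git commit --quiet -m "initial snapshot"   # assumption: commit before repacking
    git repack -a -d -q                        # pack everything into one pack
    end=$(date +%s)
    echo "add + repack took $((end - start)) seconds"
    echo ".git size in KiB: $(du -sk .git | cut -f1)"
    git count-objects -v                       # size-pack is the packed size in KiB
)
```

Run as `measure_repo /path/to/tree`. Comparing the reported .git size with the size of the tree itself gives a rough idea of how much extra disk space the stored copy costs.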
* Re: many files, simple history
From: Ali Tofigh @ 2010-05-14 16:18 UTC
To: Jeff King; +Cc: git

On Fri, May 14, 2010 at 00:05, Jeff King <peff@peff.net> wrote:
> On Thu, May 13, 2010 at 10:57:22PM -0400, Ali Tofigh wrote:
>
>> short version: will git handle large number of files efficiently if
>> the history is simple and linear, i.e., without merges?
>
> Short answer: large number of files, yes, large files, not really. The
> shape of history is largely irrelevant.

thank you for the explanation. I will start using git for managing my
installed programs and will try to report back to this list about my
experience.

/ali

>
> Longer answer:
>
> Git separates the conceptual structure of history (the digraph of
> commits, and the pointers of commits to trees to blobs) from the actual
> storage of objects representing that history. Problems with large files
> are usually storage issues. Copying them around in packfiles is
> expensive, storing an extra copy in the repo is expensive, trying deltas
> and diffs is expensive. None of those things has to do with the shape of
> your history. So I would expect git to handle such a load with a linear
> history about as well as a complex history with merges.
>
> For large numbers of files, git generally does a good job, especially if
> those files are distributed throughout a directory hierarchy. But keep
> in mind that the git repo will store another copy of every file. They
> will be delta-compressed between versions, and zlib compressed overall,
> but you may potentially be doubling the amount of disk space required if
> you have a lot of uncompressible binary files.
>
> For large files, git expects to be able to pull each file into memory.
> Sometimes two versions if you are doing a diff. And it will copy those
> files around when repacking (which you will want to do for the sake of
> the smaller files). So files on the order of a few megabytes are not a
> problem. If you have files in the hundreds of megabytes or gigabytes,
> expect some operations to be slow (like repacking).
>
> Really, I would start by just "git add"-ing your whole filesystem, doing
> a "git repack -ad", and seeing how long it takes, and what the resulting
> size is.
>
> -Peff
>