* many files, simple history
@ 2010-05-14  2:57 Ali Tofigh
  2010-05-14  3:27 ` Michael Witten
  2010-05-14  4:05 ` Jeff King
From: Ali Tofigh @ 2010-05-14  2:57 UTC
  To: git

Short version: will git handle a large number of files efficiently if
the history is simple and linear, i.e., without merges?

Longer version: I'm considering using git to keep track of my
installed user programs/files on my Linux machine, mainly because I
want to be able to uninstall software cleanly and completely (I almost
always build from source code and install in non-standard locations).
So I would want to use git to keep track of every program I install or
uninstall. This way, I could go back and uninstall a program simply by
finding the commit in which it was installed, getting the list of files
that were added as a result, and removing them (and, of course,
committing the removals to git so I can always undo the uninstall too!)
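
A rough sketch of that workflow, assuming the install prefix (a
hypothetical ~/opt here) is itself the git work tree; the package name
and commit id are placeholders:

    # one-time setup: put the whole install prefix under git
    cd ~/opt
    git init
    git add -A
    git commit -m 'baseline'

    # after building and installing a program, record exactly what it added
    git add -A
    git commit -m 'install foo-1.0'

    # to uninstall later: list the files that commit added, then remove them
    git log --oneline        # find the install commit, say abc1234
    git diff-tree -r -z --no-commit-id --name-only --diff-filter=A abc1234 |
        xargs -0 rm --
    git add -A
    git commit -m 'uninstall foo-1.0'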

So the history stored in git will be linear, consisting of file
additions and removals, but it will potentially involve many files.
Will git be able to handle this (as yet hypothetical) situation
efficiently?

/ali


* Re: many files, simple history
  2010-05-14  2:57 many files, simple history Ali Tofigh
@ 2010-05-14  3:27 ` Michael Witten
  2010-05-14 16:25   ` Ali Tofigh
  2010-05-14  4:05 ` Jeff King
From: Michael Witten @ 2010-05-14  3:27 UTC
  To: Ali Tofigh; +Cc: git

On Thu, May 13, 2010 at 21:57, Ali Tofigh <alix.tofigh@gmail.com> wrote:
> (I almost
> always build from source code and install in non-standard locations).

It would make a lot more sense to use a package manager.

You might find that Arch Linux's `pacman' provides everything that you
need; it's really easy to define your own packages and build from
source, allowing pacman to manage dependencies, locate which package
owns which file, and upgrade/remove packages.
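
For reference, the kind of pacman queries meant here (package and file
names are placeholders):

    pacman -Qo /usr/bin/foo   # which installed package owns this file?
    pacman -Ql foo            # list every file the package 'foo' installed
    pacman -R foo             # remove the package and its files cleanly
    makepkg -si               # build your own PKGBUILD and install the result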

I bet Gentoo's `emerge' would also be viable.


* Re: many files, simple history
  2010-05-14  2:57 many files, simple history Ali Tofigh
  2010-05-14  3:27 ` Michael Witten
@ 2010-05-14  4:05 ` Jeff King
  2010-05-14 16:18   ` Ali Tofigh
From: Jeff King @ 2010-05-14  4:05 UTC
  To: Ali Tofigh; +Cc: git

On Thu, May 13, 2010 at 10:57:22PM -0400, Ali Tofigh wrote:

> Short version: will git handle a large number of files efficiently if
> the history is simple and linear, i.e., without merges?

Short answer: a large number of files, yes; large files, not really. The
shape of history is largely irrelevant.

Longer answer:

Git separates the conceptual structure of history (the digraph of
commits, and the pointers of commits to trees to blobs) from the actual
storage of objects representing that history. Problems with large files
are usually storage issues. Copying them around in packfiles is
expensive, storing an extra copy in the repo is expensive, trying deltas
and diffs is expensive. None of those things has to do with the shape of
your history. So I would expect git to handle such a load with a linear
history about as well as a complex history with merges.
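
For illustration, the commit -> tree -> blob chain can be inspected
directly in any repository (the path below is a placeholder):

    git cat-file -p HEAD               # commit: points at a tree and its parent(s)
    git cat-file -p 'HEAD^{tree}'      # tree: lists blobs (files) and subtrees
    git cat-file -p HEAD:path/to/file  # blob: the stored file contents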

For large numbers of files, git generally does a good job, especially if
those files are distributed throughout a directory hierarchy. But keep
in mind that the git repo will store another copy of every file. They
will be delta-compressed between versions, and zlib-compressed overall,
but you may potentially be doubling the amount of disk space required if
you have a lot of incompressible binary files.
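
To gauge how close to doubling a given tree actually comes, the work
tree can be compared with git's own storage, e.g. with GNU du:

    du -sh --exclude=.git .   # the checked-out files themselves
    du -sh .git               # git's copy, after "git gc" or "git repack -ad"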

For large files, git expects to be able to pull each file into memory,
and sometimes two versions at once if you are doing a diff. And it will copy those
files around when repacking (which you will want to do for the sake of
the smaller files). So files on the order of a few megabytes are not a
problem. If you have files in the hundreds of megabytes or gigabytes,
expect some operations to be slow (like repacking).
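
One possible mitigation, not discussed in this thread: the `delta'
attribute in .gitattributes tells git not to attempt delta compression
for matching paths, which can help with large binaries that will not
delta well anyway (the patterns below are only examples):

    # .gitattributes at the top of the work tree
    *.iso     -delta
    *.img     -delta
    *.tar.bz2 -delta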

Really, I would start by just "git add"-ing your whole filesystem, doing
a "git repack -ad", and seeing how long it takes, and what the resulting
size is.
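
For illustration, that experiment might look roughly like this, run
against the tree you intend to track (the path is a placeholder):

    cd ~/opt                 # or wherever the files to be tracked live
    git init
    time git add -A
    git commit -m 'initial import'
    time git repack -ad
    git count-objects -v     # "size-pack" is the packed size in KiB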

-Peff


* Re: many files, simple history
  2010-05-14  4:05 ` Jeff King
@ 2010-05-14 16:18   ` Ali Tofigh
From: Ali Tofigh @ 2010-05-14 16:18 UTC
  To: Jeff King; +Cc: git

On Fri, May 14, 2010 at 00:05, Jeff King <peff@peff.net> wrote:
> On Thu, May 13, 2010 at 10:57:22PM -0400, Ali Tofigh wrote:
>
>> Short version: will git handle a large number of files efficiently if
>> the history is simple and linear, i.e., without merges?
>
> Short answer: a large number of files, yes; large files, not really. The
> shape of history is largely irrelevant.

Thank you for the explanation. I will start using git to manage my
installed programs and will try to report back to this list about my
experience.

/ali


* Re: many files, simple history
  2010-05-14  3:27 ` Michael Witten
@ 2010-05-14 16:25   ` Ali Tofigh
From: Ali Tofigh @ 2010-05-14 16:25 UTC
  To: Michael Witten; +Cc: git

On Thu, May 13, 2010 at 23:27, Michael Witten <mfwitten@gmail.com> wrote:
> On Thu, May 13, 2010 at 21:57, Ali Tofigh <alix.tofigh@gmail.com> wrote:
>> (I almost
>> always build from source code and install in non-standard locations).
>
> It would make a lot more sense to use a package manager.

You're probably right, but if I want to keep up with all my everyday
obligations, then I have to choose my battles carefully. For many
tasks there are specialized tools that can be used, but these always
come at the expense of having to learn them. And if those tasks aren't
performed on a daily basis, then I don't stand a chance of remembering
what I've learned, and I have to relearn them over and over again.

/ali

