From: Nicolas Pitre <nico@cam.org>
To: Adam Heath <doogie@brainfood.com>
Cc: git@vger.kernel.org
Subject: Re: large(25G) repository in git
Date: Tue, 24 Mar 2009 14:31:31 -0400 (EDT)
Message-ID: <alpine.LFD.2.00.0903241404080.26337@xanadu.home>
In-Reply-To: <49C91F87.3050105@brainfood.com>

On Tue, 24 Mar 2009, Adam Heath wrote:
> Nicolas Pitre wrote:
>
> > Strange. You could instruct ssh to keep the connection up with the
> > ServerAliveInterval option (see the ssh_config man page).
>
> Sure, could do that. Already have a separate ssh config entry for
> this host. But why should a connection be kept open for that long?
> Why not close and re-open?
Because it is way more complex for git to do that than for ssh to keep
the connection alive. And normally there is no need as git is supposed
to be faster than that.
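A minimal sketch of the keepalive setting discussed above, assuming an OpenSSH client. The alias "bigrepo", the host name, and the interval values are placeholders, not taken from this thread. The entry is written to a scratch file so it can be reviewed before being merged into the existing entry in ~/.ssh/config.

```shell
# Write a sample Host entry to a scratch file (not ~/.ssh/config).
# "bigrepo" and "git.example.com" are hypothetical names.
# ServerAliveInterval 60: send a keepalive probe after 60 seconds of silence.
# ServerAliveCountMax 10: give up after 10 unanswered probes.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
Host bigrepo
    HostName git.example.com
    ServerAliveInterval 60
    ServerAliveCountMax 10
EOF
echo "sample entry written to $cfg"
```

The same option can also be given for a single invocation with "ssh -o ServerAliveInterval=60", which is handy for checking whether the timeout is really what kills the connection.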
> Consider the case of other protocol access. http/git/ssh. Should
> they *all* be changed to allow for this? Wouldn't it be simpler to
> just make git smarter?
Making git faster is the solution, not working around the issue.
> >> So, to work around that, I ran git gc. When done, I discovered that
> >> git repacked the *entire* repository. While not something I care for,
> >> I can understand that, and live with it. It just took *hours* to do so.
> >>
> >> Then, what really annoys me, is that when I finally did the push, it
> >> tried sending the single 27G pack file, when the remote already had
> >> 25G of the repository in several different packs (the site was an
> >> hg->git conversion). This part is just unacceptable.
> >
> > This shouldn't happen either. When pushing, git reconstructs a pack with
> > only the necessary objects to transmit. Are you sure it was really
> > trying to send a 27G pack?
>
> Of course I'm sure. I wouldn't have sent the email if it didn't
> happen. And, I have the bandwidthd graph and lost time to prove it.
As much as I would like to believe you, this doesn't help fix the
problem unless you provide more information. For example,
the output from git during the whole operation might give us the
beginning of a clue. Otherwise, all I can tell you is that such a thing
is not supposed to happen.
> After I ran git push, ssh timed out, the temp pack that was created
> was then removed, as git complained about the connection being gone.
On a push, there is no creation of a temp pack. It is always produced
on the fly and pushed straight via the ssh connection.
> I then decided to do a 'git gc', which collapsed all the separate
> packs into one. This allowed git push to proceed quickly, but at that
> point, it started sending the entire pack.
If this was really the case, then this is definitely a bug. Please take
a snapshot of your screen with git messages if this ever happens again.
> It's entirely possible that the temp pack created by git push was
> incremental; it just took too long to create it, so it got aborted.
The push operation has multiple phases. You should see "counting
objects", "compressing objects" and "writing objects". Could you give
us an approximation of how long each of those phases took?
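One way to capture that output for the list is sketched below. It assumes a git new enough to support "git push --progress" (the 1.5.6 versions in this report predate that flag) and uses a throwaway repository so the sketch runs anywhere. Every path and name here is a placeholder. It times the push as a whole; the saved progress lines show which phases ran.

```shell
# Scratch setup; a real repository needs none of this.
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git config user.name "Example"
git config user.email "example@example.com"
echo "data" > file.txt
git add file.txt
git commit -q -m "sample commit"

# --progress forces the phase messages even though stderr is a pipe;
# tee keeps a copy of everything git printed.
log="$work/push.log"
start=$(date +%s)
git push --progress "$work/remote.git" HEAD:refs/heads/main 2>&1 | tee "$log"
end=$(date +%s)
echo "push took $((end - start)) seconds; output saved in $log"
```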
> But, doing git gc shouldn't cause things to be resent.
Indeed.
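The set of objects a push needs is determined by reachability from the refs, not by how the local packs are laid out. A rough way to check this is sketched below with scratch repositories. The remote name "origin", the branch "main", and the file names are all placeholders.

```shell
# Scratch repositories so the sketch runs anywhere.
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"
git remote add origin "$work/remote.git"

# First commit, pushed so the remote already has it.
echo "old data" > old.txt
git add old.txt
git commit -q -m "already on the remote"
git push -q origin main

# Second commit, not yet pushed.
echo "new data" > new.txt
git add new.txt
git commit -q -m "not pushed yet"

# Repack everything into a single local pack, as "git gc" did in the report.
git gc -q

# Objects reachable from main but not from origin/main.
to_send=$(git rev-list --objects origin/main..main | wc -l | tr -d ' ')
echo "objects a push would need to send: $to_send"
```

Here the count is three (the new commit, its tree, and the new blob) even though the local repository now holds a single pack. In a real repository, comparing this count with the total reported by "counting objects" during the push would show whether git is sending more than it should.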
> The machines in question have done push before. Even small amounts;
> just the set of objects that are newer. It's just this time, when the
> 1.6G of new data was added, git ended up creating a new pack file,
> that contained the entire repo, and then tried sending that.
And this is wrong.
> I forgot to mention previously, that the source machine was running
> git 1.5.6.5, and was pushing to 1.5.6.3.
>
> I've tried duplicating this problem on a machine with 1.6.1.3, but
> either I don't fully understand the issue enough to replicate it, or
> the newer git doesn't have the problem.
That's possible. Maybe others on the list might recall possible issues
related to this that might have been fixed during that time.
> >> 2: Is there an option to tell git to *not* be so thorough when trying
> >> to find similar files? videos/doc/pdf/etc aren't always very
> >> deltafiable, so I'd be happy to just do full content compares.
> >
> > Look at the gitattributes documentation. One thing that the doc appears
> > to be missing is information about the "delta" attribute. You can
> > disable delta compression on a file pattern that way.
>
> Um, if it's missing documentation, then how am I supposed to know
> about it?
By asking on the list, like you did. However, this attribute should be
documented as well, of course. I even think that someone posted a patch
for it a while ago, which might have been dropped.
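A minimal sketch of the "delta" attribute in use, run in a throwaway repository. The file patterns are examples only and would need to match the actual media types in the repository. "git check-attr" then confirms which paths have delta compression turned off.

```shell
# Throwaway repository; the patterns below are illustrative.
work="$(mktemp -d)"
git init -q "$work/repo"
cd "$work/repo"

# "-delta" unsets the attribute, so git will not attempt delta
# compression for blobs at matching paths.
cat > .gitattributes <<'EOF'
*.avi -delta
*.mpg -delta
*.pdf -delta
EOF

# The named files need not exist; check-attr only evaluates the patterns.
git check-attr delta -- clip.avi notes.txt
```

Paths matching a pattern are reported with "delta: unset"; everything else shows "delta: unspecified" and is still considered for delta compression as usual.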
Nicolas