git.vger.kernel.org archive mirror
From: "Shawn O. Pearce" <spearce@spearce.org>
To: Nicolas Pitre <nico@cam.org>
Cc: Jean-Luc Herren <jlh@gmx.ch>, jidanni@jidanni.org, git@vger.kernel.org
Subject: Re: git-clone --how-much-disk-space-will-this-cost-me? [--depth n]
Date: Wed, 17 Dec 2008 07:44:07 -0800
Message-ID: <20081217154407.GZ32487@spearce.org>
In-Reply-To: <alpine.LFD.2.00.0812160039180.30035@xanadu.home>

Nicolas Pitre <nico@cam.org> wrote:
> On Tue, 16 Dec 2008, Jean-Luc Herren wrote:
> > jidanni@jidanni.org wrote:
> > > JH> So maybe what you really want is an ETA display during the cloning
> > > JH> process?  Sounds like a good idea to me.
> 
> And then you'll end up being the unlucky bastard to be the first to 
> clone the new latest revision of a repository, and ETA won't be 
> available, and you'll complain about the fact that sometimes it is there 
> and sometimes it is not.
> 
> The fact is, fundamentally, we don't know how many bytes to push when 
> generating a pack to answer the clone request.  Sometimes we _could_ but 
> not always.  It is therefore better to be consistent and let people know 
> that there is simply no ETA.

Hmm.

What if, on an initial clone (no "have" lines received), we sum up
the sizes of the *.pack files and all of the loose objects and send
that as an initial size estimate?  It's going to be the upper bound
of the final pack that we send.  At worst it over-estimates the
size and the download finishes sooner than predicted.
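
To make the idea concrete, here is a rough sketch of that sum in
Python.  It is only an illustration, not how upload-pack works or
would implement this; the function name is made up, it assumes the
usual objects/pack/*.pack and objects/xx/ layout, and it ignores
alternates entirely.

```python
import os
import re

# Loose objects live in two-hex-digit fan-out directories under objects/.
_FANOUT_DIR = re.compile(r"^[0-9a-f]{2}$")


def estimate_clone_upper_bound(git_dir):
    """Sum on-disk sizes of all *.pack files and loose objects, in bytes.

    Hypothetical helper: unreachable objects are counted too, so the
    result is meant as an over-estimate of what a clone would transfer.
    """
    objects = os.path.join(git_dir, "objects")
    total = 0

    # Pack files only; the *.idx files are not part of what is sent.
    pack_dir = os.path.join(objects, "pack")
    if os.path.isdir(pack_dir):
        for name in os.listdir(pack_dir):
            if name.endswith(".pack"):
                total += os.path.getsize(os.path.join(pack_dir, name))

    # Loose objects, skipping objects/info and objects/pack.
    if os.path.isdir(objects):
        for name in os.listdir(objects):
            sub = os.path.join(objects, name)
            if _FANOUT_DIR.match(name) and os.path.isdir(sub):
                for obj in os.listdir(sub):
                    path = os.path.join(sub, obj)
                    if os.path.isfile(path):
                        total += os.path.getsize(path)

    return total
```

Calling estimate_clone_upper_bound("/path/to/repo/.git") (or the
path of a bare repository) returns the byte count that would be
advertised as the estimate.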

I'm willing to bet that most of the "big" repositories out there
don't have a lot of garbage in them.  Linus' kernel repository
doesn't rewind, so he has 0 garbage.  Anyone cloning from him would
get a reasonable estimate.  Likewise, with a Gentoo/KDE/WebKit/gcc
sort of giant tree, most of the history is in one huge historical pack.
That one pack file alone is completely reachable and dominates the
transfer size.

On smaller trees, where people may have a lot of rebase garbage or
everything is loose, the estimate will be quite a bit above what we
transfer, but by enough that it matters?

Yes, a single stray binary such as a *.mpg or *.iso, accidentally
added and then removed (and now unreachable), will vastly inflate
the numbers.  In which case the repository owner will be encouraged
to prune when people won't clone his estimated 8 GiB download,
which is actually only 1 MiB.
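
To put numbers on that case, here is a small sketch of the ETA a
client could derive from such an estimate.  The function and the
transfer rate are assumptions for illustration only; nothing like
this exists in the protocol today.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def eta_seconds(estimated_total, received, elapsed):
    """Remaining seconds, assuming the average rate so far continues.

    Hypothetical helper: returns None until some data has arrived,
    and never goes below zero if the estimate was too small.
    """
    if received <= 0 or elapsed <= 0:
        return None
    rate = received / elapsed  # bytes per second so far
    remaining = max(estimated_total - received, 0)
    return remaining / rate


# 512 KiB received in half a second, i.e. 1 MiB/s.
inflated = eta_seconds(8 * GIB, 512 * KIB, 0.5)  # server advertised 8 GiB
actual = eta_seconds(1 * MIB, 512 * KIB, 0.5)    # pack is really 1 MiB
print(inflated, actual)  # prints "8191.5 0.5"
```

So the display would promise well over two hours for a transfer
that finishes in about a second, which is exactly the sort of
number that gets a repository owner to prune.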

-- 
Shawn.
