* git-clone --how-much-disk-space-will-this-cost-me? [--depth n]
@ 2008-12-15 23:53 jidanni
2008-12-16 0:22 ` Jean-Luc Herren
2008-12-16 0:43 ` Jeff King
0 siblings, 2 replies; 12+ messages in thread
From: jidanni @ 2008-12-15 23:53 UTC (permalink / raw)
To: git
The git-clone manpage should mention how to determine how much disk
space will be used.
You see we beginners (who haven't learned git yet, so no patches
forthcoming, thank you) are often told "Just do git-clone
git://git.example.org/bla/ to get started!". Being smart, we read up on
--depth 1 to limit potential disk occupation, but we still have no
idea of how much disk space we will need. We can't just use HEAD(1)
because this is not HTTP.
Therefore the git-clone man page, one of the main entry points for the
beginner, should say how to determine how much disk space we will need
for git-clone or git-clone --depth 1 etc.
And don't tell us to just figure it out from the progress messages
after the download begins, and hit ^C if we don't like it.
Let's take a look at those messages while we're at it:
$ git-clone --depth 1 git://git.sv.gnu.org/coreutils/
Initialized empty Git repository in /usr/local/src/jidanni/coreutils/.git/
remote: Counting objects: 26240, done.
remote: Compressing objects: 100% (14001/14001), done.
remote: Total 26240 (delta 21577), reused 15354 (delta 12095)
Receiving objects: 100% (26240/26240), 15.76 MiB | 26 KiB/s, done.
Resolving deltas: 100% (21577/21577), done.
$ du -sh
27M .
Nope, nowhere does it directly say "You Holmes, are in for 27
Megabytes (on your piddly modem)". There obviously is math involved to
figure it out... math!
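The arithmetic hinted at above can be sketched in Python. This is a minimal illustration, not part of git: parse_receiving and its regular expression are hypothetical helpers written against the single "Receiving objects" line in the transcript, the unit handling is an assumption, and the time figure assumes the reported rate held for the whole transfer.

```python
import re

# Pattern for the final "Receiving objects" line as it appears in the transcript.
# The set of units handled here is an assumption, not a spec of git's output.
PROGRESS = re.compile(
    r"Receiving objects:\s+\d+% \((\d+)/(\d+)\), "
    r"([\d.]+) (KiB|MiB|GiB) \| ([\d.]+) (KiB|MiB|GiB)/s"
)
UNIT = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def parse_receiving(line):
    """Return (objects_received, objects_total, bytes_received, bytes_per_second)."""
    m = PROGRESS.search(line)
    if m is None:
        raise ValueError("not a 'Receiving objects' progress line")
    got, total, size, size_unit, rate, rate_unit = m.groups()
    return (int(got), int(total),
            float(size) * UNIT[size_unit], float(rate) * UNIT[rate_unit])


line = "Receiving objects: 100% (26240/26240), 15.76 MiB | 26 KiB/s, done."
got, total, nbytes, rate = parse_receiving(line)
minutes = nbytes / rate / 60  # transfer time if the rate were constant
print(f"{nbytes / 2**20:.2f} MiB at {rate / 1024:.0f} KiB/s is about {minutes:.0f} minutes")
```

Dividing 15.76 MiB by 26 KiB/s gives roughly 620 seconds, i.e. about ten minutes on the wire. The gap between the 15.76 MiB received and the 27M that du reports is presumably the checked-out working tree plus the index, which the progress lines do not cover.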
Also add examples of how one first probes a remote tree one has been
told about, determines what parts of it he might want, and then
finally git-clones just those parts.
Also document what --depth 0 or even -1 will do.
* Re: git-clone --how-much-disk-space-will-this-cost-me? [--depth n]
2008-12-15 23:53 git-clone --how-much-disk-space-will-this-cost-me? [--depth n] jidanni
@ 2008-12-16 0:22 ` Jean-Luc Herren
2008-12-16 0:37 ` jidanni
2008-12-16 0:43 ` Jeff King
1 sibling, 1 reply; 12+ messages in thread
From: Jean-Luc Herren @ 2008-12-16 0:22 UTC (permalink / raw)
To: jidanni, git
Hi!
jidanni@jidanni.org wrote:
> The git-clone manpage should mention how to determine how much disk
> space will be used.
> [...]
> And don't tell us to just figure it out from the progress messages
> after the download begins, and hit ^C if we don't like it.
Maybe that's a dumb answer, but... why not? This works pretty
well for me.
> Nope, nowhere does it directly say "You Holmes, are in for 27
> Megabytes (on your piddly modem)". There obviously is math involved to
> figure it out... math!
So maybe what you really want is an ETA display during the cloning
process? Sounds like a good idea to me.
jlh
* Re: git-clone --how-much-disk-space-will-this-cost-me? [--depth n]
2008-12-16 0:22 ` Jean-Luc Herren
@ 2008-12-16 0:37 ` jidanni
2008-12-16 2:07 ` Jean-Luc Herren
0 siblings, 1 reply; 12+ messages in thread
From: jidanni @ 2008-12-16 0:37 UTC (permalink / raw)
To: jlh; +Cc: git
>> And don't tell us to just figure it out from the progress messages
>> after the download begins, and hit ^C if we don't like it.
JH> Maybe that's a dumb answer, but... why not? This works pretty
JH> well for me.
Sounds like my last marriage. "Just hit ^C if you don't like it". How
do you think the in-laws will feel? Nope, plan ahead I now say.
JH> So maybe what you really want is an ETA display during the cloning
JH> process? Sounds like a good idea to me.
ETA implies that git has an estimate of what is going to happen.
The key is to now allow the user to get such an estimate too, before
deciding to git-clone or not.
* Re: git-clone --how-much-disk-space-will-this-cost-me? [--depth n]
2008-12-16 0:37 ` jidanni
@ 2008-12-16 2:07 ` Jean-Luc Herren
2008-12-16 5:45 ` Nicolas Pitre
0 siblings, 1 reply; 12+ messages in thread
From: Jean-Luc Herren @ 2008-12-16 2:07 UTC (permalink / raw)
To: jidanni, git
jidanni@jidanni.org wrote:
> JH> So maybe what you really want is an ETA display during the cloning
> JH> process? Sounds like a good idea to me.
>
> ETA implies that git has an estimate of what is going to happen.
Aren't you implying this too from the beginning? But reading
Jeff's reply, there seems to be a reason why there isn't an ETA
already.
However, since some repositories get cloned in the same way very
often, there could be some cache that keeps this size information
around for any subsequent identical clones. The server could then
send a hint about the expected amount of data at the beginning.
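The cache idea can be sketched as follows. This is only an illustration under stated assumptions: PackSizeCache and all of its methods are hypothetical names, nothing like it exists in git, and "identical clone" is assumed here to mean the same set of wanted tip commits at the same depth.

```python
# Hypothetical server-side cache of pack sizes, keyed by what was requested.
# Every name here is illustrative; none of this is part of git.
class PackSizeCache:
    def __init__(self):
        self._sizes = {}

    @staticmethod
    def key(wanted_tips, depth=None):
        # Two requests count as identical if they want the same tips at the same depth.
        return (frozenset(wanted_tips), depth)

    def record(self, wanted_tips, depth, pack_bytes):
        # Remember how large the pack turned out to be for this request.
        self._sizes[self.key(wanted_tips, depth)] = pack_bytes

    def hint(self, wanted_tips, depth=None):
        # None means no estimate is available for this exact request.
        return self._sizes.get(self.key(wanted_tips, depth))


cache = PackSizeCache()
cache.record({"abc123"}, 1, 16_525_558)  # made-up tip name and size
print(cache.hint({"abc123"}, 1))  # an earlier identical clone left a size behind
print(cache.hint({"def456"}, 1))  # a tip nobody has cloned yet: no hint
```

The second lookup shows the limitation of the scheme: as soon as the tips move, the first client to clone gets no hint at all.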
jlh
* Re: git-clone --how-much-disk-space-will-this-cost-me? [--depth n]
2008-12-16 2:07 ` Jean-Luc Herren
@ 2008-12-16 5:45 ` Nicolas Pitre
2008-12-17 15:44 ` Shawn O. Pearce
0 siblings, 1 reply; 12+ messages in thread
From: Nicolas Pitre @ 2008-12-16 5:45 UTC (permalink / raw)
To: Jean-Luc Herren; +Cc: jidanni, git
On Tue, 16 Dec 2008, Jean-Luc Herren wrote:
> jidanni@jidanni.org wrote:
> > JH> So maybe what you really want is an ETA display during the cloning
> > JH> process? Sounds like a good idea to me.
> >
> > ETA implies that git has an estimate of what is going to happen.
>
> Aren't you implying this too from the beginning? But reading
> Jeff's reply, there seems to be a reason why there isn't an ETA
> already.
>
> However, since some repositories get cloned in the same way very
> often, there could be some cache that keeps this size information
> around for any subsequent identical clones. The server could then
> send a hint about the expected amount of data at the beginning.
And then you'll end up being the unlucky bastard to be the first to
clone the new latest revision of a repository, and ETA won't be
available, and you'll complain about the fact that sometimes it is there
and sometimes it is not.
The fact is, fundamentally, we don't know how many bytes to push when
generating a pack to answer the clone request. Sometimes we _could_ but
not always. It is therefore better to be consistent and let people know
that there is simply no ETA.
Nicolas
* Re: git-clone --how-much-disk-space-will-this-cost-me? [--depth n]
2008-12-16 5:45 ` Nicolas Pitre
@ 2008-12-17 15:44 ` Shawn O. Pearce
2008-12-17 16:15 ` Nicolas Pitre
0 siblings, 1 reply; 12+ messages in thread
From: Shawn O. Pearce @ 2008-12-17 15:44 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Jean-Luc Herren, jidanni, git
Nicolas Pitre <nico@cam.org> wrote:
> On Tue, 16 Dec 2008, Jean-Luc Herren wrote:
> > jidanni@jidanni.org wrote:
> > > JH> So maybe what you really want is an ETA display during the cloning
> > > JH> process? Sounds like a good idea to me.
>
> And then you'll end up being the unlucky bastard to be the first to
> clone the new latest revision of a repository, and ETA won't be
> available, and you'll complain about the fact that sometimes it is there
> and sometimes it is not.
>
> The fact is, fundamentally, we don't know how many bytes to push when
> generating a pack to answer the clone request. Sometimes we _could_ but
> not always. It is therefore better to be consistent and let people know
> that there is simply no ETA.
Hmm.
What if on an initial clone (no "have" lines received) we sum up
the sizes of the *.pack and all of the loose objects and send
that as an initial size estimate? It's going to be the upper bound
of the final pack that we send. At worst it over-estimates on the
size and download finishes faster.
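The proposed figure is easy to compute on the serving side. A minimal sketch, assuming the conventional layout of a repository's object store (objects/pack/*.pack for packs, objects/xx/ directories for loose objects); upper_bound_bytes is a hypothetical helper, it ignores alternates, and it counts the compressed on-disk sizes, unreachable garbage included.

```python
import os

HEX = "0123456789abcdef"


def upper_bound_bytes(git_dir):
    """Sum the on-disk sizes of pack files and loose objects under git_dir/objects."""
    objects = os.path.join(git_dir, "objects")
    total = 0
    for entry in os.listdir(objects):
        subdir = os.path.join(objects, entry)
        if not os.path.isdir(subdir):
            continue
        # Loose objects are assumed to live in two-hex-digit fan-out directories.
        is_loose_dir = len(entry) == 2 and all(c in HEX for c in entry)
        for name in os.listdir(subdir):
            # Count *.pack files only; the *.idx files next to them are not sent.
            if is_loose_dir or (entry == "pack" and name.endswith(".pack")):
                total += os.path.getsize(os.path.join(subdir, name))
    return total


# Print the estimate for the repository in the current directory, if there is one.
if os.path.isdir(os.path.join(".git", "objects")):
    print(upper_bound_bytes(".git"))
```

Because unreachable objects are counted too, the result can only be read as a ceiling on the transfer, never as a prediction of it.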
I'm willing to bet that most of the "big" repositories out there
don't have a lot of garbage in them. Linus' kernel repository
doesn't rewind, so he has 0 garbage. Anyone cloning from him would
get a reasonable estimate. Likewise with a Gentoo/KDE/WebKit/gcc
sort of giant tree most of that is in a huge historical pack.
That one pack file alone is completely reachable and dominates the
transfer size.
On smaller trees where people may have a lot of rebase garbage or
everything is loose the estimate will be quite a bit above what we
transfer, but by so much that it matters?
Yea, a single stray binary of some *.mpg or *.iso accidentally
added and then removed (and now unreachable) will vastly inflate
the numbers. In which case the repository owner will be encouraged
to prune when people won't clone his estimated 8 GiB download,
which is actually only 1 MiB.
--
Shawn.
* Re: git-clone --how-much-disk-space-will-this-cost-me? [--depth n]
2008-12-17 15:44 ` Shawn O. Pearce
@ 2008-12-17 16:15 ` Nicolas Pitre
2008-12-17 16:21 ` Shawn O. Pearce
0 siblings, 1 reply; 12+ messages in thread
From: Nicolas Pitre @ 2008-12-17 16:15 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Jean-Luc Herren, jidanni, git
On Wed, 17 Dec 2008, Shawn O. Pearce wrote:
> Nicolas Pitre <nico@cam.org> wrote:
> > The fact is, fundamentally, we don't know how many bytes to push when
> > generating a pack to answer the clone request. Sometimes we _could_ but
> > not always. It is therefore better to be consistent and let people know
> > that there is simply no ETA.
>
> Hmm.
>
> What if on an initial clone (no "have" lines received) we sum up
> the sizes of the *.pack and all of the loose objects and send
> that as an initial size estimate? It's going to be the upper bound
> of the final pack that we send. At worst it over-estimates on the
> size and download finishes faster.
It is a kludge. It makes the system imprecise for little benefit. Once
you start adding kludges like that into your system, people will always
ask for more kludges, and in the end your system isn't as reliable. We
all know about some other operating system which was designed like that.
I personally don't want to go there.
> Yea, a single stray binary of some *.mpg or *.iso accidentally
> added and then removed (and now unreachable) will vastly inflate
> the numbers. In which case the repository owner will be encouraged
> to prune when people won't clone his estimated 8 GiB download,
> which is actually only 1 MiB.
And I consider any system doing such a thing completely stupid. Either
you consistently know the information or you don't. When you don't, it
is best to not create expectations for the user. And so far I think
that 99.9% of git users are just fine with the progress display we
currently provide.
Nicolas
* Re: git-clone --how-much-disk-space-will-this-cost-me? [--depth n]
2008-12-17 16:15 ` Nicolas Pitre
@ 2008-12-17 16:21 ` Shawn O. Pearce
2008-12-17 16:46 ` Nicolas Pitre
0 siblings, 1 reply; 12+ messages in thread
From: Shawn O. Pearce @ 2008-12-17 16:21 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Jean-Luc Herren, jidanni, git
Nicolas Pitre <nico@cam.org> wrote:
>
> And I consider any system doing such a thing completely stupid. Either
> you consistently know the information or you don't. When you don't, it
> is best to not create expectations for the user. And so far I think
> that 99.9% of git users are just fine with the progress display we
> currently provide.
Certainly true here; I never care how big the source I'm cloning is.
But then again I have pretty good network connectivity at work
and at least cable modem service at home... most things clone down
pretty fast.
It's a quick hack to give a size upper bound. I don't think it's
that ugly. Our network protocol is uglier with all of its hidden
fields jammed behind that NUL in the first advertisement line.
But I digress.
The better feature is probably resumable clone anyway. At least
then people can abort a "long running" clone and have a good chance
they can pick it up again in the near future. It's also not easy to
implement, which is why we've only been talking about it for years
and never actually seen a patch proposing to do it.
--
Shawn.
* Re: git-clone --how-much-disk-space-will-this-cost-me? [--depth n]
2008-12-17 16:21 ` Shawn O. Pearce
@ 2008-12-17 16:46 ` Nicolas Pitre
2008-12-17 16:48 ` Shawn O. Pearce
0 siblings, 1 reply; 12+ messages in thread
From: Nicolas Pitre @ 2008-12-17 16:46 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Jean-Luc Herren, jidanni, git
On Wed, 17 Dec 2008, Shawn O. Pearce wrote:
> Nicolas Pitre <nico@cam.org> wrote:
> >
> > And I consider any system doing such a thing completely stupid. Either
> > you consistently know the information or you don't. When you don't, it
> > is best to not create expectations for the user. And so far I think
> > that 99.9% of git users are just fine with the progress display we
> > currently provide.
>
> Certainly true here; I never care how big the source I'm cloning is.
> But then again I have pretty good network connectivity at work
> and at least cable modem service at home... most things clone down
> pretty fast.
>
> It's a quick hack to give a size upper bound. I don't think it's
> that ugly. Our network protocol is uglier with all of its hidden
> fields jammed behind that NUL in the first advertisement line.
> But I digress.
The ugliness in the protocol is encapsulated away from user view, and we
could even seamlessly introduce a new protocol at any time with no
issues if we wanted to.
This "quick hack" is imprecise, unreliable, and directly affects user
perception. This is far more damaging, as once users are used to it,
good or bad, it won't be possible to get rid of it.
> The better feature is probably resumable clone anyway. At least
> then people can abort a "long running" clone and have a good chance
> they can pick it up again in the near future.
Absolutely.
> It's also not easy to
> implement, which is why we've only been talking about it for years
> and never actually seen a patch proposing to do it.
A partial clone could possibly be turned into a shallow clone if at least
the top commit is complete ...
Nicolas
* Re: git-clone --how-much-disk-space-will-this-cost-me? [--depth n]
2008-12-17 16:46 ` Nicolas Pitre
@ 2008-12-17 16:48 ` Shawn O. Pearce
2008-12-17 16:56 ` Nicolas Pitre
0 siblings, 1 reply; 12+ messages in thread
From: Shawn O. Pearce @ 2008-12-17 16:48 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Jean-Luc Herren, jidanni, git
Nicolas Pitre <nico@cam.org> wrote:
> > It's also not easy to
> > implement, which is why we've only been talking about it for years
> > and never actually seen a patch proposing to do it.
>
> A partial clone could possibly be turned into a shallow clone if at least
> the top commit is complete ...
But you of all people should know well that the top commit is also
a huge part of most clones. Getting that top commit can be 30-60%
of the repository itself. :-|
--
Shawn.
* Re: git-clone --how-much-disk-space-will-this-cost-me? [--depth n]
2008-12-17 16:48 ` Shawn O. Pearce
@ 2008-12-17 16:56 ` Nicolas Pitre
0 siblings, 0 replies; 12+ messages in thread
From: Nicolas Pitre @ 2008-12-17 16:56 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Jean-Luc Herren, jidanni, git
On Wed, 17 Dec 2008, Shawn O. Pearce wrote:
> Nicolas Pitre <nico@cam.org> wrote:
> > > It's also not easy to
> > > implement, which is why we've only been talking about it for years
> > > and never actually seen a patch proposing to do it.
> >
> > A partial clone could possibly be turned into a shallow clone if at least
> > the top commit is complete ...
>
> But you of all people should know well that the top commit is also
> a huge part of most clones. Getting that top commit can be 30-60%
> of the repository itself. :-|
Sure I know. This is why I'm not pushing this solution very hard. ;)
I have ideas about how to solve this in a really nice way, but that
implies pack V4.
Nicolas
* Re: git-clone --how-much-disk-space-will-this-cost-me? [--depth n]
2008-12-15 23:53 git-clone --how-much-disk-space-will-this-cost-me? [--depth n] jidanni
2008-12-16 0:22 ` Jean-Luc Herren
@ 2008-12-16 0:43 ` Jeff King
1 sibling, 0 replies; 12+ messages in thread
From: Jeff King @ 2008-12-16 0:43 UTC (permalink / raw)
To: jidanni; +Cc: git
On Tue, Dec 16, 2008 at 07:53:42AM +0800, jidanni@jidanni.org wrote:
> The git-clone manpage should mention how to determine how much disk
> space will be used.
OK. Do you have a suggestion for how to figure that out?
> Let's take a look at those messages while we're at it:
> $ git-clone --depth 1 git://git.sv.gnu.org/coreutils/
> Initialized empty Git repository in /usr/local/src/jidanni/coreutils/.git/
> remote: Counting objects: 26240, done.
> remote: Compressing objects: 100% (14001/14001), done.
> remote: Total 26240 (delta 21577), reused 15354 (delta 12095)
> Receiving objects: 100% (26240/26240), 15.76 MiB | 26 KiB/s, done.
> Resolving deltas: 100% (21577/21577), done.
> $ du -sh
> 27M .
> Nope, nowhere does it directly say "You Holmes, are in for 27
> Megabytes (on your piddly modem)". There obviously is math involved to
> figure it out... math!
That's because we don't know that it will be 27 megabytes. That progress
counter is counting the number of _objects_, not bytes. So you can make
a rough estimate, but only after receiving some objects, and even then
it can be wildly off (because you are assuming the size of the objects
still to get averages the same as the size of the objects you have
already gotten).
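The rough estimate described above, and why it can be wildly off, can be illustrated in Python. This is a sketch only: extrapolate_pack_bytes is a hypothetical helper, and the object counts and sizes in the example are made up for illustration.

```python
def extrapolate_pack_bytes(objects_received, objects_total, bytes_received):
    """Guess the final pack size, assuming the remaining objects average
    the same size as the objects received so far."""
    if objects_received <= 0:
        raise ValueError("need at least one received object to extrapolate")
    return bytes_received * objects_total / objects_received


# Made-up transfer: 1000 objects in total, 900 small ones of 1000 bytes each
# followed by 100 large blobs of 100000 bytes each.
guess_at_halfway = extrapolate_pack_bytes(500, 1000, 500 * 1_000)
actual = 900 * 1_000 + 100 * 100_000
print(f"guess after 500 objects: {guess_at_halfway:.0f} bytes")
print(f"actual total:            {actual} bytes")
```

In this made-up case the halfway guess is 1,000,000 bytes while the real total is 10,900,000 bytes, so the extrapolation is off by more than a factor of ten.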
AFAIK, nowhere in the sent data is there an indication of how many bytes
are in the resulting pack (and in many cases, the pack is generated on
the fly and the information not only is not sent, but is not available
anywhere).
-Peff