From: Eric Curtin <ericcurtin17@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Initial git clone behaviour
Date: Wed, 6 Jan 2016 23:24:32 +0000
Message-ID: <CANpvso4NZZcAYfQZPt0i2G+MWD4ppga0XQZwYDhq1syA2f1GJw@mail.gmail.com>
In-Reply-To: <CAPc5daXeNay1uF=qQ=G82kyu37uHhy-uEOWU6tz_bPYfFam=rA@mail.gmail.com>

On 6 January 2016 at 23:14, Junio C Hamano <gitster@pobox.com> wrote:
> On Wed, Jan 6, 2016 at 2:26 PM, Eric Curtin <ericcurtin17@gmail.com> wrote:
>>
>> Often I do a standard git clone:
>>
>> git clone (name of repo)
>>
>> Followed by a depth=1 clone in parallel, so I can get building and
>> working with the code asap:
>>
>> git clone --depth=1 (name of repo)
>>
>> Could we change the default behavior of git so that we initially get
>> all the current files quickly, so that we can start working on them,
>> and then get the rest of the data? At least a user could get to work
>> quicker this way. Any disadvantages of this approach?
>
> It would put more burden on a shared and limited resource (i.e.
> the server side).
>
> For example, I just tried a depth=1 clone of Linus's repository from
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
> which transferred ~150MB pack data to check out 52k files in 90 seconds.
>
> On the other hand, a full clone transferred ~980MB pack data and it took
> 170 seconds to complete. You can already see that a full clone is highly
> optimized--it does not take even twice the time of getting the most recent
> checkout to grab 10 years' worth of development (562k commits).
>
> This efficiency comes from some tradeoffs, and one of them is that not
> all the data necessary to check out the latest tree contents can be stored
> near the beginning of the pack data. So "we'll checkout the tip while the
> remainder of the data is still incoming" would not be workable, unless
> you are willing to destroy the full-clone performance.

Ok, my internet connection at home is pretty terrible then! I get
nowhere near these timings. It takes over an hour to do a full clone
from my house, and about 30 minutes for the depth=1 clone (I did not
time it).

That all makes sense, I guess. It is probably not a good idea to regress
full-clone performance for the sake of this use case. Was just a query
really!
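
For readers on a slow link, the "tip first, history later" workflow
discussed above can be approximated with existing commands: a shallow
clone followed by `git fetch --unshallow` in the same directory. This is
not what the thread proposes as a default, and it differs from the two
parallel clones described earlier; it is only a minimal sketch. The
temporary paths, the three-commit demo repository, and the demo
identity are assumptions made so the example runs without network
access; a real remote URL would take the place of the `file://` one.

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway paths; a real remote URL would replace "file://$src".
work=$(mktemp -d)
src="$work/src"
dst="$work/dst"

# Build a small source repository with three commits to clone from.
git init -q "$src"
for i in 1 2 3; do
    echo "$i" > "$src/file"
    git -C "$src" add file
    git -C "$src" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "commit $i"
done

# Step 1: shallow clone; only the tip commit is transferred, so the
# working tree is available as early as possible.
git clone -q --depth=1 "file://$src" "$dst"
shallow_count=$(git -C "$dst" rev-list --count HEAD)

# Step 2: later, fetch the remaining history into the same clone.
git -C "$dst" fetch -q --unshallow
full_count=$(git -C "$dst" rev-list --count HEAD)

echo "shallow=$shallow_count full=$full_count"

# Remove the throwaway repositories.
rm -rf "$work"
```

The second step still costs the server a separate pack computation, so
the concern about extra load on the server side applies here as well.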
