git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: "René Scheibe" <rene.scheibe@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: How to speedup git clone for big binary files (disable delta compression)
Date: Thu, 19 Jul 2018 01:33:58 -0400
Message-ID: <20180719053357.GA23884@sigill.intra.peff.net>
In-Reply-To: <43b401ec-31fc-59dc-17c0-8dd7359726da@gmail.com>

On Thu, Jul 19, 2018 at 12:05:00AM +0200, René Scheibe wrote:

> Code:
> ---------------------------------------------------------------------
> #!/bin/bash
> 
> # setup repository
> git init --quiet repo
> cd repo
> 
> echo '*.bin binary -delta' > .gitattributes
> git add .gitattributes
> git commit --quiet -m 'attributes'
> 
> for i in $(seq 10); do
>     dd if=/dev/urandom of=data.bin bs=1MB count=10 status=none
>     git add data.bin
>     git commit --quiet -m "data $i"
> done
> cd ..
> 
> # create clone repository
> time git clone --no-local repo clone

This clone won't respect those attributes, because we don't dig into
in-repo attributes. There's actually some inconsistency in how Git
handles attribute locations. Usually they're just read from the top of
the working tree, but in some instances we read them from the tree
itself (e.g., git-archive respects some attributes from the tree it's
archiving).

If you do:

  echo "*.bin binary -delta" >repo/.git/info/attributes

then that does work (we always respect repo-level attributes like that).
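
A quick way to confirm that the repo-level file is picked up is
git-check-attr. This is only a sketch: the scratch directory and the
data.bin name are made up for illustration, and it only shows that the
attribute resolves, not that the clone gets any faster.

```shell
# Scratch repository; the location is an illustrative assumption.
tmp=$(mktemp -d)
git init --quiet "$tmp/repo"

# Repo-level attributes live in $GIT_DIR/info/attributes.
mkdir -p "$tmp/repo/.git/info"
echo '*.bin binary -delta' > "$tmp/repo/.git/info/attributes"

# Ask Git how it resolves the "delta" attribute for a *.bin path.
# The file itself does not need to exist for the lookup.
delta_attr=$(git -C "$tmp/repo" check-attr delta -- data.bin)
echo "$delta_attr"
```

If the attribute is in effect, "delta" should be reported as unset for
that path.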

> # repack original repository
> cd repo
> time git repack -a -d

In this case we're reading the attributes from the working tree, and it
does work. In theory the clone case could do so, too, but git-upload-pack,
the server side of the clone, avoids looking at the working tree at all.
That's something we _could_ address, but it doesn't really fix the
general case, since most clones will be from a bare repository anyway.
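
Since the bare case is the common one, here is a sketch of applying the
same repo-level setting on the serving side. The repo.git path is
hypothetical; the only real difference from the non-bare case is that
the file sits directly under info/ rather than .git/info/.

```shell
# Scratch bare repository standing in for the server-side copy
# (the path is an illustrative assumption).
tmp=$(mktemp -d)
git init --quiet --bare "$tmp/repo.git"

# In a bare repository, $GIT_DIR is the repository directory itself.
mkdir -p "$tmp/repo.git/info"
echo '*.bin binary -delta' > "$tmp/repo.git/info/attributes"

# Check how the attribute resolves from inside the bare repository.
bare_attr=$(git -C "$tmp/repo.git" check-attr delta -- data.bin)
echo "$bare_attr"
```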

So in summary:

  1. Depending on what you're trying to do, the .git/info/attributes
     trick might be enough for you.

  2. I do think it would be nice for more places to respect attributes
     stored in trees. There's a question of which tree, but I think in
     general reading them from HEAD in a bare repository would do what
     people want (it's a little funny if you're fetching branch "foo",
     but HEAD points to "bar", but it's at least consistent with the
     non-bare case). There's some prior art in the way we treat mailmaps
     (in a bare repo, we read HEAD:.mailmap).

     I suspect the patch may not be trivial, as I don't know how ready
     the attributes code is to handle in-tree lookups (remember that it
     is not just HEAD:.gitattributes we must care about, but other files
     sprinkled through the repository, like "HEAD:subdir/.gitattributes").
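
To get a feel for what an in-tree lookup would have to cover, you can
enumerate the attribute files recorded in a commit without touching the
working tree. A sketch only: the scratch repository, the "subdir" name,
the second pattern, and the example identity passed to git-commit are
all assumptions for illustration, and this says nothing about how the
attributes code itself would need to change.

```shell
# Scratch repository with attribute files at two levels
# (all names here are illustrative assumptions).
tmp=$(mktemp -d)
repo="$tmp/repo"
git init --quiet "$repo"
mkdir "$repo/subdir"
echo '*.bin binary -delta' > "$repo/.gitattributes"
echo '*.dat -delta' > "$repo/subdir/.gitattributes"
git -C "$repo" add .
git -C "$repo" -c user.name='Example' -c user.email='example@example.com' \
    -c commit.gpgsign=false commit --quiet -m 'attributes'

# List every .gitattributes file recorded in HEAD's tree.
attr_files=$(git -C "$repo" ls-tree -r --name-only HEAD |
    grep -E '(^|/)\.gitattributes$')

# Print each one straight from the object database.
printf '%s\n' "$attr_files" | while IFS= read -r f; do
    echo "== HEAD:$f"
    git -C "$repo" show "HEAD:$f"
done
```

The same two commands work in a bare repository, since neither needs a
working tree.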

-Peff
