From: Jeff King <peff@peff.net>
To: "René Scheibe" <rene.scheibe@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: How to speedup git clone for big binary files (disable delta compression)
Date: Thu, 19 Jul 2018 01:33:58 -0400
Message-ID: <20180719053357.GA23884@sigill.intra.peff.net>
In-Reply-To: <43b401ec-31fc-59dc-17c0-8dd7359726da@gmail.com>
On Thu, Jul 19, 2018 at 12:05:00AM +0200, René Scheibe wrote:
> Code:
> ---------------------------------------------------------------------
> #!/bin/bash
>
> # setup repository
> git init --quiet repo
> cd repo
>
> echo '*.bin binary -delta' > .gitattributes
> git add .gitattributes
> git commit --quiet -m 'attributes'
>
> for i in $(seq 10); do
> dd if=/dev/urandom of=data.bin bs=1MB count=10 status=none
> git add data.bin
> git commit --quiet -m "data $i"
> done
> cd ..
>
> # create clone repository
> time git clone --no-local repo clone
This clone won't respect those attributes, because we don't dig into
in-repo attributes (i.e., .gitattributes files stored in the repository's
trees). There's actually some inconsistency in how Git
handles attribute locations. Usually they're just read from the top of
the working tree, but in some instances we read them from the tree
itself (e.g., git-archive respects some attributes from the tree it's
archiving).
If you do:
  echo "*.bin binary -delta" >repo/.git/info/attributes
then that does work (we always respect repo-level attributes like that).
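A minimal sketch of that workaround, assuming a Git recent enough to support `cat-file --batch-all-objects` and the `%(deltabase)` format atom. It rebuilds a scaled-down version of the script above, but with near-identical blobs (a shared random prefix plus a small tail) so that delta compression would normally apply. The attribute goes into `.git/info/attributes` rather than a committed `.gitattributes`, the repository is cloned with `--no-local`, and the script counts how many blobs arrived in the clone as deltas. The 100 kB size and the `count_delta_blobs` helper are illustrative choices made here, not anything provided by Git.

```shell
#!/bin/bash
# Sketch: repo-level attributes in .git/info/attributes are honored when
# upload-pack builds the pack for "git clone --no-local".
# Sizes and the count_delta_blobs helper are hypothetical/illustrative.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: count blobs stored as deltas (non-zero delta base).
count_delta_blobs() {
  git -C "$1" cat-file --batch-all-objects \
      --batch-check='%(objecttype) %(deltabase)' |
    awk '$1 == "blob" && $2 !~ /^0+$/ { n++ } END { print n + 0 }'
}

tmp=$(mktemp -d)
git init --quiet "$tmp/repo"
# Repository-level attributes live in $GIT_DIR, not in the working tree.
mkdir -p "$tmp/repo/.git/info"
echo '*.bin binary -delta' >"$tmp/repo/.git/info/attributes"

# Near-identical blobs, so Git would normally store them as deltas.
head -c 100000 /dev/urandom >"$tmp/base"
for i in $(seq 10); do
  { cat "$tmp/base"; echo "revision $i"; } >"$tmp/repo/data.bin"
  git -C "$tmp/repo" add data.bin
  git -C "$tmp/repo" commit --quiet -m "data $i"
done

git clone --quiet --no-local "$tmp/repo" "$tmp/clone"
echo "deltified blobs in clone: $(count_delta_blobs "$tmp/clone")"
```

The count is taken in the clone because the clone stores the pack exactly as upload-pack sent it, so any delta there was chosen on the serving side.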
> # repack original repository
> cd repo
> time git repack -a -d
In this case we're reading the attributes from the working tree, and it
does work. In theory the clone case could do so, too, but git-upload-pack,
the server side of the clone, avoids looking at the working tree at all.
That's something we _could_ address, but it doesn't really fix the
general case, since most clones will be from a bare repository anyway.
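The working-tree case can be checked in a similar way. The sketch below, under the same assumptions about `cat-file` support, commits `.gitattributes` in a non-bare repository, adds near-identical blobs, runs `git repack -a -d`, and counts blobs that ended up stored as deltas. Blob sizes and contents are illustrative.

```shell
#!/bin/bash
# Sketch: "git repack" in a non-bare repository consults the
# working-tree .gitattributes, so "-delta" blobs are stored whole.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git init --quiet "$tmp/repo"
cd "$tmp/repo"
echo '*.bin binary -delta' >.gitattributes
git add .gitattributes
git commit --quiet -m attributes

# Similar blobs that would normally be stored as deltas of each other.
head -c 100000 /dev/urandom >../base
for i in $(seq 10); do
  { cat ../base; echo "revision $i"; } >data.bin
  git add data.bin
  git commit --quiet -m "data $i"
done

git repack --quiet -a -d
# Count blobs whose packed entry names a delta base (non-zero %(deltabase)).
deltas=$(git cat-file --batch-all-objects \
           --batch-check='%(objecttype) %(deltabase)' |
         awk '$1 == "blob" && $2 !~ /^0+$/ { n++ } END { print n + 0 }')
echo "deltified blobs after repack: $deltas"
```

Dropping the `-delta` line from `.gitattributes` should let `repack` deltify these blobs, though the exact count depends on Git's delta heuristics.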
So in summary:
1. Depending on what you're trying to do, the .git/info/attributes
trick might be enough for you.
2. I do think it would be nice for more places to respect attributes
from trees. There's a question of which tree, but I think in
general reading them from HEAD in a bare repository would do what
people want (it's a little funny if you're fetching branch "foo"
while HEAD points to "bar", but it's at least consistent with the
non-bare case). There's some prior art in the way we treat mailmaps
(in a bare repo, we read HEAD:.mailmap).
I suspect the patch may not be trivial, as I don't know how ready
the attributes code is to handle in-tree lookups (remember that it
is not just HEAD:.gitattributes we must care about, but other files
sprinkled through the repository, like "HEAD:subdir/.gitattributes").
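Until something like that exists, a crude manual approximation for a bare repository is to copy the top-level `HEAD:.gitattributes` into `info/attributes`. This is a workaround sketched here, not something Git does on its own: it ignores nested files such as `HEAD:subdir/.gitattributes`, and it has to be re-run whenever HEAD changes (for example from a server-side hook, which is an assumption about how one might deploy it).

```shell
#!/bin/bash
# Sketch of a manual stand-in for "read attributes from HEAD" in a bare
# repository: copy the top-level HEAD:.gitattributes into info/attributes.
# Nested files such as HEAD:subdir/.gitattributes are NOT handled.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git init --quiet "$tmp/work"
echo '*.bin binary -delta' >"$tmp/work/.gitattributes"
git -C "$tmp/work" add .gitattributes
git -C "$tmp/work" commit --quiet -m attributes
git clone --quiet --bare "$tmp/work" "$tmp/server.git"

# A bare repository has no working tree to read .gitattributes from,
# so copy the version from HEAD into the repository-level file.
mkdir -p "$tmp/server.git/info"
git -C "$tmp/server.git" show HEAD:.gitattributes \
  >"$tmp/server.git/info/attributes"

# check-attr reads info/attributes even without a working tree.
result=$(git -C "$tmp/server.git" check-attr delta -- data.bin)
echo "$result"
```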
-Peff