git.vger.kernel.org archive mirror
From: Matt Glazar <strager@fb.com>
To: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: git-fetch pulls already-pulled objects?
Date: Wed, 28 Oct 2015 23:28:24 +0000
Message-ID: <D256A718.1373A%strager@fb.com>

On a remote, I have two Git commit objects that point to the same tree
object (created with git commit-tree). If I fetch one of the commits, the
commit object and its tree object are fetched. If I then fetch the other
commit, the tree object (and everything it references) is fetched *again*
(I think). I don't want the tree object downloaded again, because it is
large (multi-gigabyte). Because the tree object already exists locally, I
think it should not be downloaded.

Is this a bug in Git, or is it by design? How can I confirm that the
tree object (and everything it references) is downloaded twice? Is there a
more complicated git-fetch (or similar) command I can run to avoid
downloading the already-downloaded tree objects? (I have the hash of the
tree object that would potentially be re-downloaded, if that helps.)
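A quick way to check the premise that the tree already exists locally is
git cat-file -e, which exits with status 0 only when the named object is
present in the repository. The helper below is a minimal sketch; the name
have_object and its output format are hypothetical, not part of Git.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): report whether an object, such as
# the tree hash you already know, exists in the current repository.
# "git cat-file -e" exits with status 0 only if the object is present;
# its error message for a missing object is discarded.
have_object() {
  if git cat-file -e "$1" 2>/dev/null; then
    echo "present: $1"
  else
    echo "missing: $1"
  fi
}

# Usage, inside minimal-clone (replace <tree-hash> with the real hash):
#   have_object <tree-hash>
```

To watch the negotiation itself, running GIT_TRACE_PACKET=1 git fetch
origin master2 prints the packet lines exchanged with the server,
including the want and have lines the client sends, to stderr; the exact
trace format varies between Git versions.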

Sequence of commands to reproduce:

# Replace this with the URL to an empty Git repository.
remote=ssh://foo/bar.git

# Create some random data to exaggerate git-fetch times.
# If you have a slow remote, reduce 'count'.
mkdir minimal
cd minimal
dd if=/dev/urandom of=random bs=65536 count=4096

# Create our two commits (master and master2).
git init
git add random
git commit -m 'Random data (commit 1)'
git branch master2 \
  "$(echo 'Random data (commit 2)' \
    | git commit-tree 'HEAD^{tree}')"

# Push our commits. Expected to take some time.
git remote add origin "${remote}"
git push origin \
  master:refs/heads/master \
  master2:refs/heads/master2

# Clone master. Expected to take some time.
cd ..
git clone --single-branch --branch master "${remote}" minimal-clone

# Fetch master2. Should be nearly instant, but takes some
# time. Seems to download everything again.
cd minimal-clone
git fetch origin master2

# Try again. git-fetch takes a while, but shouldn't.
rm -f .git/FETCH_HEAD
git gc --prune=all
git fetch origin master2
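To measure what the second fetch actually transfers without a
multi-gigabyte upload, the reproduction can be scaled down. This is a
sketch under stated assumptions: a local bare repository reached through
a file:// URL stands in for ssh://foo/bar.git, a one-line file replaces
the random data, and the function name repro_small and its key=value
output are hypothetical. It passes -c fetch.unpackLimit=1 so the received
objects stay in a pack that git show-index can list. With the default
limit (transfer.unpackLimit, 100) a small fetch is exploded into loose
objects, and unpacking skips objects that already exist, so disk usage
alone would not reveal a re-sent tree.

```shell
#!/bin/sh
# Scaled-down sketch of the reproduction above (assumptions: a local bare
# repository over file:// replaces ssh://foo/bar.git, and a tiny file
# replaces the random data). Prints how many objects arrive in the pack
# received by "git fetch origin master2" and whether the shared tree is
# among them.
repro_small() (
  set -eu
  work=$(mktemp -d)
  cd "$work"

  # Stand-in for the empty remote repository.
  git init -q --bare remote.git
  remote="file://$work/remote.git"

  # Two commits that point at the same tree, as in the original script.
  git init -q src
  cd src
  git config user.name 'Example User'
  git config user.email 'user@example.com'
  printf 'some data\n' > random
  git add random
  git commit -q -m 'Random data (commit 1)'
  git branch master2 \
    "$(echo 'Random data (commit 2)' | git commit-tree 'HEAD^{tree}')"
  # Sanity check: both branches must share one tree object.
  [ "$(git rev-parse 'HEAD^{tree}')" = "$(git rev-parse 'master2^{tree}')" ]
  git push -q "$remote" HEAD:refs/heads/master master2:refs/heads/master2

  # Clone only master; a clone keeps the pack it receives.
  cd "$work"
  git clone -q --single-branch --branch master "$remote" clone
  cd clone
  tree=$(git rev-parse 'HEAD^{tree}')

  # Remember which pack indexes exist before fetching master2.
  ls .git/objects/pack/*.idx | LC_ALL=C sort > "$work/before"

  # fetch.unpackLimit=1 stores even a one-object fetch as a pack file
  # instead of loose objects, so its contents can be inspected.
  git -c fetch.unpackLimit=1 fetch -q origin master2
  ls .git/objects/pack/*.idx | LC_ALL=C sort > "$work/after"
  new_idx=$(LC_ALL=C comm -13 "$work/before" "$work/after")

  # git show-index prints one line per object stored in the pack.
  echo "objects=$(git show-index < "$new_idx" | wc -l | tr -d ' ')"
  if git show-index < "$new_idx" | grep -q "$tree"; then
    echo "tree_resent=yes"
  else
    echo "tree_resent=no"
  fi
  rm -rf "$work"
)

repro_small
```

The outcome may depend on the Git versions involved, so no particular
result is claimed here; tree_resent=yes would indicate that the shared
tree was sent again even though the clone already had it.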

Info about my system:


Local (pusher):
OS: OS X 10.10.5
git: git version 2.0.1
ssh: OpenSSH_6.2p2, OSSLShim 0.9.8r 8 Dec 2011


Remote (server):
OS: Linux 4.0.9 (CentOS 6)
git: git version 2.4.6
sshd: OpenSSH_6.7p1-hpn14v5, OpenSSL 1.0.1e-fips 11 Feb 2013


Thread overview: 5+ messages
2015-10-28 23:28 Matt Glazar [this message]
2015-10-29 17:32 ` git-fetch pulls already-pulled objects? Junio C Hamano
2015-10-29 18:08   ` Matt Glazar
2015-10-29 18:42     ` Junio C Hamano
2015-10-29 19:52       ` Matt Glazar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D256A718.1373A%strager@fb.com \
    --to=strager@fb.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).