From: Matt Glazar <strager@fb.com>
To: "git@vger.kernel.org" <git@vger.kernel.org>
Subject: git-fetch pulls already-pulled objects?
Date: Wed, 28 Oct 2015 23:28:24 +0000
Message-ID: <D256A718.1373A%strager@fb.com>

On a remote, I have two Git commit objects which point to the same tree
object (created with git commit-tree). If I fetch one of the commits, the
commit object and its tree object are fetched. If I then fetch the
other commit, the tree object (and its dependencies) is fetched *again* (I
think). I don't want the tree object downloaded again, because it is
large (multi-gigabyte). Because the tree object already exists locally, I
think it should not be downloaded.

Is this a bug in Git, or is this by design? How can I confirm that the
tree object (and dependencies) are downloaded twice? Is there a more
complicated git-fetch (or similar) command I can execute to avoid
downloading the already-downloaded tree objects? (I have the hash of the
tree object which would potentially be re-downloaded, if that helps.)
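
One way to check the local side of this is sketched below. It only confirms
that both commits resolve to the same tree and that a clone of one branch
already contains that tree; it does not show what a later fetch transfers.
It assumes a throwaway file:// remote in a temporary directory instead of
the ssh remote, and the names src, clone, and other are placeholders.

```shell
#!/bin/sh
# Sketch: confirm that two commits share one tree, and that a clone which
# fetched only one of them already has that tree object.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/src"
cd "$work/src"
echo data > file
git add file
git commit -q -m 'commit 1'
first=$(git symbolic-ref --short HEAD)   # default branch name varies
git branch other "$(echo 'commit 2' | git commit-tree 'HEAD^{tree}')"

# Both branch tips should resolve to the same tree hash.
tree_first=$(git rev-parse "$first^{tree}")
tree_other=$(git rev-parse 'other^{tree}')
echo "first tree: $tree_first"
echo "other tree: $tree_other"

# Clone only the first branch, then ask the clone whether it has the tree.
git clone -q --single-branch --branch "$first" "file://$work/src" "$work/clone"
cd "$work/clone"
# cat-file -e exits 0 when the object exists in the local object store.
if git cat-file -e "$tree_other"; then
  tree_present=yes
else
  tree_present=no
fi
echo "tree present in clone: $tree_present"
```

With a known tree hash, the same `git cat-file -e <hash>` check can be run
directly in the real clone before fetching the second commit.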

Sequence of commands to reproduce:

# Replace this with the URL to an empty Git repository.
remote=ssh://foo/bar.git

# Create some random data to exaggerate git-fetch times.
# If you have a slow remote, reduce 'count'.
mkdir minimal
cd minimal
dd if=/dev/urandom of=random bs=65536 count=4096

# Create our two commits (master and master2).
git init
git add random
git commit -m 'Random data (commit 1)'
git branch master2 \
  "$(echo 'Random data (commit 2)' \
    | git commit-tree 'HEAD^{tree}')"

# Push our commits. Expected to take some time.
git remote add origin "${remote}"
git push origin \
  master:refs/heads/master \
  master2:refs/heads/master2

# Clone master into minimal-clone. Expected to take some time.
cd ..
git clone --single-branch --branch master "${remote}" minimal-clone

# Fetch master2. Should be nearly instant, but takes some
# time. Seems to download everything again.
cd minimal-clone
git fetch origin master2

# Try again. git-fetch takes a while, but shouldn't.
rm -f .git/FETCH_HEAD
git gc --prune=all
git fetch origin master2
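
To look at what such a fetch negotiates, a packet trace may help. The sketch
below is an illustration under assumptions, not a diagnosis: it uses a tiny
file:// remote in a temporary directory, placeholder names (src, clone,
other), and the GIT_TRACE_PACKET environment variable pointed at a log file.
The "want" and "have" lines show what the client requests and what it says
it already has; the "Total" line, if the remote prints one, reports how many
objects were packed for the transfer.

```shell
#!/bin/sh
# Sketch: record the fetch negotiation for a second commit that shares
# its tree with an already-cloned commit.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/src"
cd "$work/src"
echo data > file
git add file
git commit -q -m 'commit 1'
first=$(git symbolic-ref --short HEAD)   # default branch name varies
git branch other "$(echo 'commit 2' | git commit-tree 'HEAD^{tree}')"
other_commit=$(git rev-parse other)

git clone -q --single-branch --branch "$first" "file://$work/src" "$work/clone"
cd "$work/clone"

# An absolute path makes Git append the packet trace to that file.
# --progress asks for progress output even though stderr is not a terminal.
GIT_TRACE_PACKET="$work/trace.log" \
  git fetch --progress origin other 2>"$work/fetch.log"

# What the client asked for and what it claimed to have already.
grep -E 'fetch> (want|have)' "$work/trace.log" || true
# Object totals reported for the transfer, if any were printed.
grep 'Total' "$work/fetch.log" || true
```

Against the real remote, the same two environment and option settings can be
added to the `git fetch origin master2` call above.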

Info about my system:

Local (pusher):
OS: OS X 10.10.5
git: git version 2.0.1
ssh: OpenSSH_6.2p2, OSSLShim 0.9.8r 8 Dec 2011

Remote (server):
OS: Linux 4.0.9 (CentOS 6)
git: git version 2.4.6
sshd: OpenSSH_6.7p1-hpn14v5, OpenSSL 1.0.1e-fips 11 Feb 2013

