From: Jeff King <peff@peff.net>
To: Eric Wong <normalperson@yhbt.net>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: clones over rsync broken?
Date: Sat, 30 Jan 2016 00:41:41 -0500
Message-ID: <20160130054141.GB1677@sigill.intra.peff.net>
In-Reply-To: <20160130051133.GA21973@dcvr.yhbt.net>
On Sat, Jan 30, 2016 at 05:11:33AM +0000, Eric Wong wrote:
> I have not used rsync remotes in ages, but I was working on the
> patch for -4/-6 support and decided to test it against rsync.kernel.org
>
> Cloning git.git takes forever and failed with:
No kidding. There are over 95,000 unreachable loose objects consuming a
gigabyte. The rsync transport blindly pulls all of the data over, with
no idea that it doesn't need most of it.
> $ git clone rsync://rsync.kernel.org/pub/scm/git/git.git
> Checking connectivity... fatal: bad object ecdc6d8612df80e871ed34bb6c3b01b20b0b82e6
> fatal: remote did not send all necessary objects
All those objects, and we still manage to miss one. :)
Interestingly, that object does not seem to exist at all on the remote!
I think this is the same bug as the one below. Read on...
> However, trying to clone a smaller repo like pahole.git via rsync fails
> differently; this looks more like a git bug:
>
> $ git clone rsync://rsync.kernel.org/pub/scm/devel/pahole/pahole.git
> fatal: Multiple updates for ref 'refs/remotes/origin/master' not allowed.
>
> Using rsync(1) manually to grab pahole.git and inspecting the bare
> repo yields no anomalies with "git fsck --full".
> $GIT_DIR/info/refs and $GIT_DIR/packed-refs both look fine, but
> perhaps it's confused by the existence of $GIT_DIR/refs/heads/master
> as a loose ref?
Yes, that's exactly what's going on. In get_refs_via_rsync, we blindly
concatenate the list of loose refs and packed refs. But that's not
right, and never has been. If the same ref exists in both stores, the
loose ref takes precedence (that is how we can write new refs without
having to rewrite the whole packed-refs file).
So we erroneously believe that refs/heads/master exists _twice_ on the
remote, with two different values (and try to store it twice as
refs/remotes/origin/master). But we should be accepting only the loose
value.
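
To make the precedence rule concrete, here is a minimal Python sketch. It
is not git's actual C code; the function names and the abbreviated ref
values are made up for illustration. It contrasts blind concatenation
with a merge in which a loose ref shadows a packed ref of the same name:

```python
def concat_refs(loose, packed):
    """Blind concatenation: every entry from both stores is kept."""
    return list(loose.items()) + list(packed.items())


def merge_refs(loose, packed):
    """Keep a packed entry only when no loose ref has the same name."""
    merged = dict(packed)
    merged.update(loose)  # loose values override packed ones
    return merged


# Hypothetical values; real ref values are full object names.
loose = {"refs/heads/master": "1111111"}
packed = {"refs/heads/master": "0000000", "refs/tags/v1.0": "2222222"}

naive = concat_refs(loose, packed)
names = [name for name, _ in naive]
# Ref names that appear more than once in the concatenated list.
duplicates = sorted({n for n in names if names.count(n) > 1})

resolved = merge_refs(loose, packed)
```

With these inputs, the concatenated list carries refs/heads/master twice
with two different values, while the merged result has one entry per ref
and uses the loose value for master.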
This explains the git.git problem, too. There are two entries for
refs/heads/pu: one loose and one in packed-refs. The latter is a stale,
older value, and should never be looked at. But because pu gets rewound,
its older values are not necessarily reachable and may even have been
pruned!
So no, we do not have ecdc6d86, but neither does the upstream, and
nothing is referencing it.
It looks like this has been broken since cd547b4 (fetch/push: readd
rsync support, 2007-10-01). The fix is just to ignore packed-refs
entries which duplicate loose ones. But given the length of time this
has been broken with nobody complaining, I have to wonder if it is
simply time to retire the rsync protocol. Even if it were made to work, it
is a horribly inefficient protocol.
-Peff