From: Jeff King <peff@peff.net>
To: Eric Wong <normalperson@yhbt.net>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: clones over rsync broken?
Date: Sat, 30 Jan 2016 01:30:36 -0500
Message-ID: <20160130063036.GC1677@sigill.intra.peff.net>
In-Reply-To: <20160130054141.GB1677@sigill.intra.peff.net>

On Sat, Jan 30, 2016 at 12:41:41AM -0500, Jeff King wrote:

> It looks like this has been broken since cd547b4 (fetch/push: readd
> rsync support, 2007-10-01). The fix is just to ignore packed-refs
> entries which duplicate loose ones. But given the length of time this
> has been broken with nobody complaining, I have to wonder if it is
> simply time to retire the rsync protocol. Even if it was made to work, it
> is a horribly inefficient protocol.

I took a look at whether there would be an easy fix. There are three
obvious ways to go about this:

  1. Use the loose/packed reading code from refs/files-backend.c.

     This would require some refactoring, as we currently assume we are
     either reading the refs for _this_ repository, or for a submodule.
     This is sort-of like reading a submodule, but I think there are a
     few rough edges.

     Worse, though, is that the upcoming pluggable refs work will
     probably require that submodules and the main repo have the same
     ref backend. I'm a little dubious of that requirement in general,
     but certainly it would be a show-stopper here.

  2. Create a "struct transport" for the tempdir holding the data we
     rsynced from the other side, and just treat it like a local repo.
     We already do something like this to handle object "alternates"
     repositories (and we run "upload-pack" on the other directory and
     parse it just like a real remote).

     Unfortunately, what we bring over in get_refs_via_pack is not
     enough for this to work. It's _just_ the refs/ directories. We can
     use "git init" to make it more like a real repo, but ultimately we
     don't have any objects, so upload-pack will complain.

     We could fix that by just rsyncing the objects down at this stage,
     too. It's not like git is careful enough to do a real "what do we
     need" walk like it does for dumb-http. But we would end up rsyncing
     even in cases where we didn't need _any_ objects, though that is
     probably a small minority of cases.

  3. Just teach the local ad-hoc loose and packed readers to do the
     proper deduplication. I started on this, but then realized that we
     really do implement a from-scratch packed-refs reader here. And
     it's missing some features, like parsing peeled tags.

     So it really would want to call into the regular packed-refs
     parsing code, which requires more refactoring as in (1).
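
To make the deduplication in (3) concrete, here is a minimal sketch of
the rule: read packed-refs, read the loose files under refs/, and let a
loose ref win over a packed entry of the same name. It is written in
Python for brevity rather than in git's C, and every function name in it
is hypothetical; none of this is git's actual code. It also assumes
plain loose refs (no symrefs), and it parses the "^" peeled lines only
to show where that information lives.

```python
import os
import tempfile


def read_packed_refs(git_dir):
    """Parse packed-refs into {refname: (oid, peeled_oid_or_None)}."""
    refs = {}
    path = os.path.join(git_dir, "packed-refs")
    if not os.path.exists(path):
        return refs
    last = None
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # header or blank line
            if line.startswith("^"):
                # Peeled value for the tag named on the previous line.
                if last is not None:
                    refs[last] = (refs[last][0], line[1:])
                continue
            oid, name = line.split(" ", 1)
            refs[name] = (oid, None)
            last = name
    return refs


def read_loose_refs(git_dir):
    """Walk refs/ and return {refname: oid} for each loose ref file."""
    refs = {}
    for root, _dirs, files in os.walk(os.path.join(git_dir, "refs")):
        for fname in files:
            full = os.path.join(root, fname)
            name = os.path.relpath(full, git_dir).replace(os.sep, "/")
            with open(full) as fh:
                refs[name] = fh.read().strip()
    return refs


def merged_refs(git_dir):
    """Merge both sources; a loose ref overrides a packed one."""
    packed = read_packed_refs(git_dir)
    merged = {name: oid for name, (oid, _peeled) in packed.items()}
    merged.update(read_loose_refs(git_dir))  # loose entries win
    return dict(sorted(merged.items()))


def demo():
    """Build a throwaway directory where master is both packed and loose."""
    with tempfile.TemporaryDirectory() as git_dir:
        os.makedirs(os.path.join(git_dir, "refs", "heads"))
        with open(os.path.join(git_dir, "packed-refs"), "w") as fh:
            fh.write("# pack-refs with: peeled fully-peeled\n")
            fh.write("a" * 40 + " refs/heads/master\n")
            fh.write("b" * 40 + " refs/tags/v1.0\n")
            fh.write("^" + "c" * 40 + "\n")
        with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as fh:
            fh.write("d" * 40 + "\n")
        return merged_refs(git_dir)
```

Calling demo() yields one entry for refs/heads/master carrying the loose
value, plus the packed-only tag. The sketch drops the peeled value during
the merge, which is a simplification: a real reader would have to carry
it along for packed-only tags.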

Of all of these, I think (2) is the closest to sane, because it lets
upload-pack do the heavy lifting, meaning we can understand whatever
formats we rsync from the other side. But given that rsync is already
naive about what objects it pulls (i.e., it gets everything), I have to
really question whether there is any value in using git-over-rsync
versus just:

  rsync -a $src tmp/ ;# -a so that rsync recurses into the repository
  git clone tmp my-repo ;# will hard-link, no extra space needed!
  rm -rf tmp

I guess that doesn't handle subsequent fetches. But
really...git-over-rsync is just an awful protocol. Nobody should be
using it. Having looked at it in more detail, I'm more in favor than
ever of removing it.

-Peff
