git.vger.kernel.org archive mirror
From: Josh Triplett <josh@joshtriplett.org>
To: Jeff King <peff@peff.net>
Cc: Jamey Sharp <jamey@minilop.net>,
	git@vger.kernel.org, "Shawn O. Pearce" <spearce@spearce.org>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Johannes Sixt <johannes.sixt@telecom.at>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v3 3/3] Add documentation for virtual repositories
Date: Wed, 25 May 2011 11:03:42 -0700
Message-ID: <20110525180341.GA2324@leaf>
In-Reply-To: <20110525160708.GE8795@sigill.intra.peff.net>

On Wed, May 25, 2011 at 12:07:08PM -0400, Jeff King wrote:
> On Tue, May 24, 2011 at 05:46:32PM -0700, Jamey Sharp wrote:
> 
> >  Documentation/Makefile                 |    2 +-
> >  Documentation/git-http-backend.txt     |    4 +-
> >  Documentation/gitvirtual.txt           |   76 ++++++++++++++++++++++++++++++++
> >  contrib/completion/git-completion.bash |    2 +-
> 
> Maybe it would make sense to mention your new options to upload-pack and
> receive-pack in their manpages; the description can be short, but refer
> the user to gitvirtual.

Fair enough.  We'll go ahead and document them (in patch 1/3 with a
reference to gitvirtual added in patch 3/3), and avoid making the pile
of undocumented upload-pack and receive-pack options larger. :)

> > +Given many repositories with copies of the same objects (such as
> > +branches of the same source), sharing a common object store will avoid
> > +duplication.  Alternates provide a single baseline, but don't handle
> > +ongoing activity in the various repositories.  Furthermore, operations
> > +such as linkgit:git-gc[1] need to know about all of the refs.
> 
> It's not quite true that alternates provide only a single baseline. They
> can be updated and objects consolidated over time (e.g., with a nightly
> repack). The problem is that they require management to do so (this is
> also a benefit, if you want a sharing policy besides "all repos have all
> objects").

True enough.  We wanted something that automatically worked without
background maintenance, but alternates can help if you keep moving
common objects to the alternate repository.

> > +linkgit:git-upload-pack[1] and linkgit:git-receive-pack[1] rewrite the
> > +names of refs and heads as specified by the --ref-prefix and --head
> > +options.  For instance, --ref-prefix=`virtual/reponame/` will use
> > ++pass:[refs/virtual/reponame/heads/*]+ and
> > ++pass:[refs/virtual/reponame/tags/*]+.  git-upload-pack and
> > +git-receive-pack will ignore any references that do not match the
> > +specified prefix.
> 
> Thinking on the whole idea a bit more, is there a reason to restrict
> this to upload-pack and receive-pack? Sure, they are the most obvious
> places to use it for hosting, but might I not want to be able to do:
> 
>   cd /path/to/mega-repository.git
>   git --ref-prefix=virtual/repo1 log master
> 
> to do server-side scripting inside the virtual repos (or more likely,
> setting GIT_REF_PREFIX at the top of your script).

Many git commands will need special handling for this, though.  For
instance, gc needs to know about all refs, not just a prefix of refs;
otherwise it will break the repository.  Or, for an example within a
single command, the checks for updating a currently-checked-out ref in a
repository need to use the repository's HEAD, not the virtual HEAD.
And similarly, git checkout with a ref-prefix set would construct a
repository where HEAD doesn't match the workdir.

Having this handled "transparently" for all git commands seems likely to
run into this kind of corner case, where parts of a git command run
correctly with ref-prefix but other parts or other invoked git commands
must not run with ref-prefix.

I do agree that some other git programs could learn to use ref-prefix,
and it makes sense to move the functionality into refs.c as a general
mechanism for those programs to use.  However, I don't think it makes
sense to transparently make all git programs use ref-prefix without
checking them individually to see if it makes sense.
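
To make the gc concern concrete, here is a rough sketch of the view a
prefix-restricted command would have.  The function name
refs_under_prefix and the ref names are made up for illustration; this
is not the code from the patch series, only an approximation of "ignore
any references that do not match the specified prefix":

```shell
# Hypothetical filter, not the patch's implementation: read full ref names
# on stdin and print only those that live under the given prefix.
refs_under_prefix() {
    prefix=${1%/}/            # normalize to exactly one trailing slash
    while IFS= read -r ref; do
        case $ref in
            "$prefix"*) printf '%s\n' "$ref" ;;   # inside the virtual repo
        esac                                      # anything else is ignored
    done
}

# Three refs in the underlying repository; only the first is visible
# through the repo1 prefix.
printf '%s\n' \
    refs/virtual/repo1/heads/master \
    refs/virtual/repo2/heads/master \
    refs/heads/master |
refs_under_prefix refs/virtual/repo1
```

A gc that only saw the filtered list would treat objects reachable
solely from the other two refs as unreferenced, which is why it has to
run against the full set of refs.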

> > +The --ref-prefix and --head options provide quite a bit of flexibility
> > +in organizing the refs of virtual repositories within those of the
> > +underlying repository.  In the absence of a strong reason to do
> > +otherwise, consider following these conventions:
> > +
> > +--ref-prefix=`virtual/reponame/`::
> > +	This puts refs under `refs/virtual/reponame/`, which avoids a
> > +	namespace conflict between `reponame` and built-in ref
> > +	directories such as `heads` and `tags`.
> > +
> > +--head=`virtual-HEAD/reponame`::
> > +	This puts HEADs under `virtual-HEAD/` to avoid namespace
> > +	conflicts with top-level filenames in a git repository.
> 
> I'm curious if you have a use for this much flexibility. In particular,
> why do the HEAD and refs prefixes need the ability to be separate? Also,
> what about other non-HEAD top-level refs? IOW, a true "virtual
> repository" to me would just be:
> 
>   GIT_REF_PREFIX=refs/virtual/repo1
> 
> and then _every_ ref resolution would just prefix that, whether it was
> in refs/ or not. So you would have:
> 
>   .git/refs/virtual/repo1/HEAD
>   .git/refs/virtual/repo1/refs/heads/master
>   .git/refs/virtual/repo1/refs/tags/v1.0

Ah, *now* I see what you meant by including the repeated "refs/", and
using that to allow putting HEAD in the same namespace makes sense.

We don't actually need the flexibility of putting HEAD in a different
place, and this layout makes sense, so we can change the ref-prefix
mechanism to drop the separate --head entirely.
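
As a minimal sketch of that single-prefix layout (the helper name
virtual_ref is hypothetical and nothing like it exists in the patches;
it only assumes that one prefix is prepended to every ref name, HEAD
included):

```shell
# Hypothetical helper, not part of the patch series: map a ref name as seen
# inside a virtual repository to the name stored in the underlying
# repository, assuming a single prefix applies to every ref.
virtual_ref() {
    prefix=${1%/}             # tolerate a trailing slash on the prefix
    name=$2
    printf '%s/%s\n' "$prefix" "$name"
}

virtual_ref refs/virtual/repo1 HEAD
virtual_ref refs/virtual/repo1 refs/heads/master
virtual_ref refs/virtual/repo1/ refs/tags/v1.0
```

With the prefix from the quoted example, these three calls produce the
same three paths listed above (relative to .git/), and no separate
--head setting is involved.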

> > +SECURITY
> > +--------
> > +
> > +Anyone with access to any virtual repository can potentially access
> > +objects from any other virtual repository stored in the same underlying
> > +repository.  You can't directly say "give me object ABCD" if you don't
> > +have a ref to it, but you can do some other sneaky things like:
> > +
> > +. Claiming to push ABCD, at which point the server will optimize out the
> > +  need for you to actually send it. Now you have a ref to ABCD and can
> > +  fetch it (claiming not to have it, of course).
> > +
> > +. Requesting other refs, claiming that you have ABCD, at which point the
> > +  server may generate deltas against ABCD.
> > +
> > +None of this causes a problem if you only host public repositories, or
> > +if everyone who may read one virtual repo may also read everything in
> > +every other virtual repo (for instance, if everyone in an organization
> > +has read permission to every repository).
> 
> Well, this text is obviously correct and written by a very smart person.
> ;)
> 
> You might want to mention that if you do need to handle these security
> concerns, then the alternates route, even though it creates more
> management headache, is going to be more flexible with respect to which
> objects are shared.
> 
> In fact, given what I said at the very top of the email, I wonder if the
> documentation would be better structured as "here are two methods for
> sharing objects, here are reasons why you might choose one or the other,
> and here is how to use each".

I think it makes sense to reference alternates in the gitvirtual page, but
I don't think it makes sense to put the full documentation for both in
the same page.

- Josh Triplett
