git.vger.kernel.org archive mirror
From: David Turner <dturner@twopensource.com>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: Junio C Hamano <gitster@pobox.com>,
	Git Mailing List <git@vger.kernel.org>,
	Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [PATCH/RFC 0/2] bisect per-worktree
Date: Mon, 03 Aug 2015 15:49:56 -0400
Message-ID: <1438631396.7348.33.camel@twopensource.com>
In-Reply-To: <55BC6C5C.1070707@alum.mit.edu>

On Sat, 2015-08-01 at 08:51 +0200, Michael Haggerty wrote:
> On 08/01/2015 07:12 AM, Junio C Hamano wrote:
> > On Fri, Jul 31, 2015 at 8:59 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> >>
> >> It seems to me that adding a new top-level "worktree-refs" directory is
> >> pretty traumatic. Lots of people and tools will have made the assumption
> >> that all "normal" references live under "refs/".
> >> ...
> >> It's all a bit frightening, frankly.
> > 
> > I actually feel the prospect of pluggable ref backend more frightening,
> > frankly ;-). These bisect refs are just like FETCH_HEAD and MERGE_HEAD,
> > not about the primary purpose of the "repository" to grow the history of refs
> > (branches), but about ephemeral pointers into the history used to help keep
> > track of what is being done in the worktree upstairs. There is no need for
> > these to be visible across worktrees. If we use the real refs that are global
> > in the repository (as opposed to per-worktree ones), we would hit the backend
> > database with transactions to update these ephemeral things, which somehow
> > makes me feel stupid.
> 
> Hmm, ok, so you are thinking of a remote database with high latency. I
> was thinking more of something like LMDB, with latency comparable to
> filesystem storage.
> 
> These worktree-specific references might be ephemeral, but they also
> imply reachability, which means that they need to be visible at least
> during object pruning. Moreover, if the references don't live in the
> same database with the rest of the references, then we have to deal with
> races due to updating references in different places without atomicity.
> 
> The refs+object store is the most important thing for maintaining the
> integrity of a repo and avoiding races. To me it seems easier to do so
> if there is a single refs+objects store than if we have some references
> over here on the file system, some over there in a LMDB, etc. So my gut
> feeling is for the primary reference storage to be in a single reference
> namespace that (at least in principle) can be stored in a single ACID
> database.
>
> For each worktree, we could then create a different view of the
> references by splicing parts of the full reference namespace together.
> This could even be based on config settings so that we don't have to
> hardcode information like "refs/bisect/* is worktree-specific" deep in
> the references module. Suppose we could write
> 
> [worktree.refs]
> 	map = refs/worktrees/*:
> 	map = refs/bisect/*:refs/worktrees/[worktree]/refs/bisect/*
> 
> which would mean (a) hide the references under "refs/worktrees", and (b)
> make it look as if the references under
> refs/worktrees/[worktree]/refs/bisect actually appear under refs/bisect
> (where "[worktree]" is replaced with the current worktree's name). By
> making these settings configurable, we allow other projects to define
> their own worktree-specific reference namespaces too.
> 
> The corresponding main repo might hide "refs/worktrees/*" but leave its
> refs/bisect namespace exposed in the usual place.
> 
> "git prune" would see the whole namespace as it really is so that it can
> compute reachability correctly.

I think making this configurable is (a) overkill and (b) dangerous.
It's dangerous because the choice of which refs are per-worktree is
important to the correct operation of git, and allowing users to mess
with it seems like a big mistake.  Instead, we should figure out a
simple scheme and define it globally.

I think refs/worktree -> refs/worktrees/[worktree]/ would do fine as a
fixed scheme, if we go that route.
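
A minimal sketch of what such a fixed mapping could look like, assuming
the logical prefix is "refs/worktree/" and the storage prefix is
"refs/worktrees/[worktree]/". The function name map_ref and the
worktree name "wt1" are hypothetical, not part of any existing git API:

```python
LOGICAL_PREFIX = "refs/worktree/"    # name as seen inside a worktree
STORAGE_PREFIX = "refs/worktrees/"   # assumed shared storage area


def map_ref(refname: str, worktree: str) -> str:
    """Map a logical ref name to its storage name for one worktree.

    Hypothetical helper: refs under refs/worktree/ are rewritten to
    refs/worktrees/<worktree>/; every other ref is returned unchanged.
    """
    if refname.startswith(LOGICAL_PREFIX):
        rest = refname[len(LOGICAL_PREFIX):]
        return f"{STORAGE_PREFIX}{worktree}/{rest}"
    return refname


print(map_ref("refs/worktree/bisect/bad", "wt1"))
print(map_ref("refs/heads/master", "wt1"))
```

Because the rule is a single prefix rewrite, there is nothing for a
user to configure and nothing for a backend to interpret.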

We would need two separate views of the refs hierarchy, though: one used
by prune (and pack-refs) that is non-mapped (that is, includes
per-worktree refs for each worktree), and one for general use that is
mapped.  Maybe this is just a flag to the ref traversal functions.
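
As an illustration only, the two views could be one traversal function
with a flag. The sketch below models the ref store as a plain dict and
assumes the storage layout "refs/worktrees/[worktree]/"; for_each_ref
here is a hypothetical Python stand-in, not git's C function of the
same name:

```python
from typing import Dict, Iterator, Tuple

STORAGE_PREFIX = "refs/worktrees/"   # assumed shared storage area
LOGICAL_PREFIX = "refs/worktree/"    # name as seen inside a worktree


def for_each_ref(store: Dict[str, str], worktree: str,
                 raw: bool = False) -> Iterator[Tuple[str, str]]:
    """Yield (refname, oid) pairs.

    raw=True:  the unmapped view, with every worktree's refs visible
               (what prune and pack-refs would need).
    raw=False: the mapped view, where only this worktree's refs are
               shown, renamed under refs/worktree/.
    """
    own = f"{STORAGE_PREFIX}{worktree}/"
    for name, oid in sorted(store.items()):
        if raw:
            yield name, oid
        elif name.startswith(own):
            yield LOGICAL_PREFIX + name[len(own):], oid
        elif not name.startswith(STORAGE_PREFIX):
            yield name, oid          # common ref, shown as-is


store = {
    "refs/heads/master": "a1",
    "refs/worktrees/wt1/bisect/bad": "b2",
    "refs/worktrees/wt2/bisect/bad": "c3",
}
for name, oid in for_each_ref(store, "wt1"):
    print(name, oid)
```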

But I'm not sure that this is really the right way to go.

As I understand it, we don't presently do many transactions that mix
pseudorefs or per-worktree refs with other refs.  And we definitely
don't want to move pseudorefs into the database since there's so much
code that assumes they're files.  Also, the vast majority of refs are
common, rather than per-worktree.  In fact, the only per-worktree refs
I've seen mentioned so far are the bisect refs and NOTES_MERGE_REF and
HEAD.  Of these, only HEAD is needed for pruning. Are there more that I
haven't thought of?

So I'm not sure the gain from moving per-worktree refs into the database
is that great.

There are some downsides of moving per-worktree refs into the database:

1. More operations in one worktree can now contend with operations in
another worktree for the database.  LMDB only allows a single write
transaction at a time.  

2. The refs API would be more complicated: it would need to deal with
remapped vs raw ref paths.  Refs backends would need to have functions
to prune per-worktree data when a worktree is destroyed. 

3. We would still need to deal with pseudorefs, so there's still some
missing transactional safety, and still the complication of dealing with
files on the filesystem.

Simply treating refs/worktree as per-worktree, while the rest of refs/
is not, would be a few dozen lines of code.  The full remapping approach
is likely to be a lot more. I've already got the LMDB backend working
with something like this approach.  If we decide on a complicated
approach, I am likely to run out of time to work on pluggable backends.
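
For comparison, the simple approach amounts to a name-based check that
decides where a ref is stored. This is a rough sketch, not the patch
itself: the helper names and directory paths are hypothetical, and
treating every name outside refs/ (HEAD and pseudorefs such as
MERGE_HEAD) as per-worktree is an assumption made for the example:

```python
import posixpath


def is_per_worktree_ref(refname: str) -> bool:
    """Return True if the ref belongs to a single worktree.

    Assumption for this sketch: names outside refs/ (HEAD, pseudorefs)
    and names under refs/worktree/ are per-worktree; the rest of refs/
    is shared.
    """
    return (not refname.startswith("refs/")
            or refname.startswith("refs/worktree/"))


def ref_path(refname: str, git_dir: str, common_dir: str) -> str:
    """Pick the directory that holds the ref, then append its name."""
    base = git_dir if is_per_worktree_ref(refname) else common_dir
    return posixpath.join(base, refname)


# Hypothetical layout: one linked worktree named "wt1".
git_dir = "/repo/.git/worktrees/wt1"
common_dir = "/repo/.git"
print(ref_path("refs/worktree/bisect/bad", git_dir, common_dir))
print(ref_path("refs/heads/master", git_dir, common_dir))
```

With this split, a database backend only ever sees the shared refs,
and per-worktree refs stay as files in the worktree's own directory.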
