From: David Turner <dturner@twopensource.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Michael Haggerty <mhagger@alum.mit.edu>,
	git@vger.kernel.org, chriscool@tuxfamily.org, pclouds@gmail.com
Subject: Re: [PATCH v3 2/4] path: optimize common dir checking
Date: Fri, 14 Aug 2015 16:04:04 -0400
Message-ID: <1439582644.8855.89.camel@twopensource.com>
In-Reply-To: <xmqqtws1iyxn.fsf@gitster.dls.corp.google.com>

On Fri, 2015-08-14 at 10:04 -0700, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
> > Let's take a step back.
> >
> > We have always had a ton of code that uses `git_path()` and friends to
> > convert abstract things into filesystem paths. Let's take the
> > reference-handling code as an example:
> > ...
> > This seems crazy to me. It is the *reference* code that should know
> > whether a particular reference should be stored under `$GIT_DIR` or
> > `$GIT_COMMON_DIR`, or indeed whether it should be stored in a database.
> 
> It is more like:
> 
>  1. The system as a whole should decide if HEAD and refs/heads/
>     should be per workspace or shared across a repository (and we say
>     the former should be per workspace, the latter should be shared).
> 
>  2. The reference code should decide which ref-backend is used to
>     store refs.
> 
>  3. And any ref-backend should follow the decision made by the
>     system as a whole in 1.
> 
> I'd imagine that David's ref-backend code inherited from Ronnie
> would still accept the string "refs/heads/master" from the rest of
> the system (i.e. callers that call into the ref API) to mean "the
> ref that represents the 'master' branch", and uses that as the key
> to decide "ok, that is shared across workspaces" to honor the
> system-wide decision made in 1.  The outside callers wouldn't pass
> the result of calling git_path("refs/heads/master") into the ref
> API, which may expand to "$somewhere_else/refs/heads/master" when
> run in a secondary workspace to point at the common location.
> 
> I'd also imagine that the workspace API would give ways for the
> implementation of the reference API to ask these questions:
> 
>  - which workspace am I operating for?  where is the "common" thing?
>    how would I identify this workspace among the ones that share the
>    same "common" thing?
> 
>  - is this ref (or ref-like thing) supposed to be in common or per
>    workspace?
> 
> I agree with you that there needs to be an intermediate level, between
> what Duy and David's git_path() does (i.e. which gives the final
> result of deciding where in the filesystem the thing should go) and
> your two stupid functions would do (i.e. knowing which kind the
> thing is, give you the final location in the filesystem).  That is,
> to let the caller know if the thing is supposed to be shared or per
> workspace in the first place.
> 
> With that intermediate level function, a database-based ref-backend
> could make ('common', refs/heads/master) as the key for the current
> value of the master branch and (workspace-name, HEAD) as the key for
> the current value of the HEAD for a given workspace.
> 
> > We should have two *stupid* functions, `git_workspace_path()` and
> > `git_common_path()`, and have the *callers* decide which one to call.
> 
> So I think we should have *three* functions:
> 
>  - git_workspace_name(void) returns some name that uniquely
>    identifies the current workspace among the workspaces linked to
>    the same repository.

Random side note: the present workspace path name component is not
acceptable for this if alternate ref backends use a single db for
storage across all workspaces.  That's because you might create a
workspace at foo, then manually rm -r it, and then create a new one also
named foo.  The database wouldn't know about this series of events, and
would then have stale per-workspace refs for foo.
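
To make the failure mode concrete, here is a minimal sketch, assuming a
toy in-memory table keyed by (workspace name, refname) in the spirit of
the keys Junio describes.  Everything in it (struct ref_entry, db_set,
db_get, the "common" key, the shortened values) is hypothetical and for
illustration only; it is not the lmdb backend's schema.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 16

/* Hypothetical row: one table shared by every workspace. */
struct ref_entry {
	char workspace[64];	/* "common" or a workspace name */
	char refname[128];
	char value[41];
};

static struct ref_entry db[MAX_ENTRIES];
static int db_nr;

static struct ref_entry *db_find(const char *workspace, const char *refname)
{
	int i;

	for (i = 0; i < db_nr; i++)
		if (!strcmp(db[i].workspace, workspace) &&
		    !strcmp(db[i].refname, refname))
			return &db[i];
	return NULL;
}

/* Insert or overwrite; returns -1 when the toy table is full. */
static int db_set(const char *workspace, const char *refname,
		  const char *value)
{
	struct ref_entry *e = db_find(workspace, refname);

	if (!e) {
		if (db_nr == MAX_ENTRIES)
			return -1;
		e = &db[db_nr++];
		snprintf(e->workspace, sizeof(e->workspace), "%s", workspace);
		snprintf(e->refname, sizeof(e->refname), "%s", refname);
	}
	snprintf(e->value, sizeof(e->value), "%s", value);
	return 0;
}

static const char *db_get(const char *workspace, const char *refname)
{
	struct ref_entry *e = db_find(workspace, refname);

	return e ? e->value : NULL;
}

/* Returns 1 if a re-created workspace "foo" inherits the old HEAD. */
static int new_workspace_sees_stale_ref(void)
{
	db_nr = 0;
	db_set("common", "refs/heads/master", "aaaa");
	db_set("foo", "HEAD", "bbbb");	/* first workspace named foo */

	/* "rm -r foo" happens outside git: nothing updates the table. */
	/* A new workspace, also named foo, has written nothing yet. */
	return db_get("foo", "HEAD") != NULL;
}
```

The lookup for the new foo succeeds only because the name is the whole
key; a key that changes when a workspace is re-created would not have
this problem.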

That said, with my lmdb backend, I've been falling back to the files
backend for per-workspace refs.  This also means I don't have to worry
about expiring per-workspace refs when a workspace is removed. 

I could change this, but IIRC, there are a fair number of things that
care about the existence of a file called HEAD, so the fallback was
easier.  (That is, the other way was a giant hassle).

>  - is_common_thing(const char *) takes a path (that is relative to
>    $GIT_DIR where the thing would have lived at in a pre-workspace
>    world), and tells if it is a common thing or a per-workspace
>    thing.
> 
>  - git_path() can stay the external interface and can be thought of:
> 
> 	git_path(const char *path)
> 	{
> 		if (is_common_thing(path))
> 			return git_common_path(path);
> 		return git_workspace_path(git_workspace_name(), path);
> 	}
> 
>    if you think in terms of your two helpers.
> 
> But I do not think that git_common_path() and git_workspace_path()
> need to be called from any other place in the system, and that is
> the reason why I did not say we should have four functions (or five,
> counting git_path() itself).

I wrote an email arguing for Michael's position on this, and by the time
I was done writing it, I had come around to more-or-less Junio's
position.

My argument for keeping git_path as the external interface is this: in
the multiple worktree world, $GIT_DIR is effectively an overlay
filesystem.  It's not a complete overlay API: e.g. we don't have a
git_path_opendir function that special-cases refs/ to add in
refs/worktree.  But it handles the common cases.
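
As a rough illustration of the overlay idea, the sketch below resolves a
$GIT_DIR-relative path against either a common directory or a
per-worktree directory.  The directory names, the two prefix lists, and
sketch_git_path() are assumptions made up for this example; they are not
the actual common_list or adjust_git_path code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical locations; real code derives these at runtime. */
static const char *common_dir = "/repo/.git";
static const char *worktree_dir = "/repo/.git/worktrees/foo";

/* Illustrative subsets only, not git's real lists. */
static const char *worktree_prefixes[] = { "refs/worktree", NULL };
static const char *common_prefixes[] = { "refs", "objects", NULL };

/* True if path equals prefix or lives underneath it. */
static int has_dir_prefix(const char *path, const char *prefix)
{
	size_t len = strlen(prefix);

	return !strncmp(path, prefix, len) &&
	       (path[len] == '\0' || path[len] == '/');
}

/* Per-worktree exceptions win over the broader common prefixes. */
static int is_common_thing(const char *path)
{
	int i;

	for (i = 0; worktree_prefixes[i]; i++)
		if (has_dir_prefix(path, worktree_prefixes[i]))
			return 0;
	for (i = 0; common_prefixes[i]; i++)
		if (has_dir_prefix(path, common_prefixes[i]))
			return 1;
	return 0;	/* anything else, e.g. HEAD, is per-worktree */
}

/* Returns a pointer to a static buffer, overwritten on each call. */
static const char *sketch_git_path(const char *path)
{
	static char buf[4096];

	snprintf(buf, sizeof(buf), "%s/%s",
		 is_common_thing(path) ? common_dir : worktree_dir, path);
	return buf;
}
```

Callers keep passing "refs/heads/master" or "HEAD" and never see which
layer answered, which is the sense in which $GIT_DIR behaves like an
overlay.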

It is true that even if we had a complete overlay API, it would not be
sufficient to hide all of the complexity of per-worktree refs (the files
ref backend still needs to know not to pack per-worktree refs).  But
that is equally true if we have the refs code call
git_common_path/git_worktree_path.  So we're not successfully hiding
details by using git_common_path/git_worktree_path in refs.c.

For this patch series, I don't think we need to change anything (except
that I realized that I forgot to add logs/refs/worktree to refs, and
people will probably find some issues once they start reviewing the
details of my code). In the present code, adjust_git_path already
handles the git_common_path vs git_workspace_path thing; it just does it
in a slightly less elegant way than Junio's proposal.  Implementing
Junio's proposal would not affect this series; it would just be an
additional patch on top (or beforehand).

There is one case where the refs code will probably need to directly
call git_workspace_path: when we fix git prune to handle detached HEADs
in workspaces.  It could just set the workspace and then call git_path,
but that is less elegant.  So I think when we fix that (which should
probably wait on for_each_worktree), we can implement Junio's proposal.
