From: Michael Haggerty <mhagger@alum.mit.edu>
To: David Turner <dturner@twopensource.com>,
git@vger.kernel.org, chriscool@tuxfamily.org, pclouds@gmail.com
Subject: Re: [PATCH v3 2/4] path: optimize common dir checking
Date: Thu, 13 Aug 2015 11:05:49 +0200 [thread overview]
Message-ID: <55CC5DED.5050304@alum.mit.edu> (raw)
In-Reply-To: <1439416645-19173-2-git-send-email-dturner@twopensource.com>
On 08/12/2015 11:57 PM, David Turner wrote:
> Instead of a linear search over common_list to check whether
> a path is common, use a trie. The trie search operates on
> path prefixes, and handles excludes.
>
> Signed-off-by: David Turner <dturner@twopensource.com>
> ---
>
> Probably overkill, but maybe we could later use it for making exclude
> or sparse-checkout matching faster (or maybe we have to go all the way
> to McNaughton-Yamada for that to be truly worthwhile).
Let's take a step back.
We have always had a ton of code that uses `git_path()` and friends to
convert abstract things into filesystem paths. Let's take the
reference-handling code as an example:
`git_path("refs/heads/master")` returns something like
".git/refs/heads/master", which happens to be the place where we would
store a loose reference with that name. But in reality,
"refs/heads/master" is a reference name, not a fragment of a path. It's
just that the reference code knows that the transformation done by
`git_path()` *accidentally* happens to convert a reference name into the
name of the path of the corresponding loose reference file.
In fact, the reference code is even smarter than that. It knows that
within submodules, `git_path()` does *not* do the right mapping. In
those cases it calls `git_path_submodule()` instead.
But now we have workspaces, and things have become more complicated.
Some references are stored in `$GIT_DIR`, while others are stored in
`$GIT_COMMON_DIR`. Who should know all of these details?
The current answer is that the reference-handling code remains (mostly)
ignorant of workspaces. It just stupidly calls `git_path()` (or
`git_path_submodule()`) regardless of the reference name. It is
`git_path()` that has grown the global insight to know which files are
now stored in `$GIT_COMMON_DIR` vs `$GIT_DIR`. Now it helpfully
transforms "refs/heads/master" into "$GIT_COMMON_DIR/refs/heads/master"
but transforms "refs/worktree/foo" into "$GIT_DIR/refs/worktree/foo". It
has developed similar insight into lots of other file types. IT KNOWS
TOO MUCH. And because of that, it become a lot more complicated and
might even be a performance problem.
This seems crazy to me. It is the *reference* code that should know
whether a particular reference should be stored under `$GIT_DIR` or
`$GIT_COMMON_DIR`, or indeed whether it should be stored in a database.
We should have two *stupid* functions, `git_workspace_path()` and
`git_common_path()`, and have the *callers* decide which one to call.
The only reason to retain a knows-everything `git_path()` function is as
a crutch for 3rd-party applications that think they are clever enough to
grub around in `$GIT_DIR` at the filesystem level. But that should be
highly discouraged, and we should make it our mission to provide
commands that make it unnecessary.
Michael
--
Michael Haggerty
mhagger@alum.mit.edu
next prev parent reply other threads:[~2015-08-13 9:06 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-12 21:57 [PATCH v3 1/4] refs: clean up common_list David Turner
2015-08-12 21:57 ` [PATCH v3 2/4] path: optimize common dir checking David Turner
2015-08-12 22:48 ` Junio C Hamano
2015-08-13 9:05 ` Michael Haggerty [this message]
2015-08-14 17:04 ` Junio C Hamano
2015-08-14 20:04 ` David Turner
2015-08-14 20:27 ` Junio C Hamano
2015-08-14 20:54 ` David Turner
2015-08-15 18:20 ` Michael Haggerty
2015-08-15 18:12 ` Michael Haggerty
2015-08-17 15:55 ` Junio C Hamano
2015-08-15 7:59 ` Duy Nguyen
2015-08-16 5:04 ` David Turner
2015-08-16 12:20 ` Duy Nguyen
2015-08-12 21:57 ` [PATCH v3 3/4] refs: make refs/worktree/* per-worktree David Turner
2015-08-13 17:15 ` Eric Sunshine
2015-08-13 17:41 ` David Turner
2015-08-13 20:16 ` Michael Haggerty
2015-08-13 20:32 ` David Turner
2015-08-14 8:18 ` Michael Haggerty
2015-08-14 17:10 ` Junio C Hamano
2015-08-15 8:04 ` Duy Nguyen
2015-08-12 21:57 ` [PATCH v3 4/4] bisect: make bisection refs per-worktree David Turner
2015-08-15 7:44 ` [PATCH v3 1/4] refs: clean up common_list Duy Nguyen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55CC5DED.5050304@alum.mit.edu \
--to=mhagger@alum.mit.edu \
--cc=chriscool@tuxfamily.org \
--cc=dturner@twopensource.com \
--cc=git@vger.kernel.org \
--cc=pclouds@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).