From: Jeff King <peff@peff.net>
To: David Turner <dturner@twopensource.com>
Cc: git@vger.kernel.org, mhagger@alum.mit.edu, pclouds@gmail.com
Subject: Re: [PATCH v7 01/33] setup: call setup_git_directory_gently before accessing refs
Date: Tue, 1 Mar 2016 03:35:35 -0500
Message-ID: <20160301083535.GA4952@sigill.intra.peff.net>
In-Reply-To: <1456793586-22082-2-git-send-email-dturner@twopensource.com>
On Mon, Feb 29, 2016 at 07:52:34PM -0500, David Turner wrote:
> Usually, git calls some form of setup_git_directory at startup. But
> sometimes, it doesn't. Usually, that's OK because it's not really
> using the repository. But in some cases, it is using the repo. In
> those cases, either setup_git_directory_gently must be called, or the
> repository (e.g. the refs) must not be accessed.
It's actually not just setup_git_directory(). We can also use
check_repository_format(), which is used by enter_repo() (and hence by
things like upload-pack). I think the rule really ought to be: if we
didn't have check_repository_format_gently() tell us we have a valid
repo, we should not access any repo elements (refs, objects, etc).
I started earlier today on a patch series to identify and fix these
cases independent of your series. The basic strategy was to adapt the
existing "struct startup_info" to be available everywhere, and have
relevant bits of code assert() on it, or even behave differently (e.g.,
if some library code should do different things in a repo versus not).
But I think we can probably just scrap the assert() part of that. The
assertions I put in were unsurprisingly at the entry points to the ref
code. And your series supersedes that; we can't do anything with the
refs until the ref backend is set up, and if we only do so in
check_repository_format_gently(), then it amounts to the same thing.
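To make the startup_info idea concrete, here is a minimal standalone sketch. It is not code from either series: every name in it is hypothetical, and it returns an error where the real thing would assert(), just so the "no repo" path can be exercised without aborting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for git's "struct startup_info". */
struct startup_info_sketch {
	int have_repository; /* set only once the repo format check passed */
};

/* Globally visible, so library code can consult it from anywhere. */
static struct startup_info_sketch the_startup_info_sketch;

/* Hypothetical: models a successful check_repository_format_gently(). */
static void sketch_mark_repository_valid(void)
{
	the_startup_info_sketch.have_repository = 1;
}

/*
 * Hypothetical entry point into the ref code. Returns 0 on success and
 * -1 if no repository has been validated yet. The "resolution" itself
 * is a placeholder; only the gating is the point here.
 */
static int sketch_resolve_ref(const char *refname, char *out, size_t outlen)
{
	assert(refname != NULL && out != NULL && outlen > 0);

	if (!the_startup_info_sketch.have_repository)
		return -1; /* an assert() would fire here instead */

	snprintf(out, outlen, "resolved:%s", refname);
	return 0;
}
```

A caller that never ran the setup code gets -1 back from sketch_resolve_ref(); after sketch_mark_repository_valid() the same call succeeds.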
For the "behave differently" part, I needed it for the .mailmap case,
but you fixed it below without having to add that.
I think it's worth going through the changes here and comparing notes
with what my series would have done.
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 9e3f1cf..1e36b52 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -531,6 +531,7 @@ static int grep_directory(struct grep_opt *opt, const struct pathspec *pathspec,
> if (exc_std)
> setup_standard_excludes(&dir);
>
> + dir.flags |= DIR_NO_GITLINKS;
> fill_directory(&dir, pathspec);
> for (i = 0; i < dir.nr; i++) {
> if (!dir_path_match(dir.entries[i], pathspec, 0, NULL))
This one is interesting, because the ref access in fill_directory() is
only for hitting submodule refs. In theory, I guess a command operating
in a non-repo could want to know about and do something with embedded
git repos.
And indeed, it does produce a behavior change here. With a repo like:
mkdir non-repo && cd non-repo &&
git init sub &&
(cd sub && echo foo >file && git add . && git commit -m foo)
running:
git grep --no-index foo
does not currently find sub/file (because it does not descend into what
it thinks is a sub-repository), but it _does_ with your patch. I'm
inclined to say that's actually a behavior improvement. "grep
--no-index" on a directory is about behaving as a recursive grep, and
should probably descend into sub-repos (it probably should also avoid
looking inside .git directories, though, and I think it still does, even
with your patch).
The fill_directory() also touches the_index, which it should not in a
non-repository. But I think that's probably OK, because we simply don't
read the index in the first place (so it behaves naturally as if the
index is empty).
> diff --git a/builtin/log.c b/builtin/log.c
> index 0d738d6..1d0e43e 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -975,7 +975,7 @@ static void make_cover_letter(struct rev_info *rev, int use_stdout,
>
> strbuf_release(&sb);
>
> - shortlog_init(&log);
> + shortlog_init(&log, 0);
> log.wrap_lines = 1;
> log.wrap = 72;
> log.in1 = 2;
This looks right. If we are making a cover letter for format-patch, we
know we have a repo, and thus nongit is always 0. Though I admit the
double-negating confused me for a minute. I don't know if there's a way
around it, though, because "nongit" is what comes out of
setup_git_directory().
> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index bfc082e..ab4305b 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -219,11 +219,12 @@ static int parse_wrap_args(const struct option *opt, const char *arg, int unset)
> return 0;
> }
>
> -void shortlog_init(struct shortlog *log)
> +void shortlog_init(struct shortlog *log, int nongit)
> {
> memset(log, 0, sizeof(*log));
>
> - read_mailmap(&log->mailmap, &log->common_repo_prefix);
> + if (!nongit)
> + read_mailmap(&log->mailmap, &log->common_repo_prefix);
My fix for this was to teach read_mailmap to avoid looking for
HEAD:.mailmap if we are not in a repository, but to continue with the
others (.mailmap in the cwd, and the mailmap.file config variable).
Yours disables the .mailmap entirely. That makes some sense for looking
at ".mailmap" in the working tree; if we do not have a repository, we
should not look at a mailmap (though I guess you could argue the
opposite, that a .mailmap in the current directory of a non-repo is
worth looking at). But I'd think the mailmap.file config would apply
even to shortlog invoked outside a repository.
To be perfectly honest, I cannot imagine that shortlog is invoked with
data on stdin much at all these days, let alone outside of a repository.
But I do think your patch is a potential regression there, if anybody
does do that.
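Roughly, the difference between the two fixes is which mailmap sources survive when nongit is set. The sketch below models only that decision, under the assumptions stated in the comments; the enum and function names are made up for illustration, and it glosses over when git really consults the blob in HEAD.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags for the three mailmap sources discussed above. */
enum mailmap_source_sketch {
	SKETCH_MAILMAP_WORKTREE = 1 << 0, /* .mailmap in the cwd */
	SKETCH_MAILMAP_CONFIG   = 1 << 1, /* file named by mailmap.file */
	SKETCH_MAILMAP_HEAD     = 1 << 2  /* HEAD:.mailmap */
};

/*
 * Hypothetical: which sources to consult if only the HEAD lookup is
 * skipped outside a repository. "mailmap_file" is the value of the
 * mailmap.file config variable, or NULL if it is unset.
 */
static unsigned sketch_mailmap_sources(int nongit, const char *mailmap_file)
{
	unsigned sources = SKETCH_MAILMAP_WORKTREE;

	/* nongit is a boolean, as set by setup_git_directory_gently() */
	assert(nongit == 0 || nongit == 1);

	if (mailmap_file)
		sources |= SKETCH_MAILMAP_CONFIG;
	if (!nongit)
		sources |= SKETCH_MAILMAP_HEAD; /* needs object access */
	return sources;
}
```

With the "if (!nongit)" guard around the whole read_mailmap() call, the non-repo case would instead end up with no sources at all, which is the regression risk for mailmap.file.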
> diff --git a/git.c b/git.c
> index 6cc0c07..51e0508 100644
> --- a/git.c
> +++ b/git.c
> @@ -376,7 +376,7 @@ static struct cmd_struct commands[] = {
> { "am", cmd_am, RUN_SETUP | NEED_WORK_TREE },
> { "annotate", cmd_annotate, RUN_SETUP },
> { "apply", cmd_apply, RUN_SETUP_GENTLY },
> - { "archive", cmd_archive },
> + { "archive", cmd_archive, RUN_SETUP_GENTLY },
> { "bisect--helper", cmd_bisect__helper, RUN_SETUP },
> { "blame", cmd_blame, RUN_SETUP },
> { "branch", cmd_branch, RUN_SETUP },
I didn't have to touch this case in my experimenting. I wonder if it's
because I resolved the "grep" case a little differently.
I taught get_ref_cache() to only assert() that we have a repository when
we are looking at the main ref-cache, not a submodule. In theory, we can
look at a submodule from inside an outer non-repo (it's not really a
submodule then, but just a plain git dir). I don't think there's
anything in git right now that says you can't do so, though I think your
refs-backend work does introduce that restriction (because it actually
requires the submodules to use the same backend).
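The main-versus-submodule distinction I mean looks something like the sketch below. It is not the actual change: the names are hypothetical, and it assumes that a NULL or empty submodule path is what selects the main ref cache.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: NULL or "" means "the main repository's ref cache". */
static int sketch_is_main_ref_cache(const char *submodule)
{
	return !submodule || !*submodule;
}

/*
 * Hypothetical check: returns 1 if the lookup may proceed and 0 if it
 * would trip the "we have a repository" assertion. Only the main ref
 * cache requires a validated outer repository; a submodule path is
 * treated as a plain git dir and is always allowed.
 */
static int sketch_ref_cache_lookup_ok(const char *submodule,
				      int have_repository)
{
	assert(have_repository == 0 || have_repository == 1);

	if (sketch_is_main_ref_cache(submodule))
		return have_repository;
	return 1;
}
```

If submodules must share the outer repository's backend, the second branch would have to check have_repository as well.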
So with that requirement, I think we do need to require a repo even to
access submodule refs. Is that what triggered this change?
I'd think you would need a matching line inside cmd_archive, too. It
should allow "--remote" without a repo, but generating a local archive
does need one. And indeed, I see in write_archive() that we run
setup_git_directory() ourselves, and die if we're not in a git repo. So
I'm puzzled about which code path accesses the refs.
> diff --git a/test-match-trees.c b/test-match-trees.c
> index 109f03e..4dad709 100644
> --- a/test-match-trees.c
> +++ b/test-match-trees.c
> @@ -6,6 +6,8 @@ int main(int ac, char **av)
> unsigned char hash1[20], hash2[20], shifted[20];
> struct tree *one, *two;
>
> + setup_git_directory();
> +
> if (get_sha1(av[1], hash1))
> die("cannot parse %s as an object name", av[1]);
> if (get_sha1(av[2], hash2))
This one is weird. The test-match-trees program is only used one time in
our test suite, and then it is only as a hack because it is an external
that does not have startup_info set up. I think that test is somewhat
bogus (and is obsoleted by my approach), and we could probably get rid
of this program entirely.
But your patch here is certainly the right thing to do if we are keeping
it.
> diff --git a/test-revision-walking.c b/test-revision-walking.c
> index 285f06b..3d03133 100644
> --- a/test-revision-walking.c
> +++ b/test-revision-walking.c
> @@ -50,6 +50,8 @@ int main(int argc, char **argv)
> if (argc < 2)
> return 1;
>
> + setup_git_directory();
> +
> if (!strcmp(argv[1], "run-twice")) {
> printf("1st\n");
> if (!run_revision_walk())
This one I solved in the same way. Yay, we agreed on one! :)
-Peff