* Re: [PATCH v7 01/33] setup: call setup_git_directory_gently before accessing refs
@ 2016-03-01  9:53 Duy Nguyen
  2016-03-01  9:55 ` Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: Duy Nguyen @ 2016-03-01  9:53 UTC
  To: Jeff King; +Cc: David Turner, Git Mailing List, Michael Haggerty

On Tue, Mar 1, 2016 at 3:35 PM, Jeff King <peff@peff.net> wrote:
> On Mon, Feb 29, 2016 at 07:52:34PM -0500, David Turner wrote:
>
>> Usually, git calls some form of setup_git_directory at startup. But
>> sometimes, it doesn't. Usually, that's OK because it's not really
>> using the repository. But in some cases, it is using the repo. In
>> those cases, either setup_git_directory_gently must be called, or the
>> repository (e.g. the refs) must not be accessed.
>
> It's actually not just setup_git_directory(). We can also use
> check_repository_format(), which is used by enter_repo() (and hence by
> things like upload-pack). I think the rule really ought to be: if we
> didn't have check_repository_format_gently() tell us we have a valid
> repo, we should not access any repo elements (refs, objects, etc).

Agreed. There's also a lighter version of check_repo.. which is
is_git_directory(). Most of the time we just want to answer the
question "is it a valid repository? support or not does not matter".
We probably need more eyes on the submodule case when this function is
used. For example in 25/33 [1] we check if a repo is non-bare (a
variant of is_git_directory) then we peek at the config file inside.
Should check_repository_format() be done in this case?

You know what, forget my question. The answer is yes. After writing
all that, I remember that part of the config file may be moved away in
the next version of multiple worktrees [2]. We need proper repo
validation before reading anything inside.

[1] http://article.gmane.org/gmane.comp.version-control.git/287959
[2] http://article.gmane.org/gmane.comp.version-control.git/284803

> I started earlier today on a patch series to identify and fix these
> cases independent of your series.

Yes, this sounds like a separate problem, even though it's raised by
the lmdb topic.

> The basic strategy was to adapt the
> existing "struct startup_info" to be available everywhere, and have
> relevant bits of code assert() on it, or even behave differently (e.g.,
> if some library code should do different things in a repo versus not).

startup_info is NULL for external programs if I remember correctly, or
do you make it available to all of them too?
-- 
Duy

* Re: [PATCH v7 01/33] setup: call setup_git_directory_gently before accessing refs
  2016-03-01  9:53 [PATCH v7 01/33] setup: call setup_git_directory_gently before accessing refs Duy Nguyen
@ 2016-03-01  9:55 ` Jeff King
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2016-03-01  9:55 UTC
  To: Duy Nguyen; +Cc: David Turner, Git Mailing List, Michael Haggerty

On Tue, Mar 01, 2016 at 04:53:30PM +0700, Duy Nguyen wrote:

> > The basic strategy was to adapt the
> > existing "struct startup_info" to be available everywhere, and have
> > relevant bits of code assert() on it, or even behave differently (e.g.,
> > if some library code should do different things in a repo versus not).
>
> startup_info is NULL for external programs if I remember correctly, or
> do you make it available to all of them too?

Yes, that was what I meant by "available everywhere". Library code
cannot rely on it right now, as only builtins set it up (even though
external programs may call setup_git_directory()).

-Peff
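
The "available everywhere" idea from this exchange could look roughly
like the sketch below. It is a stand-alone model, not git's code: the
have_repository field and the helpers mark_repository_found(),
can_access_refs() and resolve_ref_checked() are assumptions made up for
illustration. The point is only that a statically allocated struct is
never NULL, so library code can assert() on it or branch on it even
when called from an external program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for git's "struct startup_info". */
struct startup_info {
	int have_repository;	/* set once repository validation succeeded */
};

/* Statically allocated, so the pointer is valid in every program. */
struct startup_info the_startup_info;
struct startup_info *startup_info = &the_startup_info;

/* Hypothetical hook: setup code would call this after validating the repo. */
void mark_repository_found(void)
{
	startup_info->have_repository = 1;
}

/* Library code that wants to behave differently in a repo versus not. */
int can_access_refs(void)
{
	return startup_info->have_repository;
}

/* Ref entry point that refuses to run without a validated repository. */
int resolve_ref_checked(const char *refname)
{
	assert(startup_info->have_repository);
	return refname != NULL;	/* placeholder for a real lookup */
}
```

A caller outside a repository would see can_access_refs() return 0 and
skip the ref lookup instead of tripping the assertion.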

* [PATCH v7 00/33] refs backend
@ 2016-03-01  0:52 David Turner
  2016-03-01  0:52 ` [PATCH v7 01/33] setup: call setup_git_directory_gently before accessing refs David Turner
  0 siblings, 1 reply; 7+ messages in thread
From: David Turner @ 2016-03-01  0:52 UTC
  To: git, peff, mhagger, pclouds; +Cc: David Turner

This one has suggestions from Peff, SZEDER Gábor, Duy Nguyen, and
fixups from Junio.

The major changes are:

* The new patch "call setup_git_directory_gently before accessing
  refs" -- this is necessary in order to move "setup: configure ref
  storage config on startup" from config to setup.

* "setup: configure ref storage config on startup" is now much
  shorter.

In addition, there are some minor fixups to remove variable shadowing
in the lmdb code and to improve the design of the
set_ref_storage_backend family of functions.

David Turner (30):
  setup: call setup_git_directory_gently before accessing refs
  refs: move head_ref{,_submodule} to the common code
  refs: move for_each_*ref* functions into common code
  files-backend: break out ref reading
  refs: move resolve_ref_unsafe into common code
  refs: add method for do_for_each_ref
  refs: add do_for_each_per_worktree_ref
  refs: add methods for reflog
  refs: add method for initial ref transaction commit
  refs: add method for delete_refs
  refs: add methods to init refs db
  refs: add method to rename refs
  refs: handle non-normal ref renames
  refs: make lock generic
  refs: move duplicate check to common code
  refs: allow log-only updates
  refs: don't dereference on rename
  refs: on symref reflog expire, lock symref not referrent
  refs: resolve symbolic refs first
  refs: always handle non-normal refs in files backend
  init: allow alternate ref strorage to be set for new repos
  refs: check submodules' ref storage config
  clone: allow ref storage backend to be set for clone
  svn: learn ref-storage argument
  refs: register ref storage backends
  setup: configure ref storage config on startup
  refs: break out resolve_ref_unsafe_submodule
  refs: add LMDB refs storage backend
  refs: tests for lmdb backend
  tests: add ref-storage argument

Ramsay Jones (1):
  refs: reduce the visibility of do_for_each_ref()

Ronnie Sahlberg (2):
  refs: add a backend method structure with transaction functions
  refs: add methods for misc ref operations

 .gitignore | 1 +
 Documentation/config.txt | 9 +
 Documentation/git-clone.txt | 6 +
 Documentation/git-init-db.txt | 2 +-
 Documentation/git-init.txt | 8 +-
 Documentation/technical/refs-lmdb-backend.txt | 61 +
 Documentation/technical/repository-version.txt | 7 +
 Makefile | 12 +
 builtin/clone.c | 5 +
 builtin/grep.c | 1 +
 builtin/init-db.c | 55 +-
 builtin/log.c | 2 +-
 builtin/shortlog.c | 7 +-
 builtin/submodule--helper.c | 2 +-
 cache.h | 2 +
 config.c | 1 +
 configure.ac | 33 +
 contrib/completion/git-completion.bash | 6 +-
 contrib/workdir/git-new-workdir | 3 +
 git-submodule.sh | 13 +
 git-svn.perl | 6 +-
 git.c | 2 +-
 path.c | 30 +-
 refs.c | 631 +++++++-
 refs.h | 16 +
 refs/files-backend.c | 686 ++++-----
 refs/lmdb-backend.c | 1886 ++++++++++++++++++++++++
 refs/refs-internal.h | 123 +-
 setup.c | 29 +
 shortlog.h | 2 +-
 t/README | 6 +
 t/lib-submodule-update.sh | 15 +-
 t/lib-t6000.sh | 7 +-
 t/t0001-init.sh | 25 +
 t/t0008-ignores.sh | 2 +-
 t/t0062-revision-walking.sh | 6 +
 t/t1021-rerere-in-workdir.sh | 6 +
 t/t1200-tutorial.sh | 8 +-
 t/t1302-repo-version.sh | 6 +
 t/t1305-config-include.sh | 17 +-
 t/t1400-update-ref.sh | 6 +
 t/t1401-symbolic-ref.sh | 17 +-
 t/t1404-update-ref-df-conflicts.sh | 8 +-
 t/t1410-reflog.sh | 16 +
 t/t1430-bad-ref-name.sh | 6 +
 t/t1450-fsck.sh | 12 +-
 t/t1460-refs-lmdb-backend.sh | 1109 ++++++++++++++
 t/t1470-refs-lmdb-backend-reflog.sh | 359 +++++
 t/t1480-refs-lmdb-submodule.sh | 85 ++
 t/t1506-rev-parse-diagnosis.sh | 4 +-
 t/t2013-checkout-submodule.sh | 2 +-
 t/t2105-update-index-gitfile.sh | 4 +-
 t/t2107-update-index-basic.sh | 6 +-
 t/t2201-add-update-typechange.sh | 4 +-
 t/t3001-ls-files-others-exclude.sh | 2 +-
 t/t3010-ls-files-killed-modified.sh | 4 +-
 t/t3040-subprojects-basic.sh | 4 +-
 t/t3050-subprojects-fetch.sh | 2 +-
 t/t3200-branch.sh | 84 +-
 t/t3210-pack-refs.sh | 7 +
 t/t3211-peel-ref.sh | 6 +
 t/t3308-notes-merge.sh | 2 +-
 t/t3404-rebase-interactive.sh | 2 +-
 t/t3600-rm.sh | 2 +-
 t/t3800-mktag.sh | 4 +-
 t/t3903-stash.sh | 2 +-
 t/t4010-diff-pathspec.sh | 2 +-
 t/t4020-diff-external.sh | 2 +-
 t/t4027-diff-submodule.sh | 2 +-
 t/t4035-diff-quiet.sh | 2 +-
 t/t4255-am-submodule.sh | 2 +-
 t/t5000-tar-tree.sh | 3 +-
 t/t5304-prune.sh | 2 +-
 t/t5312-prune-corruption.sh | 11 +-
 t/t5500-fetch-pack.sh | 10 +-
 t/t5510-fetch.sh | 30 +-
 t/t5526-fetch-submodules.sh | 4 +-
 t/t5527-fetch-odd-refs.sh | 7 +
 t/t5537-fetch-shallow.sh | 7 +
 t/t5700-clone-reference.sh | 42 +-
 t/t6001-rev-list-graft.sh | 3 +-
 t/t6010-merge-base.sh | 2 +-
 t/t6050-replace.sh | 4 +-
 t/t6120-describe.sh | 6 +-
 t/t6301-for-each-ref-errors.sh | 12 +-
 t/t7201-co.sh | 2 +-
 t/t7300-clean.sh | 25 +-
 t/t7400-submodule-basic.sh | 22 +-
 t/t7402-submodule-rebase.sh | 2 +-
 t/t7405-submodule-merge.sh | 10 +-
 t/t9104-git-svn-follow-parent.sh | 3 +-
 t/t9115-git-svn-dcommit-funky-renames.sh | 2 +-
 t/t9350-fast-export.sh | 6 +-
 t/t9902-completion.sh | 4 +-
 t/t9903-bash-prompt.sh | 2 +-
 t/test-lib-functions.sh | 53 +-
 t/test-lib.sh | 12 +
 test-match-trees.c | 2 +
 test-refs-lmdb-backend.c | 66 +
 test-revision-walking.c | 2 +
 100 files changed, 5265 insertions(+), 605 deletions(-)
 create mode 100644 Documentation/technical/refs-lmdb-backend.txt
 create mode 100644 refs/lmdb-backend.c
 create mode 100755 t/t1460-refs-lmdb-backend.sh
 create mode 100755 t/t1470-refs-lmdb-backend-reflog.sh
 create mode 100755 t/t1480-refs-lmdb-submodule.sh
 create mode 100644 test-refs-lmdb-backend.c

-- 
2.4.2.767.g62658d5-twtrsrc

* [PATCH v7 01/33] setup: call setup_git_directory_gently before accessing refs
  2016-03-01  0:52 [PATCH v7 00/33] refs backend David Turner
@ 2016-03-01  0:52 ` David Turner
  2016-03-01  8:35   ` Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: David Turner @ 2016-03-01  0:52 UTC
  To: git, peff, mhagger, pclouds; +Cc: David Turner

Usually, git calls some form of setup_git_directory at startup. But
sometimes, it doesn't. Usually, that's OK because it's not really
using the repository. But in some cases, it is using the repo. In
those cases, either setup_git_directory_gently must be called, or the
repository (e.g. the refs) must not be accessed.

In every case except grep and shortlog, we fix this problem by making
the call. In grep, in the --no-index mode, we don't want to access
repository, so we set a flag which prevents this. In shortlog, we
only want to skip accessing the repository when running without a repo
(in stdin mode), so we check that before calling the only
repo-dependent function that doesn't do its own setup.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: David Turner <dturner@twopensource.com>
---
 builtin/grep.c          | 1 +
 builtin/log.c           | 2 +-
 builtin/shortlog.c      | 7 ++++---
 git.c                   | 2 +-
 shortlog.h              | 2 +-
 test-match-trees.c      | 2 ++
 test-revision-walking.c | 2 ++
 7 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 9e3f1cf..1e36b52 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -531,6 +531,7 @@ static int grep_directory(struct grep_opt *opt, const struct pathspec *pathspec,
 	if (exc_std)
 		setup_standard_excludes(&dir);
 
+	dir.flags |= DIR_NO_GITLINKS;
 	fill_directory(&dir, pathspec);
 	for (i = 0; i < dir.nr; i++) {
 		if (!dir_path_match(dir.entries[i], pathspec, 0, NULL))
diff --git a/builtin/log.c b/builtin/log.c
index 0d738d6..1d0e43e 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -975,7 +975,7 @@ static void make_cover_letter(struct rev_info *rev, int use_stdout,
 
 	strbuf_release(&sb);
 
-	shortlog_init(&log);
+	shortlog_init(&log, 0);
 	log.wrap_lines = 1;
 	log.wrap = 72;
 	log.in1 = 2;
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index bfc082e..ab4305b 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -219,11 +219,12 @@ static int parse_wrap_args(const struct option *opt, const char *arg, int unset)
 	return 0;
 }
 
-void shortlog_init(struct shortlog *log)
+void shortlog_init(struct shortlog *log, int nongit)
 {
 	memset(log, 0, sizeof(*log));
 
-	read_mailmap(&log->mailmap, &log->common_repo_prefix);
+	if (!nongit)
+		read_mailmap(&log->mailmap, &log->common_repo_prefix);
 
 	log->list.strdup_strings = 1;
 	log->wrap = DEFAULT_WRAPLEN;
@@ -252,7 +253,7 @@ int cmd_shortlog(int argc, const char **argv, const char *prefix)
 	struct parse_opt_ctx_t ctx;
 
 	git_config(git_default_config, NULL);
-	shortlog_init(&log);
+	shortlog_init(&log, nongit);
 	init_revisions(&rev, prefix);
 	parse_options_start(&ctx, argc, argv, prefix, options,
 			    PARSE_OPT_KEEP_DASHDASH | PARSE_OPT_KEEP_ARGV0);
diff --git a/git.c b/git.c
index 6cc0c07..51e0508 100644
--- a/git.c
+++ b/git.c
@@ -376,7 +376,7 @@ static struct cmd_struct commands[] = {
 	{ "am", cmd_am, RUN_SETUP | NEED_WORK_TREE },
 	{ "annotate", cmd_annotate, RUN_SETUP },
 	{ "apply", cmd_apply, RUN_SETUP_GENTLY },
-	{ "archive", cmd_archive },
+	{ "archive", cmd_archive, RUN_SETUP_GENTLY },
 	{ "bisect--helper", cmd_bisect__helper, RUN_SETUP },
 	{ "blame", cmd_blame, RUN_SETUP },
 	{ "branch", cmd_branch, RUN_SETUP },
diff --git a/shortlog.h b/shortlog.h
index de4f86f..ed1fbca 100644
--- a/shortlog.h
+++ b/shortlog.h
@@ -19,7 +19,7 @@ struct shortlog {
 	struct string_list mailmap;
 };
 
-void shortlog_init(struct shortlog *log);
+void shortlog_init(struct shortlog *log, int nongit);
 
 void shortlog_add_commit(struct shortlog *log, struct commit *commit);
 
diff --git a/test-match-trees.c b/test-match-trees.c
index 109f03e..4dad709 100644
--- a/test-match-trees.c
+++ b/test-match-trees.c
@@ -6,6 +6,8 @@ int main(int ac, char **av)
 	unsigned char hash1[20], hash2[20], shifted[20];
 	struct tree *one, *two;
 
+	setup_git_directory();
+
 	if (get_sha1(av[1], hash1))
 		die("cannot parse %s as an object name", av[1]);
 	if (get_sha1(av[2], hash2))
diff --git a/test-revision-walking.c b/test-revision-walking.c
index 285f06b..3d03133 100644
--- a/test-revision-walking.c
+++ b/test-revision-walking.c
@@ -50,6 +50,8 @@ int main(int argc, char **argv)
 	if (argc < 2)
 		return 1;
 
+	setup_git_directory();
+
 	if (!strcmp(argv[1], "run-twice")) {
 		printf("1st\n");
 		if (!run_revision_walk())
-- 
2.4.2.767.g62658d5-twtrsrc
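
To make the git.c hunk above easier to follow, here is a small
stand-alone model of how a per-command setup flag can drive repository
discovery before the command body runs. It is only a sketch under
stated assumptions: the flag values, the trimmed-down cmd_struct, the
"example-noflags" entry, the in_repo parameter and run_setup_for() are
all invented for illustration and are not git's dispatch code.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative flag values; only the names come from the patch. */
#define RUN_SETUP        (1 << 0)	/* a repository is required */
#define RUN_SETUP_GENTLY (1 << 1)	/* a repository is optional */

/* Simplified: the function pointer of the real table is left out. */
struct cmd_struct {
	const char *cmd;
	unsigned int option;
};

/* Toy table; "example-noflags" is a hypothetical entry with no setup. */
const struct cmd_struct commands[] = {
	{ "annotate", RUN_SETUP },
	{ "apply", RUN_SETUP_GENTLY },
	{ "archive", RUN_SETUP_GENTLY },
	{ "example-noflags", 0 },
};

enum setup_result { SETUP_SKIPPED, SETUP_OK, SETUP_NONGIT, SETUP_FATAL };

/*
 * Decide what setup does for a command. "in_repo" simulates whether
 * discovery would find a repository. Unknown names and commands
 * without flags report SETUP_SKIPPED.
 */
enum setup_result run_setup_for(const char *name, int in_repo)
{
	size_t i;

	for (i = 0; i < sizeof(commands) / sizeof(commands[0]); i++) {
		const struct cmd_struct *p = &commands[i];

		if (strcmp(p->cmd, name))
			continue;
		if (p->option & RUN_SETUP)
			return in_repo ? SETUP_OK : SETUP_FATAL;
		if (p->option & RUN_SETUP_GENTLY)
			return in_repo ? SETUP_OK : SETUP_NONGIT;
		return SETUP_SKIPPED;
	}
	return SETUP_SKIPPED;
}
```

In this model a command with no flag never learns whether a repository
exists, which is the situation "archive" was in before the patch.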

* Re: [PATCH v7 01/33] setup: call setup_git_directory_gently before accessing refs
  2016-03-01  0:52 ` [PATCH v7 01/33] setup: call setup_git_directory_gently before accessing refs David Turner
@ 2016-03-01  8:35   ` Jeff King
  2016-03-01 23:47     ` David Turner
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2016-03-01  8:35 UTC
  To: David Turner; +Cc: git, mhagger, pclouds

On Mon, Feb 29, 2016 at 07:52:34PM -0500, David Turner wrote:

> Usually, git calls some form of setup_git_directory at startup. But
> sometimes, it doesn't. Usually, that's OK because it's not really
> using the repository. But in some cases, it is using the repo. In
> those cases, either setup_git_directory_gently must be called, or the
> repository (e.g. the refs) must not be accessed.

It's actually not just setup_git_directory(). We can also use
check_repository_format(), which is used by enter_repo() (and hence by
things like upload-pack). I think the rule really ought to be: if we
didn't have check_repository_format_gently() tell us we have a valid
repo, we should not access any repo elements (refs, objects, etc).

I started earlier today on a patch series to identify and fix these
cases independent of your series. The basic strategy was to adapt the
existing "struct startup_info" to be available everywhere, and have
relevant bits of code assert() on it, or even behave differently (e.g.,
if some library code should do different things in a repo versus not).

But I think we can probably just scrap the assert() part of that. The
assertions I put in were unsurprisingly at the entry points to the ref
code. And your series supersedes that; we can't do anything with the
refs until the ref backend is set up, and if we only do so in
check_repository_format_gently(), then it amounts to the same thing.

For the "behave differently" part, I needed it for the .mailmap case,
but you fixed it below without having to add that.

I think it's worth going through the changes here and comparing notes
with what my series would have done.

> diff --git a/builtin/grep.c b/builtin/grep.c
> index 9e3f1cf..1e36b52 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -531,6 +531,7 @@ static int grep_directory(struct grep_opt *opt, const struct pathspec *pathspec,
>  	if (exc_std)
>  		setup_standard_excludes(&dir);
>
> +	dir.flags |= DIR_NO_GITLINKS;
>  	fill_directory(&dir, pathspec);
>  	for (i = 0; i < dir.nr; i++) {
>  		if (!dir_path_match(dir.entries[i], pathspec, 0, NULL))

This one is interesting, because the ref access in fill_directory() is
only for hitting submodule refs. In theory, I guess a command operating
in a non-repo could want to know about and do something with embedded
git repos.

And indeed, it does produce a behavior change here. With a repo like:

  mkdir non-repo &&
  cd non-repo &&
  git init sub &&
  (cd sub && echo foo >file && git add . && git commit -m foo)

running:

  git grep --no-index foo

does not currently find sub/file (because it does not descend into what
it thinks is a sub-repository), but it _does_ with your patch.

I'm inclined to say that's actually a behavior improvement. "grep
--no-index" on a directory is about behaving as a recursive grep, and
should probably descend into sub-repos (it probably should also avoid
looking inside .git directories, though, and I think it still does,
even with your patch).

The fill_directory() also touches the_index, which it should not in a
non-repository. But I think that's probably OK, because we simply don't
read the index in the first place (so it behaves naturally as if the
index is empty).

> diff --git a/builtin/log.c b/builtin/log.c
> index 0d738d6..1d0e43e 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -975,7 +975,7 @@ static void make_cover_letter(struct rev_info *rev, int use_stdout,
>
>  	strbuf_release(&sb);
>
> -	shortlog_init(&log);
> +	shortlog_init(&log, 0);
>  	log.wrap_lines = 1;
>  	log.wrap = 72;
>  	log.in1 = 2;

This looks right. If we are making a cover letter for format-patch, we
know we have a repo, and thus nongit is always 0. Though I admit the
double-negating confused me for a minute. I don't know if there's a way
around it, though, because "nongit" is what comes out of
setup_git_directory().

> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index bfc082e..ab4305b 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -219,11 +219,12 @@ static int parse_wrap_args(const struct option *opt, const char *arg, int unset)
>  	return 0;
>  }
>
> -void shortlog_init(struct shortlog *log)
> +void shortlog_init(struct shortlog *log, int nongit)
>  {
>  	memset(log, 0, sizeof(*log));
>
> -	read_mailmap(&log->mailmap, &log->common_repo_prefix);
> +	if (!nongit)
> +		read_mailmap(&log->mailmap, &log->common_repo_prefix);

My fix for this was to teach read_mailmap to avoid looking for
HEAD:.mailmap if we are not in a repository, but to continue with the
others (.mailmap in the cwd, and the mailmap.file config variable).

Yours disables the .mailmap entirely. That makes some sense for looking
at ".mailmap" in the working tree; if we do not have a repository, we
should not look at a mailmap (though I guess you could argue the
opposite, that a .mailmap in the current directory of a non-repo is
worth looking at). But I'd think the mailmap.file config would apply
even to shortlog invoked outside a repository.

To be perfectly honest, I cannot imagine that shortlog is invoked with
data on stdin much at all these days, let alone outside of a
repository. But I do think your patch is a potential regression there,
if anybody does do that.

> diff --git a/git.c b/git.c
> index 6cc0c07..51e0508 100644
> --- a/git.c
> +++ b/git.c
> @@ -376,7 +376,7 @@ static struct cmd_struct commands[] = {
>  	{ "am", cmd_am, RUN_SETUP | NEED_WORK_TREE },
>  	{ "annotate", cmd_annotate, RUN_SETUP },
>  	{ "apply", cmd_apply, RUN_SETUP_GENTLY },
> -	{ "archive", cmd_archive },
> +	{ "archive", cmd_archive, RUN_SETUP_GENTLY },
>  	{ "bisect--helper", cmd_bisect__helper, RUN_SETUP },
>  	{ "blame", cmd_blame, RUN_SETUP },
>  	{ "branch", cmd_branch, RUN_SETUP },

I didn't have to touch this case in my experimenting. I wonder if it's
because I resolved the "grep" case a little differently.

I taught get_ref_cache() to only assert() that we have a repository when
we are looking at the main ref-cache, not a submodule. In theory, we can
look at a submodule from inside an outer non-repo (it's not really a
submodule then, but just a plain git dir). I don't think there's
anything in git right now that says you can't do so, though I think your
refs-backend work does introduce that restriction (because it actually
requires the submodules to use the same backend).

So with that requirement, I think we do need to require a repo even to
access submodule refs. Is that what triggered this change?

I'd think you would need a matching line inside cmd_archive, too. It
should allow "--remote" without a repo, but generating a local archive
does need one. And indeed, I see in write_archive() that we run
setup_git_repository ourselves, and die if we're not in a git repo. So
I'm puzzled about which code path accesses the refs.

> diff --git a/test-match-trees.c b/test-match-trees.c
> index 109f03e..4dad709 100644
> --- a/test-match-trees.c
> +++ b/test-match-trees.c
> @@ -6,6 +6,8 @@ int main(int ac, char **av)
>  	unsigned char hash1[20], hash2[20], shifted[20];
>  	struct tree *one, *two;
>
> +	setup_git_directory();
> +
>  	if (get_sha1(av[1], hash1))
>  		die("cannot parse %s as an object name", av[1]);
>  	if (get_sha1(av[2], hash2))

This one is weird.
The test-match-trees program is only used one time in our test suite,
and then it is only as a hack because it is an external that does not
have startup_info set up. I think that test is somewhat bogus (and is
obsoleted by my approach), and we could probably get rid of this
program entirely.

But your patch here is certainly the right thing to do if we are
keeping it.

> diff --git a/test-revision-walking.c b/test-revision-walking.c
> index 285f06b..3d03133 100644
> --- a/test-revision-walking.c
> +++ b/test-revision-walking.c
> @@ -50,6 +50,8 @@ int main(int argc, char **argv)
>  	if (argc < 2)
>  		return 1;
>
> +	setup_git_directory();
> +
>  	if (!strcmp(argv[1], "run-twice")) {
>  		printf("1st\n");
>  		if (!run_revision_walk())

This one I solved in the same way. Yay, we agreed on one! :)

-Peff
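
The two mailmap strategies compared in this review can be contrasted
with a small stand-alone model. This is a sketch, not git's
read_mailmap(): the MAILMAP_* bit names and both functions are invented
for illustration, and the model deliberately ignores other details
(such as bare versus non-bare repositories) that influence which
sources git really consults.

```c
/* Hypothetical bit names for the candidate mailmap sources. */
#define MAILMAP_WORKTREE_FILE (1u << 0)	/* .mailmap in the cwd */
#define MAILMAP_CONFIG_FILE   (1u << 1)	/* mailmap.file config variable */
#define MAILMAP_HEAD_BLOB     (1u << 2)	/* HEAD:.mailmap */

/* The patch as posted: skip the mailmap lookup entirely outside a repo. */
unsigned int mailmap_sources_all_or_nothing(int nongit)
{
	if (nongit)
		return 0;
	return MAILMAP_WORKTREE_FILE | MAILMAP_CONFIG_FILE | MAILMAP_HEAD_BLOB;
}

/* The alternative described in the review: drop only the ref-based source. */
unsigned int mailmap_sources_selective(int nongit)
{
	unsigned int sources = MAILMAP_WORKTREE_FILE | MAILMAP_CONFIG_FILE;

	if (!nongit)
		sources |= MAILMAP_HEAD_BLOB;
	return sources;
}
```

With nongit set, the first function yields no sources at all, while the
second still offers the working-directory file and mailmap.file, which
is the potential regression the review points out.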

* Re: [PATCH v7 01/33] setup: call setup_git_directory_gently before accessing refs
  2016-03-01  8:35   ` Jeff King
@ 2016-03-01 23:47     ` David Turner
  2016-03-02  0:33       ` David Turner
  2016-03-02  2:45       ` Jeff King
  0 siblings, 2 replies; 7+ messages in thread
From: David Turner @ 2016-03-01 23:47 UTC
  To: Jeff King; +Cc: git, mhagger, pclouds

On Tue, 2016-03-01 at 03:35 -0500, Jeff King wrote:
> On Mon, Feb 29, 2016 at 07:52:34PM -0500, David Turner wrote:
>
> > Usually, git calls some form of setup_git_directory at startup. But
> > sometimes, it doesn't. Usually, that's OK because it's not really
> > using the repository. But in some cases, it is using the repo. In
> > those cases, either setup_git_directory_gently must be called, or
> > the repository (e.g. the refs) must not be accessed.
>
> It's actually not just setup_git_directory(). We can also use
> check_repository_format(), which is used by enter_repo() (and hence by
> things like upload-pack). I think the rule really ought to be: if we
> didn't have check_repository_format_gently() tell us we have a valid
> repo, we should not access any repo elements (refs, objects, etc).

I'll change that commit message to say
"check_repository_format_gently".

> > diff --git a/builtin/grep.c b/builtin/grep.
> [snip: this is a probably-good behavior change]

Agreed.

> My fix for this was to teach read_mailmap to avoid looking for
> HEAD:.mailmap if we are not in a repository, but to continue with the
> others (.mailmap in the cwd, and the mailmap.file config variable).
> ...
> But I do think your patch is a potential regression there, if anybody
> does do that.

Your version sounds better. But I don't see it in the patch set you
sent earlier?

> > diff --git a/git.c b/git.c
> > index 6cc0c07..51e0508 100644
> > --- a/git.c
> > +++ b/git.c
> > @@ -376,7 +376,7 @@ static struct cmd_struct commands[] = {
> >  	{ "am", cmd_am, RUN_SETUP | NEED_WORK_TREE },
> >  	{ "annotate", cmd_annotate, RUN_SETUP },
> >  	{ "apply", cmd_apply, RUN_SETUP_GENTLY },
> > -	{ "archive", cmd_archive },
> > +	{ "archive", cmd_archive, RUN_SETUP_GENTLY },
> >  	{ "bisect--helper", cmd_bisect__helper, RUN_SETUP },
> >  	{ "blame", cmd_blame, RUN_SETUP },
> >  	{ "branch", cmd_branch, RUN_SETUP },
>
> I didn't have to touch this case in my experimenting. I wonder if it's
> because I resolved the "grep" case a little differently.
>
> I taught get_ref_cache() to only assert() that we have a repository
> when we are looking at the main ref-cache, not a submodule. In theory,
> we can look at a submodule from inside an outer non-repo (it's not
> really a submodule then, but just a plain git dir). I don't think
> there's anything in git right now that says you can't do so, though I
> think your refs-backend work does introduce that restriction (because
> it actually requires the submodules to use the same backend).
>
> So with that requirement, I think we do need to require a repo even to
> access submodule refs. Is that what triggered this change?

No. What triggered this change was a test failure with your earlier
patch on master -- none of my stuff at all. The failing command was:

  git archive --remote=. HEAD

When writing my patch, I had assumed that the issue was the resolve_ref
on the HEAD that's an argument -- but it's not.
The actual traceback is:

#0  die (
    err=err@entry=0x57ddb0 "BUG: resolve_ref called without initializing repo") at usage.c:99
#1  0x00000000004f7ed9 in resolve_ref_1 (sb_refname=0x7c4a50 <sb_refname>,
    sb_contents=0x7fffffffcfc0, sb_path=0x7fffffffcfe0, flags=0x7fffffffdaaa,
    sha1=0x7fffffffd100 "\b\326\377\377\377\177", resolve_flags=5572384,
    refname=0x2 <error: Cannot access memory at address 0x2>)
    at refs/files-backend.c:1429
#2  resolve_ref_unsafe (refname=refname@entry=0x550b3b "HEAD",
    resolve_flags=resolve_flags@entry=0,
    sha1=sha1@entry=0x7fffffffd100 "\b\326\377\377\377\177",
    flags=flags@entry=0x7fffffffd0fc) at refs/files-backend.c:1600
#3  0x00000000004ffe69 in read_config () at remote.c:471
#4  0x0000000000500235 in read_config () at remote.c:705
#5  remote_get_1 (name=0x7fffffffdaaa ".",
    get_default=get_default@entry=0x4fe230 <remote_for_branch>)
    at remote.c:688
#6  0x00000000005004ca in remote_get (name=<optimized out>) at remote.c:713
#7  0x00000000004159d8 in run_remote_archiver (name_hint=0x0,
    exec=0x550720 "git-upload-archive", remote=<optimized out>,
    argv=0x7fffffffd608, argc=2) at builtin/archive.c:35
#8  cmd_archive (argc=2, argv=0x7fffffffd608, prefix=0x0)
    at builtin/archive.c:104
#9  0x0000000000406051 in run_builtin (argv=0x7fffffffd608, argc=3,
    p=0x7bd7a0 <commands+96>) at git.c:357
#10 handle_builtin (argc=3, argv=0x7fffffffd608) at git.c:540
#11 0x000000000040519a in main (argc=3, av=<optimized out>) at git.c:671

> I'd think you would need a matching line inside cmd_archive, too. It
> should allow "--remote" without a repo, but generating a local archive
> does need one. And indeed, I see in write_archive() that we run
> setup_git_repository ourselves, and die if we're not in a git repo. So
> I'm puzzled about which code path accesses the refs.

I agree that --remote should work without a repo. It seems that we
don't test this and we probably should.

I'm not sure what the right way to fix this is -- in read_config, we're
about to access some stuff in a repo (config, HEAD). It's OK to skip
that stuff if we're not in a repo, but we don't want to run
setup_git_directory twice (that breaks some stuff), and some of the
other callers have already called it. On top of your earlier
repo_initialized patch, we could add the following to read_config:

+	if (!repo_initialized) {
+		int nongit = 0;
+		setup_git_directory_gently(&nongit);
+		if (nongit)
+			return;
+	}

But that patch I think was not intended to be permanent. Still, it
does seem odd that there's no straightforward way to know if the repo
is initialized. Am I missing something?

> > diff --git a/test-match-trees.c b/test-match-trees.c
> But your patch here is certainly the right thing to do if we are
> keeping it.

Let's keep it for now; we could always remove it later.
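
One way to picture "know whether the repo is initialized without
running setup twice" is a three-state flag that remembers a negative
result as well as a positive one. The sketch below is a stand-alone
model and an assumption, not something proposed verbatim in this
thread: the enum, current_repo_state, discovery_runs, simulated_in_repo
and fake_setup_gently() (a stand-in for setup_git_directory_gently())
are all hypothetical names.

```c
/* Hypothetical three-state record of what discovery found. */
enum repo_state { REPO_UNKNOWN, REPO_PRESENT, REPO_ABSENT };

enum repo_state current_repo_state = REPO_UNKNOWN;
int discovery_runs;	/* counts discovery attempts, for demonstration */
int simulated_in_repo;	/* knob standing in for real repository discovery */

/* Stand-in for setup_git_directory_gently(); NOT the real function. */
void fake_setup_gently(int *nongit)
{
	discovery_runs++;
	*nongit = !simulated_in_repo;
}

/*
 * Returns 1 if repository data may be read, 0 if it must be skipped.
 * Discovery runs at most once, whichever way it turns out.
 */
int repo_available(void)
{
	if (current_repo_state == REPO_UNKNOWN) {
		int nongit = 0;

		fake_setup_gently(&nongit);
		current_repo_state = nongit ? REPO_ABSENT : REPO_PRESENT;
	}
	return current_repo_state == REPO_PRESENT;
}
```

A plain boolean such as repo_initialized cannot tell "setup has not run
yet" apart from "setup ran and found nothing", so a caller outside a
repository would repeat discovery on every call; the third state avoids
that.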

* Re: [PATCH v7 01/33] setup: call setup_git_directory_gently before accessing refs
  2016-03-01 23:47     ` David Turner
@ 2016-03-02  0:33       ` David Turner
  2016-03-02  2:45       ` Jeff King
  1 sibling, 0 replies; 7+ messages in thread
From: David Turner @ 2016-03-02  0:33 UTC
  To: Jeff King; +Cc: git, mhagger, pclouds

On Tue, 2016-03-01 at 18:47 -0500, David Turner wrote:
> On Tue, 2016-03-01 at 03:35 -0500, Jeff King wrote:
> > On Mon, Feb 29, 2016 at 07:52:34PM -0500, David Turner wrote:
> >
> > > Usually, git calls some form of setup_git_directory at startup.
> > > But sometimes, it doesn't. Usually, that's OK because it's not
> > > really using the repository. But in some cases, it is using the
> > > repo. In those cases, either setup_git_directory_gently must be
> > > called, or the repository (e.g. the refs) must not be accessed.
> >
> > It's actually not just setup_git_directory(). We can also use
> > check_repository_format(), which is used by enter_repo() (and hence
> > by things like upload-pack). I think the rule really ought to be: if
> > we didn't have check_repository_format_gently() tell us we have a
> > valid repo, we should not access any repo elements (refs, objects,
> > etc).
>
> I'll change that commit message to say
> "check_repository_format_gently".
>
> > > diff --git a/builtin/grep.c b/builtin/grep.
> > [snip: this is a probably-good behavior change]
>
> Agreed.
>
> > My fix for this was to teach read_mailmap to avoid looking for
> > HEAD:.mailmap if we are not in a repository, but to continue with
> > the others (.mailmap in the cwd, and the mailmap.file config
> > variable).
> > ...
> > But I do think your patch is a potential regression there, if
> > anybody does do that.
>
> Your version sounds better. But I don't see it in the patch set you
> sent earlier?
>
> > > diff --git a/git.c b/git.c
> > > index 6cc0c07..51e0508 100644
> > > --- a/git.c
> > > +++ b/git.c
> > > @@ -376,7 +376,7 @@ static struct cmd_struct commands[] = {
> > >  	{ "am", cmd_am, RUN_SETUP | NEED_WORK_TREE },
> > >  	{ "annotate", cmd_annotate, RUN_SETUP },
> > >  	{ "apply", cmd_apply, RUN_SETUP_GENTLY },
> > > -	{ "archive", cmd_archive },
> > > +	{ "archive", cmd_archive, RUN_SETUP_GENTLY },
> > >  	{ "bisect--helper", cmd_bisect__helper, RUN_SETUP },
> > >  	{ "blame", cmd_blame, RUN_SETUP },
> > >  	{ "branch", cmd_branch, RUN_SETUP },
> >
> > I didn't have to touch this case in my experimenting. I wonder if
> > it's because I resolved the "grep" case a little differently.
> >
> > I taught get_ref_cache() to only assert() that we have a repository
> > when we are looking at the main ref-cache, not a submodule. In
> > theory, we can look at a submodule from inside an outer non-repo
> > (it's not really a submodule then, but just a plain git dir). I
> > don't think there's anything in git right now that says you can't do
> > so, though I think your refs-backend work does introduce that
> > restriction (because it actually requires the submodules to use the
> > same backend).
> >
> > So with that requirement, I think we do need to require a repo even
> > to access submodule refs. Is that what triggered this change?
>
> No. What triggered this change was a test failure with your earlier
> patch on master -- none of my stuff at all. The failing command was:
>
>   git archive --remote=. HEAD
>
> When writing my patch, I had assumed that the issue was the
> resolve_ref on the HEAD that's an argument -- but it's not.
The actual traceback > is: > > #0 die ( > err=err@entry=0x57ddb0 "BUG: resolve_ref called without > initializing repo") at usage.c:99 > #1 0x00000000004f7ed9 in resolve_ref_1 (sb_refname=0x7c4a50 > <sb_refname>, > sb_contents=0x7fffffffcfc0, sb_path=0x7fffffffcfe0, > flags=0x7fffffffdaaa, > sha1=0x7fffffffd100 "\b\326\377\377\377\177", > resolve_flags=5572384, > refname=0x2 <error: Cannot access memory at address 0x2>) > at refs/files-backend.c:1429 > #2 resolve_ref_unsafe (refname=refname@entry=0x550b3b "HEAD", > resolve_flags=resolve_flags@entry=0, > sha1=sha1@entry=0x7fffffffd100 "\b\326\377\377\377\177", > flags=flags@entry=0x7fffffffd0fc) at refs/files-backend.c:1600 > #3 0x00000000004ffe69 in read_config () at remote.c:471 > #4 0x0000000000500235 in read_config () at remote.c:705 > #5 remote_get_1 (name=0x7fffffffdaaa ".", > get_default=get_default@entry=0x4fe230 <remote_for_branch>) > at remote.c:688 > #6 0x00000000005004ca in remote_get (name=<optimized out>) at > remote.c:713 > #7 0x00000000004159d8 in run_remote_archiver (name_hint=0x0, > exec=0x550720 "git-upload-archive", remote=<optimized out>, > argv=0x7fffffffd608, argc=2) at builtin/archive.c:35 > #8 cmd_archive (argc=2, argv=0x7fffffffd608, prefix=0x0) > at builtin/archive.c:104 > #9 0x0000000000406051 in run_builtin (argv=0x7fffffffd608, argc=3, > p=0x7bd7a0 <commands+96>) at git.c:357 > #10 handle_builtin (argc=3, argv=0x7fffffffd608) at git.c:540 > #11 0x000000000040519a in main (argc=3, av=<optimized out>) at > git.c:671 > > > I'd think you would need a matching line inside cmd_archive, too. > > It > > should allow "--remote" without a repo, but generating a local > > archive > > does need one. And indeed, I see in write_archive() that we run > > setup_git_repository ourselves, and die if we're not in a git repo. > > So > > I'm puzzled about which code path accesses the refs. > > I agree that --remote should work without a repo, It seems that we > do > n't test this and we probably should. 
> > I'm not sure what the right way to fix this is -- in read_config, > we're > about to access some stuff in a repo (config, HEAD). It's OK to skip > that stuff if we're not in a repo, but we don't want to run > setup_git_directory twice (that breaks some stuff), and some of the > other callers have already called it. On top of your earlier > repo_initialized patch, we could add the following to read_config: > > + if (!repo_initialized) { > + int nongit = 0; > + setup_git_directory_gently(&nongit); > + if (nongit) > + return; > + } > > But that patch I think was not intended to be permanent. Still, it > does seem odd that there's no straightforward way to know if the repo > is initialized. Am I missing something? I guess we could add a bit in startup_info. Was that what you were talking about there? ^ permalink raw reply [flat|nested] 7+ messages in thread
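Editor's sketch, not part of the original message: the git.c hunk quoted above changes only the flag attached to "archive" in the command table. The C fragment below illustrates, under stated assumptions, how such a flag could decide whether repository discovery runs before a builtin and whether a missing repository is fatal. The flag names come from the quoted diff; `discover_repo`, `cmd_entry`, `cmd_report`, `run_command_by_name`, the flag values, and the exit code are hypothetical stand-ins, not git's actual implementation.

```c
#include <stdio.h>
#include <string.h>

/* Flag names mirror the quoted hunk; the values are illustrative only. */
#define RUN_SETUP        (1 << 0) /* command requires a repository */
#define RUN_SETUP_GENTLY (1 << 1) /* command looks for one, tolerates none */

/* Hypothetical state used to simulate repository discovery. */
static int repo_present; /* set to 1 to pretend we are inside a repo */
static int setup_calls;  /* how many times discovery ran */

/* Stub for discovery: returns 0 when a repository is found, -1 otherwise. */
static int discover_repo(void)
{
	setup_calls++;
	return repo_present ? 0 : -1;
}

struct cmd_entry {
	const char *name;
	int (*fn)(int have_repo);
	unsigned int flags;
};

/* Stub builtin: reports whether it was told a repository exists. */
static int cmd_report(int have_repo)
{
	return have_repo;
}

static const struct cmd_entry commands[] = {
	{ "archive", cmd_report, RUN_SETUP_GENTLY },
	{ "branch",  cmd_report, RUN_SETUP },
	{ "version", cmd_report, 0 },
};

/*
 * Look up a command and run discovery according to its flags.
 * Returns -1 for an unknown command, 128 when a required repository
 * is missing, and otherwise the builtin's own return value.
 */
static int run_command_by_name(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(commands) / sizeof(commands[0]); i++) {
		const struct cmd_entry *p = &commands[i];
		int have_repo = 0;

		if (strcmp(p->name, name))
			continue;
		if (p->flags & (RUN_SETUP | RUN_SETUP_GENTLY)) {
			have_repo = (discover_repo() == 0);
			if (!have_repo && (p->flags & RUN_SETUP)) {
				fprintf(stderr, "fatal: not a repository\n");
				return 128;
			}
		}
		return p->fn(have_repo);
	}
	return -1;
}
```

In this sketch a command with no flag never triggers discovery at all, which corresponds to the situation the traceback exposes: "archive" reached ref-resolving code without any setup having run.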
* Re: [PATCH v7 01/33] setup: call setup_git_directory_gently before accessing refs
  2016-03-01 23:47 ` David Turner
  2016-03-02  0:33   ` David Turner
@ 2016-03-02  2:45   ` Jeff King
  1 sibling, 0 replies; 7+ messages in thread
From: Jeff King @ 2016-03-02  2:45 UTC (permalink / raw)
  To: David Turner; +Cc: git, mhagger, pclouds

On Tue, Mar 01, 2016 at 06:47:52PM -0500, David Turner wrote:

> > My fix for this was to teach read_mailmap to avoid looking for
> > HEAD:.mailmap if we are not in a repository, but to continue with the
> > others (.mailmap in the cwd, and the mailmap.file config variable).
> > ...
> > But I do think your patch is a potential regression there, if anybody
> > does do that.
>
> Your version sounds better. But I don't see it in the patch set you
> sent earlier?

It's not. Sorry to be unclear. There were _two_ cleanups I was talking
about (cases where we don't check whether we're in a repo, and the fact
that the repo startup code is unreliable), and I got sucked into the
second one. I'll try to work up and share my startup_info one today.

> When writing my patch, I had assumed that the issue was the resolve_ref
> on the HEAD that's an argument -- but it's not. The actual traceback
> is:
> [...]
> #2  resolve_ref_unsafe (refname=refname@entry=0x550b3b "HEAD",
>     resolve_flags=resolve_flags@entry=0,
>     sha1=sha1@entry=0x7fffffffd100 "\b\326\377\377\377\177",
>     flags=flags@entry=0x7fffffffd0fc) at refs/files-backend.c:1600
> #3  0x00000000004ffe69 in read_config () at remote.c:471

Oh, right. I did see problems here but missed them when comparing my
patch to yours. I ended up in remote.c:read_config, having it check
whether startup_info->have_repository is set; if it isn't, there is no
point in looking at HEAD. That covers this case, and several others I
happened across. Thanks for clarifying.

> I'm not sure what the right way to fix this is -- in read_config, we're
> about to access some stuff in a repo (config, HEAD). It's OK to skip
> that stuff if we're not in a repo, but we don't want to run
> setup_git_directory twice (that breaks some stuff), and some of the
> other callers have already called it. On top of your earlier
> repo_initialized patch, we could add the following to read_config:
>
> +       if (!repo_initialized) {
> +               int nongit = 0;
> +               setup_git_directory_gently(&nongit);
> +               if (nongit)
> +                       return;
> +       }
>
> But that patch I think was not intended to be permanent. Still, it
> does seem odd that there's no straightforward way to know if the repo
> is initialized. Am I missing something?

No, there isn't a straightforward way; I think we'll have to add one.
I'll polish up my series which does this.

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread
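Editor's sketch, not part of the original message: the approach described above (have read_config consult startup_info->have_repository and skip the HEAD lookup when it is unset, while still reading everything that does not need a repository) might look roughly like the following. Only the field name `have_repository` comes from the thread; every identifier ending in `_sketch` or `_stub` is hypothetical, and the real remote.c code is considerably more involved. Treating a NULL pointer as "no repository" is an assumption prompted by the earlier remark that startup_info may be NULL for external programs.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the startup_info in the thread. */
struct startup_info_sketch {
	int have_repository; /* set once setup has validated a repository */
};

/* NULL until some setup code fills it in. */
static struct startup_info_sketch *startup_info_sketch;

/* Counts ref lookups so callers can observe whether HEAD was touched. */
static int head_lookups;

/* Stub for resolving HEAD; a real lookup would read from the ref store. */
static const char *resolve_head_stub(void)
{
	head_lookups++;
	return "refs/heads/master";
}

struct remote_config_sketch {
	const char *current_branch; /* NULL when HEAD was not consulted */
	int config_loaded;          /* repo-independent settings were read */
};

/*
 * Read remote configuration. The HEAD lookup is guarded by the
 * repository flag; the remaining work proceeds either way.
 */
static void read_config_sketch(struct remote_config_sketch *out)
{
	out->current_branch = NULL;

	/* Without a validated repository there is no point in looking at HEAD. */
	if (startup_info_sketch && startup_info_sketch->have_repository)
		out->current_branch = resolve_head_stub();

	/* Settings that do not depend on a repository are still loaded. */
	out->config_loaded = 1;
}
```

Unlike the `repo_initialized` hunk quoted above, this variant never calls setup itself, so it cannot run setup a second time; it only reads a flag that earlier startup code is assumed to have set.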
end of thread, other threads:[~2016-03-02  2:45 UTC | newest]

Thread overview: 7+ messages
2016-03-01  9:53 [PATCH v7 01/33] setup: call setup_git_directory_gently before accessing refs Duy Nguyen
2016-03-01  9:55 ` Jeff King
  -- strict thread matches above, loose matches on Subject: below --
2016-03-01  0:52 [PATCH v7 00/33] refs backend David Turner
2016-03-01  0:52 ` [PATCH v7 01/33] setup: call setup_git_directory_gently before accessing refs David Turner
2016-03-01  8:35   ` Jeff King
2016-03-01 23:47     ` David Turner
2016-03-02  0:33       ` David Turner
2016-03-02  2:45       ` Jeff King