From: shejialuo <shejialuo@gmail.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, "Ronan Pigott" <ronan@rjp.ie>,
"René Scharfe" <l.s.r@web.de>,
"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH 2/2] config: fix evaluating "onbranch" with nonexistent git dir
Date: Tue, 24 Sep 2024 22:11:42 +0800 [thread overview]
Message-ID: <ZvLInrIQRj8xpSgF@ArchLinux> (raw)
In-Reply-To: <535d0d07506e8248e47f90c1a7581679fc297b3d.1727171197.git.ps@pks.im>
On Tue, Sep 24, 2024 at 12:05:46PM +0200, Patrick Steinhardt wrote:
> The `include_by_branch()` function is responsible for evaluating whether
> or not a specific include should be pulled in based on the currently
> checked out branch. Naturally, his condition can only be evaluated when
> we have a properly initialized repository with a ref store in the first
> place. This is why the function guards against the case when either
> `data->repo` or `data->repo->gitdir` are `NULL` pointers.
>
> But the second check is insufficient: the `gitdir` may be set even
> though the repository has not been initialized. Quoting "setup.c":
>
> NEEDSWORK: currently we allow bogus GIT_DIR values to be set in some
> code paths so we also need to explicitly setup the environment if the
> user has set GIT_DIR. It may be beneficial to disallow bogus GIT_DIR
> values at some point in the future.
>
> So when either the GIT_DIR environment variable or the `--git-dir`
> global option are set by the user then `the_repository` may end up with
> an initialized `gitdir` variable. And this happens even when the dir is
> invalid, like for example when it doesn't exist. It follows that only
> checking for whether or not `gitdir` is `NULL` is not sufficient for us
> to determine whether the repository has been properly initialized.
>
When I dive into this bug report, I feel so wired about this behavior. I
don't mind whether the code sets "gitdir" field. This is not important.
In "setup.c::setup_git_directory_gently", it will set the
"the_repository->gitdir" by checking the environment variable
"GIT_DIR_ENVIRONMENT". And by using `--git-dir` option, the code will
set this environment variable, so these two ways will set the "gitdir"
field in the global variable "the_repository".
We actually check the validation of "--gir-dir" option before we set
this value. Let me give you an example here:
$ git --git-dir=notexist -c includeIf.onbranch:main.path=any fsck
fatal: not a git repository: 'notexist'
I am curious here. And I notice the following difference in
"setup.c::setup_explicit_git_dir" function:
if (!is_git_directory(gitdirenv)) {
if (nongit_ok) {
*nongit_ok = 1;
...
return NULL;
}
die(_("not a git repository: '%s'"), gitdirenv);
}
Apparently, the above example will execute the "die". For
"git-archive(1)", it will simply return NULL. This is because we allow
some commands to run outside of the git repo. And we distinguish them by
using the ".option" filed:
{ "apply", cmd_apply, RUN_SETUP_GENTLY },
{ "fsck", cmd_fsck, RUN_SETUP },
As we can see, the code will check whether the "gitdir" (although it is
not set into "the_repository" structure yet) is a valid git repository.
We already have this information.
In f7d61c4135 (config: don't depend on `the_repository` with branch
conditions, 2024-08-13). "config.c::include_by_branch" drops the global
variable "the_repository". This solves the problem reported by Ronan.
Because it happens in the set up process, the "data->repo" will be NULL.
config_with_options(cb, data, NULL, NULL, &opts);
int config_with_options(config_fn_t fn, void *data,
const struct git_config_source *config_source,
struct repository *repo,
const struct config_options *opts)
{
struct config_include_data inc = CONFIG_INCLUDE_INIT;
...
inc.repo = repo;
}
But the problem still exists for "git-archive(1)", in "cmd_archive"
function, we will initialize the configurations by using "repo_config".
But we are not inside the repo.
I wonder how we use the global variable "the_repository". I think the
main problem here is that we use "the_repository" structure outside of
the repo where we have already broken the semantics of the
"the_repository" variable.
> This issue can lead to us triggering a BUG: when using a config with an
> "includeIf.onbranch:" condition outside of a repository while using the
> `--git-dir` option pointing to an invalid Git directory we may end up
> trying to evaluate the condition even though the ref storage format has
> not been set up.
>
> This bisects to 173761e21b (setup: start tracking ref storage format,
> 2023-12-29), but that commit really only starts to surface the issue
> that has already existed beforehand. The code to check for `gitdir` was
> introduced via 85fe0e800c (config: work around bug with
> includeif:onbranch and early config, 2019-07-31), which tried to fix
> similar issues when we didn't yet have a repository set up. But the fix
> was incomplete as it missed the described scenario.
>
Yes, exactly. Because before 173761e21b, the code will always find the
files backend, it does care about whether we are in the git repo or not.
unsigned int format = REF_STORAGE_FORMAT_FILES;
const struct ref_storage_be *be = find_ref_storage_backend(format);
So, it won't complain.
> As the quoted comment mentions, we'd ideally refactor the code to not
> set up `gitdir` with an invalid value in the first place, but that may
> be a bigger undertaking. Instead, refactor the code to use the ref
> storage format as an indicator of whether or not the ref store has been
> set up to fix the bug.
>
Should we? From my above comments, it does no matter whether we set the
"gitdir" in the "the_repository". Because we already check this and we
have this information. I think the main problem is that we use
"the_repository" badly.
We could run git commands inside the repo or outside the repo. If we run
git commands outside the repo, should we use the "the_repository"
variable? I guess we should not.
Because "git_config", "repo_config" and so on use the global variable
"the_repository", so we will encounter the trouble. But if we could use
something like "data->repo". We will make everything OK:
1. When running commands inside the git repo, we set "data->repo" to be
"the_repository".
2. When ruing commands outside the git repo, we set "data->repo" to be
NULL.
> Reported-by: Ronan Pigott <ronan@rjp.ie>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> config.c | 15 +++++++++------
> t/t1305-config-include.sh | 2 +-
> 2 files changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/config.c b/config.c
> index 1266eab0860..a11bb85da30 100644
> --- a/config.c
> +++ b/config.c
> @@ -306,13 +306,16 @@ static int include_by_branch(struct config_include_data *data,
> int flags;
> int ret;
> struct strbuf pattern = STRBUF_INIT;
> - const char *refname = (!data->repo || !data->repo->gitdir) ?
> - NULL : refs_resolve_ref_unsafe(get_main_ref_store(data->repo),
> - "HEAD", 0, NULL, &flags);
> - const char *shortname;
> + const char *refname, *shortname;
>
> - if (!refname || !(flags & REF_ISSYMREF) ||
> - !skip_prefix(refname, "refs/heads/", &shortname))
> + if (!data->repo || data->repo->ref_storage_format == REF_STORAGE_FORMAT_UNKNOWN)
> + return 0;
> +
> + refname = refs_resolve_ref_unsafe(get_main_ref_store(data->repo),
> + "HEAD", 0, NULL, &flags);
> + if (!refname ||
> + !(flags & REF_ISSYMREF) ||
> + !skip_prefix(refname, "refs/heads/", &shortname))
> return 0;
>
> strbuf_add(&pattern, cond, cond_len);
> diff --git a/t/t1305-config-include.sh b/t/t1305-config-include.sh
> index ad08db72308..517d6c86937 100755
> --- a/t/t1305-config-include.sh
> +++ b/t/t1305-config-include.sh
> @@ -389,7 +389,7 @@ test_expect_success 'onbranch without repository' '
> test_must_fail nongit git config get foo.bar
> '
>
> -test_expect_failure 'onbranch without repository but explicit nonexistent Git directory' '
> +test_expect_success 'onbranch without repository but explicit nonexistent Git directory' '
> test_when_finished "rm -f .gitconfig config.inc" &&
> git config set -f .gitconfig "includeIf.onbranch:**.path" config.inc &&
> git config set -f config.inc foo.bar baz &&
> --
> 2.46.0.551.gc5ee8f2d1c.dirty
next prev parent reply other threads:[~2024-09-24 14:10 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-20 17:37 BUG: refs.c:1933: reference backend is unknown Ronan Pigott
2024-09-21 16:09 ` shejialuo
2024-09-21 18:06 ` Ronan Pigott
2024-09-21 21:22 ` René Scharfe
2024-09-22 6:51 ` shejialuo
2024-09-22 14:41 ` shejialuo
2024-09-24 10:05 ` [PATCH 0/2] config: fix evaluating "onbranch" with nonexistent git dir Patrick Steinhardt
2024-09-24 10:05 ` [PATCH 1/2] t1305: exercise edge cases of "onbranch" includes Patrick Steinhardt
2024-09-24 16:05 ` Junio C Hamano
2024-09-24 10:05 ` [PATCH 2/2] config: fix evaluating "onbranch" with nonexistent git dir Patrick Steinhardt
2024-09-24 14:11 ` shejialuo [this message]
2024-09-24 16:17 ` Junio C Hamano
2024-09-24 14:20 ` [PATCH 0/2] " shejialuo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZvLInrIQRj8xpSgF@ArchLinux \
--to=shejialuo@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=l.s.r@web.de \
--cc=ps@pks.im \
--cc=ronan@rjp.ie \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).