From: shejialuo <shejialuo@gmail.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, "Ronan Pigott" <ronan@rjp.ie>,
"René Scharfe" <l.s.r@web.de>,
"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH 2/2] config: fix evaluating "onbranch" with nonexistent git dir
Date: Tue, 24 Sep 2024 22:11:42 +0800 [thread overview]
Message-ID: <ZvLInrIQRj8xpSgF@ArchLinux> (raw)
In-Reply-To: <535d0d07506e8248e47f90c1a7581679fc297b3d.1727171197.git.ps@pks.im>
On Tue, Sep 24, 2024 at 12:05:46PM +0200, Patrick Steinhardt wrote:
> The `include_by_branch()` function is responsible for evaluating whether
> or not a specific include should be pulled in based on the currently
> checked out branch. Naturally, his condition can only be evaluated when
> we have a properly initialized repository with a ref store in the first
> place. This is why the function guards against the case when either
> `data->repo` or `data->repo->gitdir` are `NULL` pointers.
>
> But the second check is insufficient: the `gitdir` may be set even
> though the repository has not been initialized. Quoting "setup.c":
>
> NEEDSWORK: currently we allow bogus GIT_DIR values to be set in some
> code paths so we also need to explicitly setup the environment if the
> user has set GIT_DIR. It may be beneficial to disallow bogus GIT_DIR
> values at some point in the future.
>
> So when either the GIT_DIR environment variable or the `--git-dir`
> global option are set by the user then `the_repository` may end up with
> an initialized `gitdir` variable. And this happens even when the dir is
> invalid, like for example when it doesn't exist. It follows that only
> checking for whether or not `gitdir` is `NULL` is not sufficient for us
> to determine whether the repository has been properly initialized.
>
When I dive into this bug report, I feel so wired about this behavior. I
don't mind whether the code sets "gitdir" field. This is not important.
In "setup.c::setup_git_directory_gently", it will set the
"the_repository->gitdir" by checking the environment variable
"GIT_DIR_ENVIRONMENT". And by using `--git-dir` option, the code will
set this environment variable, so these two ways will set the "gitdir"
field in the global variable "the_repository".
We actually check the validation of "--gir-dir" option before we set
this value. Let me give you an example here:
$ git --git-dir=notexist -c includeIf.onbranch:main.path=any fsck
fatal: not a git repository: 'notexist'
I am curious here. And I notice the following difference in
"setup.c::setup_explicit_git_dir" function:
if (!is_git_directory(gitdirenv)) {
if (nongit_ok) {
*nongit_ok = 1;
...
return NULL;
}
die(_("not a git repository: '%s'"), gitdirenv);
}
Apparently, the above example will execute the "die". For
"git-archive(1)", it will simply return NULL. This is because we allow
some commands to run outside of the git repo. And we distinguish them by
using the ".option" filed:
{ "apply", cmd_apply, RUN_SETUP_GENTLY },
{ "fsck", cmd_fsck, RUN_SETUP },
As we can see, the code will check whether the "gitdir" (although it is
not set into "the_repository" structure yet) is a valid git repository.
We already have this information.
In f7d61c4135 (config: don't depend on `the_repository` with branch
conditions, 2024-08-13). "config.c::include_by_branch" drops the global
variable "the_repository". This solves the problem reported by Ronan.
Because it happens in the set up process, the "data->repo" will be NULL.
config_with_options(cb, data, NULL, NULL, &opts);
int config_with_options(config_fn_t fn, void *data,
const struct git_config_source *config_source,
struct repository *repo,
const struct config_options *opts)
{
struct config_include_data inc = CONFIG_INCLUDE_INIT;
...
inc.repo = repo;
}
But the problem still exists for "git-archive(1)", in "cmd_archive"
function, we will initialize the configurations by using "repo_config".
But we are not inside the repo.
I wonder how we use the global variable "the_repository". I think the
main problem here is that we use "the_repository" structure outside of
the repo where we have already broken the semantics of the
"the_repository" variable.
> This issue can lead to us triggering a BUG: when using a config with an
> "includeIf.onbranch:" condition outside of a repository while using the
> `--git-dir` option pointing to an invalid Git directory we may end up
> trying to evaluate the condition even though the ref storage format has
> not been set up.
>
> This bisects to 173761e21b (setup: start tracking ref storage format,
> 2023-12-29), but that commit really only starts to surface the issue
> that has already existed beforehand. The code to check for `gitdir` was
> introduced via 85fe0e800c (config: work around bug with
> includeif:onbranch and early config, 2019-07-31), which tried to fix
> similar issues when we didn't yet have a repository set up. But the fix
> was incomplete as it missed the described scenario.
>
Yes, exactly. Because before 173761e21b, the code will always find the
files backend, it does care about whether we are in the git repo or not.
unsigned int format = REF_STORAGE_FORMAT_FILES;
const struct ref_storage_be *be = find_ref_storage_backend(format);
So, it won't complain.
> As the quoted comment mentions, we'd ideally refactor the code to not
> set up `gitdir` with an invalid value in the first place, but that may
> be a bigger undertaking. Instead, refactor the code to use the ref
> storage format as an indicator of whether or not the ref store has been
> set up to fix the bug.
>
Should we? From my above comments, it does no matter whether we set the
"gitdir" in the "the_repository". Because we already check this and we
have this information. I think the main problem is that we use
"the_repository" badly.
We could run git commands inside the repo or outside the repo. If we run
git commands outside the repo, should we use the "the_repository"
variable? I guess we should not.
Because "git_config", "repo_config" and so on use the global variable
"the_repository", so we will encounter the trouble. But if we could use
something like "data->repo". We will make everything OK:
1. When running commands inside the git repo, we set "data->repo" to be
"the_repository".
2. When ruing commands outside the git repo, we set "data->repo" to be
NULL.
> Reported-by: Ronan Pigott <ronan@rjp.ie>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> config.c | 15 +++++++++------
> t/t1305-config-include.sh | 2 +-
> 2 files changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/config.c b/config.c
> index 1266eab0860..a11bb85da30 100644
> --- a/config.c
> +++ b/config.c
> @@ -306,13 +306,16 @@ static int include_by_branch(struct config_include_data *data,
> int flags;
> int ret;
> struct strbuf pattern = STRBUF_INIT;
> - const char *refname = (!data->repo || !data->repo->gitdir) ?
> - NULL : refs_resolve_ref_unsafe(get_main_ref_store(data->repo),
> - "HEAD", 0, NULL, &flags);
> - const char *shortname;
> + const char *refname, *shortname;
>
> - if (!refname || !(flags & REF_ISSYMREF) ||
> - !skip_prefix(refname, "refs/heads/", &shortname))
> + if (!data->repo || data->repo->ref_storage_format == REF_STORAGE_FORMAT_UNKNOWN)
> + return 0;
> +
> + refname = refs_resolve_ref_unsafe(get_main_ref_store(data->repo),
> + "HEAD", 0, NULL, &flags);
> + if (!refname ||
> + !(flags & REF_ISSYMREF) ||
> + !skip_prefix(refname, "refs/heads/", &shortname))
> return 0;
>
> strbuf_add(&pattern, cond, cond_len);
> diff --git a/t/t1305-config-include.sh b/t/t1305-config-include.sh
> index ad08db72308..517d6c86937 100755
> --- a/t/t1305-config-include.sh
> +++ b/t/t1305-config-include.sh
> @@ -389,7 +389,7 @@ test_expect_success 'onbranch without repository' '
> test_must_fail nongit git config get foo.bar
> '
>
> -test_expect_failure 'onbranch without repository but explicit nonexistent Git directory' '
> +test_expect_success 'onbranch without repository but explicit nonexistent Git directory' '
> test_when_finished "rm -f .gitconfig config.inc" &&
> git config set -f .gitconfig "includeIf.onbranch:**.path" config.inc &&
> git config set -f config.inc foo.bar baz &&
> --
> 2.46.0.551.gc5ee8f2d1c.dirty
next prev parent reply other threads:[~2024-09-24 14:10 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-20 17:37 BUG: refs.c:1933: reference backend is unknown Ronan Pigott
2024-09-21 16:09 ` shejialuo
2024-09-21 18:06 ` Ronan Pigott
2024-09-21 21:22 ` René Scharfe
2024-09-22 6:51 ` shejialuo
2024-09-22 14:41 ` shejialuo
2024-09-24 10:05 ` [PATCH 0/2] config: fix evaluating "onbranch" with nonexistent git dir Patrick Steinhardt
2024-09-24 10:05 ` [PATCH 1/2] t1305: exercise edge cases of "onbranch" includes Patrick Steinhardt
2024-09-24 16:05 ` Junio C Hamano
2024-09-24 10:05 ` [PATCH 2/2] config: fix evaluating "onbranch" with nonexistent git dir Patrick Steinhardt
2024-09-24 14:11 ` shejialuo [this message]
2024-09-24 16:17 ` Junio C Hamano
2024-09-24 14:20 ` [PATCH 0/2] " shejialuo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZvLInrIQRj8xpSgF@ArchLinux \
--to=shejialuo@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=l.s.r@web.de \
--cc=ps@pks.im \
--cc=ronan@rjp.ie \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.