All of lore.kernel.org
 help / color / mirror / Atom feed
From: Victoria Dye <vdye@github.com>
To: Josh Steadmon <steadmon@google.com>, git@vger.kernel.org
Cc: lessleydennington@gmail.com, gitster@pobox.com
Subject: Re: [RFC PATCH] repo-settings: set defaults even when not in a repo
Date: Wed, 23 Mar 2022 13:11:10 -0700	[thread overview]
Message-ID: <fcfdcbb9-761a-0d34-7d36-61e0ef279922@github.com> (raw)
In-Reply-To: <1b27e0b115f858a422e0a2891688227be8f3db01.1648055915.git.steadmon@google.com>

Josh Steadmon wrote:
> prepare_repo_settings() initializes a `struct repository` with various
> default config options and settings read from a repository-local config
> file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
> in git repos), prepare_repo_settings was changed to issue a BUG() if it
> is called by a process whose CWD is not a Git repository. This approach
> was suggested in [1].
> 
> This breaks fuzz-commit-graph, which attempts to parse arbitrary
> fuzzing-engine-provided bytes as a commit graph file.
> commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> since we run the fuzz tests without a valid repository, we are hitting
> the BUG() from 44c7e62 for every test case.
> 
> Fix this by refactoring prepare_repo_settings() such that it sets
> default options unconditionally; if its process is in a Git repository,
> it will also load settings from the local config. This eliminates the
> need for a BUG() when not in a repository.
> 

I think the decision of whether to go with this approach or the alternative
listed below depends on the validity of a 'repository' without a gitdir. 

As far as I can tell, there is an implicit conflict between the changes in:

1. b66d84756f (commit-graph: respect 'commitGraph.readChangedPaths',
   2020-09-09)
2. 44c7e62e51 (repo-settings: prepare_repo_settings only in git repos,
   2021-12-06) (as you pointed out in your message)

The former says that commit-graph should use a repository setting (implying
it needs a valid repository), and the latter says that you need a valid
gitdir to get repository settings.

So to me, how to proceed depends on whether a repository can be "valid"
without a gitdir or not:

* If it is valid (or not-completely-broken/semi-valid - i.e., not a BUG()),
  then your approach is probably fine - the 'BUG()' should be removed from
  'prepare_repo_settings()'. But, to make sure we don't run into this
  question again, 'prepare_repo_settings()' should be audited for any
  settings/setup that implicitly require a gitdir and updated accordingly,
  along with a comment somewhere saying that the 'gitdir' isn't required.
* If it is not valid, then this use in 'fuzz-commit-graph' tests is fragile
  (even if it's not a problem right now) - 'prepare_repo_settings()' could
  start needing the gitdir, or "the_hash_algo" could become invalid, or some
  other problem could arise from the invalid repo and the tests would break.

I don't have a good answer to this question, but someone with more
experience in this area might be able to help.

> Concerns:
> 
> Are any callers strictly dependent on having a BUG() here? I suspect
> that the worst that would happen is that rather than this BUG(), the
> caller would later hit its own BUG() or die(), so I do not think this is
> a blocker. Additionally, every builtin that directly calls
> prepare_repo_settings is either marked as RUN_SETUP, which means we
> would die() prior to calling it anyway, or checks on its own before
> calling it (builtin/diff.c). There are several callers in library code,
> though, and I have not tracked down how all of those are used.
> 
> Alternatives considered:
> 
> Setting up a valid repository for fuzz testing would avoid triggering
> this bug, but would unacceptably slow down the test cases.
> 
> Refactoring parse_commit_graph() in such a way that the fuzz test has an
> alternate entry point that avoids calling prepare_repo_settings() might
> be possible, but would be a much larger change than this one. It would
> also run the risk that the changes would be so extensive that the fuzzer
> would be merely testing its own custom commit-graph implementation,
> rather than the one that's actually used in the real world.
> 

If the 'repository' really is *completely* invalid without a gitdir, I don't
think the refactor of 'commit-graph.c' would necessarily be too bad -
certainly not so extensive that it's not testing "real world" behavior:

* Make 'parse_commit_graph()' only call 'prepare_repo_settings()' if the
  repo has a valid gitdir (guarded like 'git diff' is)
* Guard all use use of 'r' in 'parse_commit_graph()' based on the existence
  of 'gitdir', with fallbacks on e.g. global config settings. From what I
  can see, the only usage is:
    * The check for 'r->settings.commit_graph_read_changed_paths'
    * The call to 'get_configured_generation_version()'
    * The use of 'the_hash_algo->raw_sz'

> [1]: https://lore.kernel.org/git/xmqqcznh8913.fsf@gitster.g/
> 
> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---
>  repo-settings.c | 111 ++++++++++++++++++++++++------------------------
>  1 file changed, 55 insertions(+), 56 deletions(-)
> 
> diff --git a/repo-settings.c b/repo-settings.c
> index b4fbd16cdc..d154b37007 100644
> --- a/repo-settings.c
> +++ b/repo-settings.c
> @@ -17,9 +17,6 @@ void prepare_repo_settings(struct repository *r)
>  	char *strval;
>  	int manyfiles;
>  
> -	if (!r->gitdir)
> -		BUG("Cannot add settings for uninitialized repository");
> -
>  	if (r->settings.initialized++)
>  		return;
>  
> @@ -28,27 +25,63 @@ void prepare_repo_settings(struct repository *r)
>  	r->settings.core_untracked_cache = UNTRACKED_CACHE_KEEP;
>  	r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
>  
> -	/* Booleans config or default, cascades to other settings */
> -	repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0);
> -	repo_cfg_bool(r, "feature.experimental", &experimental, 0);
> +	if (r->gitdir) {
> +		/* Booleans config or default, cascades to other settings */
> +		repo_cfg_bool(r, "feature.manyfiles", &manyfiles, 0);
> +		repo_cfg_bool(r, "feature.experimental", &experimental, 0);
>  
> -	/* Defaults modified by feature.* */
> -	if (experimental) {
> -		r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
> -	}
> -	if (manyfiles) {
> -		r->settings.index_version = 4;
> -		r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
> -	}
> +		/* Defaults modified by feature.* */
> +		if (experimental) {
> +			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
> +		}
> +		if (manyfiles) {
> +			r->settings.index_version = 4;
> +			r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
> +		}
> +
> +		/* Boolean config or default, does not cascade (simple)  */
> +		repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
> +		repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
> +		repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
> +		repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
> +		repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
> +		repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
> +		repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
>  
> -	/* Boolean config or default, does not cascade (simple)  */
> -	repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
> -	repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
> -	repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
> -	repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
> -	repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
> -	repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
> -	repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
> +		/*
> +		 * Non-boolean config
> +		 */
> +		if (!repo_config_get_int(r, "index.version", &value))
> +			r->settings.index_version = value;
> +
> +		if (!repo_config_get_string(r, "core.untrackedcache", &strval)) {
> +			int v = git_parse_maybe_bool(strval);
> +
> +			/*
> +			 * If it's set to "keep", or some other non-boolean
> +			 * value then "v < 0". Then we do nothing and keep it
> +			 * at the default of UNTRACKED_CACHE_KEEP.
> +			 */
> +			if (v >= 0)
> +				r->settings.core_untracked_cache = v ?
> +					UNTRACKED_CACHE_WRITE : UNTRACKED_CACHE_REMOVE;
> +			free(strval);
> +		}
> +
> +		if (!repo_config_get_string(r, "fetch.negotiationalgorithm", &strval)) {
> +			int fetch_default = r->settings.fetch_negotiation_algorithm;
> +			if (!strcasecmp(strval, "skipping"))
> +				r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
> +			else if (!strcasecmp(strval, "noop"))
> +				r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_NOOP;
> +			else if (!strcasecmp(strval, "consecutive"))
> +				r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
> +			else if (!strcasecmp(strval, "default"))
> +				r->settings.fetch_negotiation_algorithm = fetch_default;
> +			else
> +				die("unknown fetch negotiation algorithm '%s'", strval);
> +		}
> +	}
>  
>  	/*
>  	 * The GIT_TEST_MULTI_PACK_INDEX variable is special in that
> @@ -60,40 +93,6 @@ void prepare_repo_settings(struct repository *r)
>  	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
>  		r->settings.core_multi_pack_index = 1;
>  
> -	/*
> -	 * Non-boolean config
> -	 */
> -	if (!repo_config_get_int(r, "index.version", &value))
> -		r->settings.index_version = value;
> -
> -	if (!repo_config_get_string(r, "core.untrackedcache", &strval)) {
> -		int v = git_parse_maybe_bool(strval);
> -
> -		/*
> -		 * If it's set to "keep", or some other non-boolean
> -		 * value then "v < 0". Then we do nothing and keep it
> -		 * at the default of UNTRACKED_CACHE_KEEP.
> -		 */
> -		if (v >= 0)
> -			r->settings.core_untracked_cache = v ?
> -				UNTRACKED_CACHE_WRITE : UNTRACKED_CACHE_REMOVE;
> -		free(strval);
> -	}
> -
> -	if (!repo_config_get_string(r, "fetch.negotiationalgorithm", &strval)) {
> -		int fetch_default = r->settings.fetch_negotiation_algorithm;
> -		if (!strcasecmp(strval, "skipping"))
> -			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
> -		else if (!strcasecmp(strval, "noop"))
> -			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_NOOP;
> -		else if (!strcasecmp(strval, "consecutive"))
> -			r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_CONSECUTIVE;
> -		else if (!strcasecmp(strval, "default"))
> -			r->settings.fetch_negotiation_algorithm = fetch_default;
> -		else
> -			die("unknown fetch negotiation algorithm '%s'", strval);
> -	}
> -
>  	/*
>  	 * This setting guards all index reads to require a full index
>  	 * over a sparse index. After suitable guards are placed in the
> 
> base-commit: 715d08a9e51251ad8290b181b6ac3b9e1f9719d7


  parent reply	other threads:[~2022-03-23 20:11 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-23 18:03 [RFC PATCH] repo-settings: set defaults even when not in a repo Josh Steadmon
2022-03-23 19:22 ` Derrick Stolee
2022-03-23 19:52   ` Taylor Blau
2022-03-28 19:15     ` Josh Steadmon
2022-03-29  1:21       ` Taylor Blau
2022-03-28 19:53     ` Josh Steadmon
2022-03-29  1:22       ` Taylor Blau
2022-03-29  9:03     ` Ævar Arnfjörð Bjarmason
2022-03-30  2:26       ` Taylor Blau
2022-04-09  6:33         ` Josh Steadmon
2022-03-29  9:04     ` Ævar Arnfjörð Bjarmason
2022-03-30  2:34       ` Taylor Blau
2022-03-30 17:38         ` Ævar Arnfjörð Bjarmason
2022-03-30 20:14           ` Junio C Hamano
2022-04-09  6:52     ` [RFC PATCH v2] commit-graph: refactor to avoid prepare_repo_settings Josh Steadmon
2022-06-07 20:02       ` Jonathan Tan
2022-06-14 22:38         ` Josh Steadmon
2022-06-14 22:37     ` [PATCH v3] " Josh Steadmon
2022-06-14 23:32       ` Taylor Blau
2022-06-23 21:59       ` Junio C Hamano
2022-07-14 21:44         ` Josh Steadmon
2022-07-14 21:43     ` [PATCH v4] commit-graph: pass repo_settings instead of repository Josh Steadmon
2022-07-14 22:48       ` Junio C Hamano
2022-03-23 20:11 ` Victoria Dye [this message]
2022-03-23 20:54   ` [RFC PATCH] repo-settings: set defaults even when not in a repo Junio C Hamano
2022-03-23 21:19     ` Victoria Dye
2022-03-23 20:51 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fcfdcbb9-761a-0d34-7d36-61e0ef279922@github.com \
    --to=vdye@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=lessleydennington@gmail.com \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.