All of lore.kernel.org
 help / color / mirror / Atom feed
From: Victoria Dye <vdye@github.com>
To: Glen Choo <chooglen@google.com>, git@vger.kernel.org
Cc: Victoria Dye via GitGitGadget <gitgitgadget@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	derrickstolee@github.com, gitster@pobox.com
Subject: Re: [PATCH] setup: copy repository_format using helper
Date: Mon, 12 Jun 2023 17:03:17 -0700	[thread overview]
Message-ID: <9fb6d7b1-00b6-93ee-efec-9dd0ab91a66d@github.com> (raw)
In-Reply-To: <20230612230453.70864-1-chooglen@google.com>

Glen Choo wrote:
> In several parts of the setup machinery, we set up a repository_format
> and then use it to set up the_repository in nearly the exact same way,
> suggesting that we might be able to use a helper function to standardize
> the behavior and make future modifications easier. Create this helper
> function, setup_repository_from_format(), thus standardizing this
> behavior.
> 
> To determine what the 'standardized behavior' should be, we can compare
> the candidate call sites in repo_init(), setup_git_directory_gently(),
> check_repository_format() and discover_git_directory(),
> 
> - All of them copy .worktree_config.
> 
> - All of them 'copy' .partial_clone. Most perform a shallow copy of the
>   pointer, then set the .partial_clone = NULL so that it doesn't get
>   cleared by clear_repository_format(). However,
>   check_repository_format() copies the string deeply because the
>   repository_format is sometimes read back (it is an "out" parameter).
>   To accomodate both shallow copying and deep copying, toggle this
>   behavior using the "modify_fmt_ok" parameter.

Do you have a specific example of this happening? I see two uses of
'check_repository_format()' in the codebase:

1. in 'enter_repo()' ('path.c')
2. in 'init_db()' ('init-db.c')

The first one calls 'check_repository_format()' with 'NULL', which causes
the function to create a temporary 'struct repository_format' that is then
discarded at the end of the function - no need to worry about the value
being cleared there.

The second one does call 'check_repository_format()' with a 'struct
repository_format' instance, but the 'partial_clone' field field is not
accessed again after that. The only subsequent usages of the 'repo_fmt'
variable in 'init_db()' are:

- in 'validate_hash_algorithm()', where only the 'version' and 'hash_algo'
  fields are accessed.
- in 'create_default_files()', where only 'hash_algo' is accessed.

So, shouldn't it be safe to shallow-copy-and-NULL? But as I noted earlier
[1], if you do that it'll make the name 'check_repository_format()' a bit
misleading (since it's actually modifying its arg in place). So, if you
update to always shallow copy, 'check_repository_format()' should be renamed
to reflect its side effects.

[1] https://lore.kernel.org/git/49509708-c0a1-2439-a551-cab05d944b66@github.com/

> 
> - Most of them set up repository.hash_algo, except
>   discover_git_directory(). Our helper function unconditionally sets up
>   .hash_algo because it turns out that discover_git_directory() probably
>   doesn't need to set up "struct repository" at all!

If that's the case, shouldn't the 'repository_format' assignments in
'discover_git_directory()' be removed altogether? 

>   discover_git_directory() isn't actually used in the setup process - its
>   only caller in the Git binary is read_early_config(). As explained by
>   16ac8b8db6 (setup: introduce the discover_git_directory() function,
>   2017-03-13), it is supposed to be an entrypoint into setup.c machinery
>   that allows the Git directory to be discovered without side effects,
>   in other words, we shouldn't have introduced side effects in
>   ebaf3bcf1ae (repository: move global r_f_p_c to repo struct,
>   2021-06-17). Fortunately, we didn't start to rely on this unintended
>   behavior between then and now, so we can just drop it.
> 
> Signed-off-by: Glen Choo <chooglen@google.com>
> ---
> Here's the helper function I had in mind. I was initially mistaken and
> it turns out that we need to support deep copying, but fortunately,
> t0001 is extremely thorough and will catch virtually any mistake in the
> setup process. CI seems to pass, though it appears to be a little flaky
> today and sometimes cancels jobs
> (https://github.com/chooglen/git/actions/runs/5249029150).
> 
> If you're comfortable with it, I would prefer for you to squash this
> into your patches so that we don't just end up changing the same few
> lines. If not, I'll Reviewed-by your patches (if I don't find any other
> concerns on a re-read) and send this as a 1-patch on top.

Reading through the commit message & patch, I'm still not convinced this
refactor is a good idea - to me, it doesn't leave the code in a clearly
better state. If you feel strongly that it does, though, I'm happy to leave
it to others to review/decide but I would prefer that you keep it a separate
patch submission on top.

Thanks!

> diff --git a/repository.c b/repository.c
> index 104960f8f5..50f0b26a6c 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -181,12 +181,7 @@ int repo_init(struct repository *repo,
>  	if (read_and_verify_repository_format(&format, repo->commondir))
>  		goto error;
>  
> -	repo_set_hash_algo(repo, format.hash_algo);
> -	repo->repository_format_worktree_config = format.worktree_config;
> -
> -	/* take ownership of format.partial_clone */
> -	repo->repository_format_partial_clone = format.partial_clone;
> -	format.partial_clone = NULL;
> +	setup_repository_from_format(repo, &format, 1);
>  
>  	if (worktree)
>  		repo_set_worktree(repo, worktree);
> diff --git a/setup.c b/setup.c
> index d866395435..33ce58676f 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1561,13 +1561,8 @@ const char *setup_git_directory_gently(int *nongit_ok)
>  			setup_git_env(gitdir);
>  		}
>  		if (startup_info->have_repository) {
> -			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
> -			the_repository->repository_format_worktree_config =
> -				repo_fmt.worktree_config;
> -			/* take ownership of repo_fmt.partial_clone */
> -			the_repository->repository_format_partial_clone =
> -				repo_fmt.partial_clone;
> -			repo_fmt.partial_clone = NULL;
> +			setup_repository_from_format(the_repository,
> +						     &repo_fmt, 1);
>  		}
>  	}
>  	/*
> @@ -1654,14 +1649,26 @@ void check_repository_format(struct repository_format *fmt)
>  		fmt = &repo_fmt;
>  	check_repository_format_gently(get_git_dir(), fmt, NULL);
>  	startup_info->have_repository = 1;
> -	repo_set_hash_algo(the_repository, fmt->hash_algo);
> -	the_repository->repository_format_worktree_config =
> -		fmt->worktree_config;
> -	the_repository->repository_format_partial_clone =
> -		xstrdup_or_null(fmt->partial_clone);
> +	setup_repository_from_format(the_repository, fmt, 0);
>  	clear_repository_format(&repo_fmt);
>  }
>  

I think you may be missing changes to 'discover_git_directory()'? Like I
mentioned above, though, if you don't think 'discover_git_directory()' needs
to set up 'the_repository', then those assignments should just be removed
(not replaced with 'setup_repository_from_format()').


  reply	other threads:[~2023-06-13  0:03 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-23 23:17 [PATCH 0/2] Fix behavior of worktree config in submodules Victoria Dye via GitGitGadget
2023-05-23 23:17 ` [PATCH 1/2] config: use gitdir to get worktree config Victoria Dye via GitGitGadget
2023-05-25  1:05   ` Glen Choo
2023-05-25 20:05     ` Derrick Stolee
2023-05-23 23:17 ` [PATCH 2/2] repository: move 'repository_format_worktree_config' to repo scope Victoria Dye via GitGitGadget
2023-05-25  1:29   ` Glen Choo
2023-05-25 16:09     ` Glen Choo
2023-05-25 20:02       ` Victoria Dye
2023-05-25 20:13   ` Derrick Stolee
2023-05-24 10:25 ` [PATCH 0/2] Fix behavior of worktree config in submodules Junio C Hamano
2023-05-25 19:56 ` Glen Choo
2023-05-26  1:32 ` [PATCH v2 0/3] " Victoria Dye via GitGitGadget
2023-05-26  1:32   ` [PATCH v2 1/3] config: use gitdir to get worktree config Victoria Dye via GitGitGadget
2023-05-26  1:32   ` [PATCH v2 2/3] config: pass 'repo' directly to 'config_with_options()' Victoria Dye via GitGitGadget
2023-05-26  1:33   ` [PATCH v2 3/3] repository: move 'repository_format_worktree_config' to repo scope Victoria Dye via GitGitGadget
2023-05-31 22:17     ` Glen Choo
2023-06-01  4:43       ` Junio C Hamano
2023-06-12 21:37         ` Glen Choo
2023-06-07 22:29       ` Victoria Dye
2023-06-12 18:10         ` Glen Choo
2023-06-12 19:45           ` Victoria Dye
2023-06-12 20:23             ` Glen Choo
2023-06-12 23:04               ` [PATCH] setup: copy repository_format using helper Glen Choo
2023-06-13  0:03                 ` Victoria Dye [this message]
2023-06-13 18:25                   ` Glen Choo
2023-06-13 19:45                     ` Junio C Hamano
2023-05-26 15:48   ` [PATCH v2 0/3] Fix behavior of worktree config in submodules Derrick Stolee
2023-06-13 22:09   ` Glen Choo
2023-06-13 22:17     ` Victoria Dye

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9fb6d7b1-00b6-93ee-efec-9dd0ab91a66d@github.com \
    --to=vdye@github.com \
    --cc=chooglen@google.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.