git.vger.kernel.org archive mirror
From: "Eric W. Biederman" <ebiederm@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: <git@vger.kernel.org>, "brian m. carlson" <sandals@crustytoothpaste.net>
Subject: Re: [PATCH] setup: Only allow extenions.objectFormat to be specified once
Date: Wed, 27 Sep 2023 08:11:05 -0500
Message-ID: <87r0mjn4ly.fsf@gmail.froward.int.ebiederm.org>
In-Reply-To: <xmqqr0mkmx9b.fsf@gitster.g> (Junio C. Hamano's message of "Tue, 26 Sep 2023 14:37:36 -0700")

Junio C Hamano <gitster@pobox.com> writes:

> "Eric W. Biederman" <ebiederm@gmail.com> writes:
>
>> Today there is no sanity checking of what happens when
>> extensions.objectFormat is specified multiple times.  Catch confused git
>> configurations by only allowing this option to be specified once.
>
> Hmph.  I am not sure if this is worth doing, and especially only for
> "objectformat".  Do we intend to apply different rules other than
> "you can give it only once" to other extensions, and if so where
> will these rules be catalogued?  I do not see particular harm to let
> them follow the usual "last one wins".
>
> If the patch were about trying to make sure that extensions, which
> are inherently per-repository, appear only in $GIT_DIR/config and
> complain if the code gets confused and tried to read them from the
> system or global configuration files, I would understand, and
> strongly support such an effort, though.

Unless I misread something, the code only checks for [extensions]
in $GIT_DIR/config.

> The real sanity we want to enforce is that what is reported by
> running "git config extensions.objectformat" must match the object
> format that is used in refs and object database.

Agreed.  Allowing git config extensions.objectformat to change the
existing value is allowing the repository to be corrupted.

> Manually futzing
> the configuration file and adding an entry with a contradictory
> value certainly is one way to break that sanity, and this patch may
> catch such a breakage, but once we start worrying about manually
> futzing the configuration file, the check added here would easily
> miss if the futzing is done by replacing instead of adding, so I am
> not sure if this extra code is worth its bits.
>
> But perhaps I am missing something and not seeing why it is worth
> insisting on "last one is the first one" for this particular one.

I somewhat have blinders on.  There are 3 configuration options I am
concerned with:

extensions.objectFormat
extensions.compatObjectFormat
core.historicObjectFormat (or whatever name we settle on).

One key concern I heard expressed in earlier reviews is that however we
handle these options, we should handle them in a way that leaves
ourselves room to rise to challenges in the future.

Whatever we do with parsing we have the following logical
constraints:

For extensions.objectFormat: There can only be a single storage hash.

For extensions.compatObjectFormat: There can be no compatibility hash,
there can be a single compatibility hash, and depending on how things go
between now and the next hash function transition we might want multiple
compatibility hashes.

For core.historicObjectFormat: There can be no historic hash function, there
can be a single historic hash function, there can be multiple historic
hash functions.
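
The three constraints above can be written down as a small table of
upper bounds. This is only a sketch under stated assumptions, not code
from git or from the patch: `option_limits` and `count_allowed` are
hypothetical names, core.historicobjectformat is the tentative option
name from this message, and the limit of 1 for compatObjectFormat
reflects what is needed today rather than a final decision.

```c
#include <string.h>

/*
 * Hypothetical table, not git's code: how many values each option may
 * carry. Keys are lowercase, as git normalizes them before parsing.
 */
struct option_limit {
	const char *key;
	int max_count;	/* -1 means no upper bound */
};

static const struct option_limit option_limits[] = {
	{ "extensions.objectformat", 1 },	/* single storage hash */
	{ "extensions.compatobjectformat", 1 },	/* may grow later */
	{ "core.historicobjectformat", -1 },	/* tentative name */
};

/* Returns 1 if `count` values are allowed, 0 if not, -1 for an unknown key. */
static int count_allowed(const char *key, int count)
{
	size_t i;

	for (i = 0; i < sizeof(option_limits) / sizeof(option_limits[0]); i++) {
		const struct option_limit *l = &option_limits[i];

		if (strcmp(key, l->key))
			continue;
		if (count < 0)
			return 0;
		return l->max_count < 0 || count <= l->max_count;
	}
	return -1;
}
```

Raising the compatObjectFormat limit later would then be a one-line
change in the table, provided no existing configuration already relies
on repeated values.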


For the compatibility hash I think it is unlikely we will want to
support more than one compatibility hash in practice but I can imagine
a scenario where we just get into the transition from SHA-1 to SHA-256
and a serious break is discovered that requires switching to FutureHash
ASAP.

For historic object formats, which is what SHA-1 will become after the
transition, there are references embedded in commit comments, email
messages, and bug trackers: all kinds of places that we cannot update.
So there is fundamentally a need to be able to find which current
objects correspond to the historic names.  For a project, each hash
function transition will create more such objects.


When I looked I saw two ways within current git to specify a list of
values for a single configuration option.
- Give that option multiple times.
- Parse the option value in such a way as to generate a list.

I think that simply specifying compatObjectFormat multiple times is the
most sensible way to specify multiple compatibility object formats,
especially as all that is needed today is to allow only a single value.
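
To make the two styles concrete, here is a minimal standalone sketch
that builds the same list either way. It does not use git's config API;
`format_list`, `add_repeated_value`, and `add_comma_list` are
hypothetical names, the fixed array sizes are arbitrary, and whitespace
around commas is not trimmed.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FORMATS 8
#define MAX_NAME 32

struct format_list {
	char names[MAX_FORMATS][MAX_NAME];
	size_t nr;
};

/* Append one name; returns 0, or -1 if it is empty, too long, or the list is full. */
static int format_list_add(struct format_list *list, const char *name, size_t len)
{
	if (!len || len >= MAX_NAME || list->nr >= MAX_FORMATS)
		return -1;
	memcpy(list->names[list->nr], name, len);
	list->names[list->nr][len] = '\0';
	list->nr++;
	return 0;
}

/* Style 1: the option is given several times; call once per occurrence. */
static int add_repeated_value(struct format_list *list, const char *value)
{
	return format_list_add(list, value, strlen(value));
}

/* Style 2: the option is given once with a comma-separated value. */
static int add_comma_list(struct format_list *list, const char *value)
{
	const char *p = value;

	for (;;) {
		size_t len = strcspn(p, ",");

		if (format_list_add(list, p, len) < 0)
			return -1;
		if (p[len] == '\0')
			return 0;
		p += len + 1;	/* skip the comma */
	}
}
```

With the first style, a single-value rule is just a check on `nr`
before appending. The second style needs its own rules for empty
elements and whitespace.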

After I had implemented the allow-only-once logic for compatObjectFormat
I saw that objectFormat had nothing similar, and knowing that it is a bug
for objectFormat to be specified multiple times, I wrote a patch to
enforce that it appears only once as well.

For objectFormat I don't care very much.  For compatObjectFormat I truly
care, and I care even more for the option that allows finding the
current object from a historic oid (even a truncated one).

For me the fundamental question is: if we allow multiple compatibility
hashes or historic hashes, how do we specify them?  Have the option
appear more than once?  A comma separated list?

Whatever we decide, I want to enforce that it doesn't appear in current
configurations, so that we can add support for multiples later.
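
As an illustration of rejecting a second occurrence today, here is a
minimal standalone sketch. It is not the code from the patch or from
the compatObjectFormat series: `compat_state` and `set_compat_once` are
hypothetical names, and the error text only imitates the message used
in the patch.

```c
#include <stdio.h>
#include <string.h>

struct compat_state {
	int seen;	/* non-zero once a value has been accepted */
	char name[32];	/* the accepted value */
};

/* Accept the first value; reject any later one. Returns 0 or -1. */
static int set_compat_once(struct compat_state *st, const char *value)
{
	if (!value || !*value || strlen(value) >= sizeof(st->name))
		return -1;
	if (st->seen) {
		fprintf(stderr, "error: '%s' already specified as '%s'\n",
			"extensions.compatobjectformat", st->name);
		return -1;
	}
	strcpy(st->name, value);	/* length was checked above */
	st->seen = 1;
	return 0;
}
```

Because a repeated option is an error now, a later version is free to
give repetition a meaning without changing how any valid existing
configuration is read.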

Eric

>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  setup.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/setup.c b/setup.c
>> index 18927a847b86..ef9f79b8885e 100644
>> --- a/setup.c
>> +++ b/setup.c
>> @@ -580,6 +580,7 @@ static enum extension_result handle_extension(const char *var,
>>  	if (!strcmp(ext, "noop-v1")) {
>>  		return EXTENSION_OK;
>>  	} else if (!strcmp(ext, "objectformat")) {
>> +		struct string_list_item *item;
>>  		int format;
>>  
>>  		if (!value)
>> @@ -588,6 +589,13 @@ static enum extension_result handle_extension(const char *var,
>>  		if (format == GIT_HASH_UNKNOWN)
>>  			return error(_("invalid value for '%s': '%s'"),
>>  				     "extensions.objectformat", value);
>> +		/* Only support objectFormat being specified once. */
>> +		for_each_string_list_item(item, &data->v1_only_extensions) {
>> +			if (!strcmp(item->string, "objectformat"))
>> +				return error(_("'%s' already specified as '%s'"),
>> +					"extensions.objectformat",
>> +					hash_algos[data->hash_algo].name);
>> +		}
>>  		data->hash_algo = format;
>>  		return EXTENSION_OK;
>>  	}
