Git development
* Re: [PATCH] mem-pool: fix big allocations
From: René Scharfe @ 2023-12-28 18:56 UTC (permalink / raw)
  To: phillip.wood, Git List; +Cc: Jameson Miller
In-Reply-To: <e1e43a6c-3e06-4453-88a3-f00476132bcd@gmail.com>

Am 28.12.23 um 17:48 schrieb phillip.wood123@gmail.com:
> On 28/12/2023 16:05, René Scharfe wrote:
>> Am 28.12.23 um 16:10 schrieb Phillip Wood:
>>> The diff at the end of
>>> this email shows a possible implementation of a check_ptr() macro for
>>> the unit test library. I'm wary of adding it though because I'm not sure
>>> printing the pointer values is actually very useful most of the
>>> time. I'm also concerned that the rules around pointer arithmetic and
>>> comparisons mean that many pointer tests such as
>>>
>>>      check_ptr(pool->mp_block->next_free, <=, pool->mp_block->end);
>>>
>>> will be undefined if they fail.
>>
>> True, the compiler could legally emit mush when it finds out that the
>> pointers are for different objects.  And the error being fixed produces
>> such unrelated pointer pairs -- oops.
>>
>> This check is not important here, we can just drop it.
>>
>> mem_pool_contains() has the same problem, by the way.
>>
>> Restricting ourselves to only equality comparisons for pointers prevents
>> some interesting sanity checks, though.  Casting to intptr_t or
>> uintptr_t would allow arbitrary comparisons without risk of undefined
>> behavior, though.  Perhaps that would make a check_ptr() macro viable
>> and useful.
>
> That certainly helps and the check_ptr() macro in my previous email
> casts the pointers to uintptr_t before comparing them. Maybe I'm
> worrying too much, but my concern is that in a failing comparison it
> is likely one of the pointers is invalid (for example it is the
> result of some undefined pointer arithmetic) and the program is
> undefined from the point the invalid pointer is created.

There are no restrictions on integer comparisons.  So comparing after
casting to uintptr_t should not invoke undefined behavior.  If undefined
behavior was involved in calculating the pointers in the first place
then the compiler might still legally go crazy, but not due to the
comparison.  Right?

Whether the result of a uintptr_t-cast comparison of pointers to
different objects is meaningful is a different question.  Hopefully
range checks are possible.
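
A minimal sketch of such a range check, assuming a flat address space
where the integer values of pointers follow their addresses.
ptr_in_range() is a hypothetical helper, not something that exists in
Git or in the unit test library:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: report whether p lies in [start, start + len)
 * by comparing integer representations.  The comparisons are ordinary
 * integer comparisons, but the pointer-to-integer mapping is
 * implementation-defined, so the answer is only meaningful on
 * platforms with a flat address space.
 */
static int ptr_in_range(const void *p, const void *start, size_t len)
{
	uintptr_t a = (uintptr_t)p;
	uintptr_t s = (uintptr_t)start;

	/* a - s cannot wrap here because a >= s was checked first. */
	return a >= s && a - s < len;
}
```

This does nothing about undefined behavior that happened while the
pointers were being computed; it only keeps the comparison itself out
of trouble.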

> The
> documentation for check_ptr() in my previous mail contains the
> following example
>
>     For example if `start` and `end` are pointers to the beginning and
>     end of an allocation and `offset` is an integer then
>
>         check_ptr(start + offset, <=, end)
>
>     is undefined when `offset` is larger than `end - start`. Rewriting
>     the comparison as
>
>         check_uint(offset, <=, end - start)
>
>     avoids undefined behavior when offset is too large, but is still
>     undefined if there is a bug that means `start` and `end` do not
>     point to the same allocation.

True, but in such a unit test we'd need additional checks verifying
that start and end belong to the same object.  Or perhaps use a
numerical size instead of an end pointer.
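
With a numerical size the whole check can stay in integer arithmetic.
A rough sketch, where range_fits() is a made-up name used only for
illustration:

```c
#include <stddef.h>

/*
 * Hypothetical helper: does the byte range [offset, offset + len) fit
 * into an allocation of `size` bytes?  Written so that no intermediate
 * value can overflow: offset + len is never computed.
 */
static int range_fits(size_t size, size_t offset, size_t len)
{
	return offset <= size && len <= size - offset;
}
```

No pointer is formed until the range is known to be valid, so a too
large offset is reported by the check instead of causing undefined
behavior.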

René


* [PATCH v2] mem-pool: fix big allocations
From: René Scharfe @ 2023-12-28 19:19 UTC (permalink / raw)
  To: Git List; +Cc: Jameson Miller, Phillip Wood, Elijah Newren, Junio C Hamano
In-Reply-To: <fa89d269-1a23-4ed6-bebc-30c0b629f444@web.de>

Memory pool allocations that require a new block and would fill at
least half of it are handled specially.  Before 158dfeff3d (mem-pool:
add life cycle management functions, 2018-07-02) they used to be
allocated outside of the pool.  This patch made mem_pool_alloc() create
a bespoke block instead, to allow releasing it when the pool gets
discarded.

Unfortunately mem_pool_alloc() returns a pointer to the start of such a
bespoke block, i.e. to the struct mp_block at its top.  When the caller
writes to it, the management information gets corrupted.  This affects
mem_pool_discard() and -- if there are no other blocks in the pool --
also mem_pool_alloc().

Return the payload pointer of bespoke blocks, just like for smaller
allocations, to protect the management struct.

Also update next_free to mark the block as full.  This is only strictly
necessary for the first allocated block, because subsequent ones are
inserted after the current block and never considered for further
allocations, but it's easier to just do it in all cases.

Add a basic unit test to demonstrate the issue by using
mem_pool_calloc() with a tiny block size, which forces the creation of a
bespoke block.

Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
Changes since v1:
- simply use check() instead of a custom check_ptr() macro
- drop unnecessary comparison of next_free and end pointers
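
For readers who don't have mem-pool.c at hand, here is a toy model of
the idea behind the fix.  It is only a sketch: struct toy_block and
the toy_*() functions are simplified stand-ins invented for this
illustration, not the real mem-pool code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for a pool block: header first, payload after. */
struct toy_block {
	struct toy_block *next_block;
	char *next_free;	/* first unused payload byte */
	char *end;		/* one past the last payload byte */
	char space[];		/* payload starts here */
};

/* Allocate a block with room for len payload bytes. */
static struct toy_block *toy_alloc_block(size_t len)
{
	struct toy_block *b = malloc(sizeof(*b) + len);

	if (!b)
		return NULL;
	b->next_block = NULL;
	b->next_free = b->space;
	b->end = b->space + len;
	return b;
}

/*
 * Hand out len payload bytes and advance next_free, so that a block
 * which has been used up is also marked as full.  Returning `b`
 * itself instead would let the caller overwrite the header.
 */
static void *toy_alloc(struct toy_block *b, size_t len)
{
	void *r;

	if ((size_t)(b->end - b->next_free) < len)
		return NULL;
	r = b->next_free;
	b->next_free += len;
	return r;
}
```

The returned pointer is the payload, never the header, and next_free
equals end once the bespoke block has been handed out.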

Interdiff against v1:
  diff --git a/t/unit-tests/t-mem-pool.c b/t/unit-tests/t-mem-pool.c
  index 2295779b0b..a0d57df761 100644
  --- a/t/unit-tests/t-mem-pool.c
  +++ b/t/unit-tests/t-mem-pool.c
  @@ -1,8 +1,6 @@
   #include "test-lib.h"
   #include "mem-pool.h"

  -#define check_ptr(a, op, b) check_int(((a) op (b)), ==, 1)
  -
   static void setup_static(void (*f)(struct mem_pool *), size_t block_alloc)
   {
   	struct mem_pool pool = { .block_alloc = block_alloc };
  @@ -16,11 +14,10 @@ static void t_calloc_100(struct mem_pool *pool)
   	char *buffer = mem_pool_calloc(pool, 1, size);
   	for (size_t i = 0; i < size; i++)
   		check_int(buffer[i], ==, 0);
  -	if (!check_ptr(pool->mp_block, !=, NULL))
  +	if (!check(pool->mp_block != NULL))
   		return;
  -	check_ptr(pool->mp_block->next_free, <=, pool->mp_block->end);
  -	check_ptr(pool->mp_block->next_free, !=, NULL);
  -	check_ptr(pool->mp_block->end, !=, NULL);
  +	check(pool->mp_block->next_free != NULL);
  +	check(pool->mp_block->end != NULL);
   }

   int cmd_main(int argc, const char **argv)

 Makefile                  |  1 +
 mem-pool.c                |  6 +++---
 t/unit-tests/t-mem-pool.c | 31 +++++++++++++++++++++++++++++++
 3 files changed, 35 insertions(+), 3 deletions(-)
 create mode 100644 t/unit-tests/t-mem-pool.c

diff --git a/Makefile b/Makefile
index 88ba7a3c51..15990ff312 100644
--- a/Makefile
+++ b/Makefile
@@ -1340,6 +1340,7 @@ THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%

 UNIT_TEST_PROGRAMS += t-basic
+UNIT_TEST_PROGRAMS += t-mem-pool
 UNIT_TEST_PROGRAMS += t-strbuf
 UNIT_TEST_PROGS = $(patsubst %,$(UNIT_TEST_BIN)/%$X,$(UNIT_TEST_PROGRAMS))
 UNIT_TEST_OBJS = $(patsubst %,$(UNIT_TEST_DIR)/%.o,$(UNIT_TEST_PROGRAMS))
diff --git a/mem-pool.c b/mem-pool.c
index c34846d176..e8d976c3ee 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -99,9 +99,9 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len)

 	if (!p) {
 		if (len >= (pool->block_alloc / 2))
-			return mem_pool_alloc_block(pool, len, pool->mp_block);
-
-		p = mem_pool_alloc_block(pool, pool->block_alloc, NULL);
+			p = mem_pool_alloc_block(pool, len, pool->mp_block);
+		else
+			p = mem_pool_alloc_block(pool, pool->block_alloc, NULL);
 	}

 	r = p->next_free;
diff --git a/t/unit-tests/t-mem-pool.c b/t/unit-tests/t-mem-pool.c
new file mode 100644
index 0000000000..a0d57df761
--- /dev/null
+++ b/t/unit-tests/t-mem-pool.c
@@ -0,0 +1,31 @@
+#include "test-lib.h"
+#include "mem-pool.h"
+
+static void setup_static(void (*f)(struct mem_pool *), size_t block_alloc)
+{
+	struct mem_pool pool = { .block_alloc = block_alloc };
+	f(&pool);
+	mem_pool_discard(&pool, 0);
+}
+
+static void t_calloc_100(struct mem_pool *pool)
+{
+	size_t size = 100;
+	char *buffer = mem_pool_calloc(pool, 1, size);
+	for (size_t i = 0; i < size; i++)
+		check_int(buffer[i], ==, 0);
+	if (!check(pool->mp_block != NULL))
+		return;
+	check(pool->mp_block->next_free != NULL);
+	check(pool->mp_block->end != NULL);
+}
+
+int cmd_main(int argc, const char **argv)
+{
+	TEST(setup_static(t_calloc_100, 1024 * 1024),
+	     "mem_pool_calloc returns 100 zeroed bytes with big block");
+	TEST(setup_static(t_calloc_100, 1),
+	     "mem_pool_calloc returns 100 zeroed bytes with tiny block");
+
+	return test_done();
+}
--
2.43.0


* Re: [PATCH 1/1] Replace SID with domain/username
From: Eric Sunshine @ 2023-12-28 19:27 UTC (permalink / raw)
  To: Sören Krecker; +Cc: git
In-Reply-To: <20231228132844.4240-2-soekkle@freenet.de>

On Thu, Dec 28, 2023 at 8:29 AM Sören Krecker <soekkle@freenet.de> wrote:
> From: soekkle <soekkle@freenet.de>
>
> Replace SID with domain/username in erromessage, if owner of repository
> and user are not equal on windows systems.
>
> Signed-off-by: Sören Krecker <soekkle@freenet.de>
> ---

I don't do Windows (anymore), thus I'm not qualified to comment on the
substance of this patch, so I'll just make some general, hopefully
helpful, observations.

Typo: "erromessage" should be "error message"

Your name in the "From:" header and Signed-off-by: should be the same.

Perhaps Windows folks can understand the purpose of this patch without
further explanation, but for other readers, it's not clear what
problem the patch is trying to solve. The commit message is a good
place to explain _why_ this change is desirable.

> diff --git a/compat/mingw.c b/compat/mingw.c
> @@ -2684,6 +2684,25 @@ static PSID get_current_user_sid(void)
> +BOOL user_sid_to_string(PSID sid, LPSTR* str)

In this codebase, '*' sticks to the variable name, not the type, so:

    BOOL user_sid_to_string(PSID sid, LPSTR *str)

> +{
> +       SID_NAME_USE peUse;
> +       DWORD lenName = { 0 }, lenDomain = { 0 };

Looking through compat/mingw.c, it appears that (as with the rest of
the project), variable names tend to use underscores rather than
camel-case, so for consistency these might be better expressed as
"pe_use" (whatever that means), "name_len", and "domain_len".

I was curious about the `{ 0 }` initializer. It seems we have a mix of
both `{0}` and `{ 0 }` in the codebase, so what you have here is
likely fine.

> +       LookupAccountSidA(NULL, sid, NULL, &lenName, NULL,
> +                                       &lenDomain, &peUse); // returns only FALSE, because the string pointers are NULL

As with the rest of the project, compat/mingw.c still shuns "//"
comments. Use /*...*/ comments instead.

> +       ALLOC_ARRAY((*str), (size_t)lenDomain + (size_t)lenName); // Alloc neded Space of the strings

Typo: "neded" -> "needed"

(and "Space" -> "space")

> +       BOOL retVal = LookupAccountSidA(NULL, sid, (*str) + lenDomain, &lenName,
> +                                      *str,
> +                                       &lenDomain, &peUse);
> +       *(*str + lenDomain) = '/';
> +       if (retVal == FALSE)
> +       {
> +               free(*str);
> +               *str = NULL;

The FREE_AND_NULL() macro from git-compat-util.h is a good companion
to the ALLOC_ARRAY() macro used above, so freeing and nullifying could
be done in one line:

    FREE_AND_NULL(*str);
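
As a rough, self-contained sketch of that pattern (TOY_FREE_AND_NULL
is a local stand-in so the example compiles outside of Git, and
toy_make_name() is a hypothetical function that only mimics the
out-parameter shape of user_sid_to_string()):

```c
#include <stdlib.h>
#include <string.h>

/* Local stand-in for illustration; Git's macro is in git-compat-util.h. */
#define TOY_FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/*
 * Hypothetical function: build "domain/user" in a freshly allocated
 * string.  On failure *str is freed and set to NULL, so the caller
 * never sees a dangling pointer.
 */
static int toy_make_name(const char *domain, const char *user, char **str)
{
	size_t domain_len = strlen(domain), user_len = strlen(user);

	/* room for domain, '/', user and the terminating NUL */
	*str = malloc(domain_len + 1 + user_len + 1);
	if (!*str)
		return 0;
	memcpy(*str, domain, domain_len);
	(*str)[domain_len] = '/';
	memcpy(*str + domain_len + 1, user, user_len + 1);

	if (!user_len) {	/* treat an empty user name as a failure */
		TOY_FREE_AND_NULL(*str);
		return 0;
	}
	return 1;
}
```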

> +       }
> +       return retVal;
> +}

Perhaps a variable name such as `ok` would convey more to the reader
than the generic `retVal`?


* Re: [PATCH] mem-pool: fix big allocations
From: phillip.wood123 @ 2023-12-28 19:34 UTC (permalink / raw)
  To: René Scharfe, phillip.wood, Git List; +Cc: Jameson Miller
In-Reply-To: <34f5913f-b187-43c3-99b7-3d57065dba12@web.de>

On 28/12/2023 18:56, René Scharfe wrote:
> Am 28.12.23 um 17:48 schrieb phillip.wood123@gmail.com:
>> On 28/12/2023 16:05, René Scharfe wrote:
>>> Am 28.12.23 um 16:10 schrieb Phillip Wood:
>>>> The diff at the end of
>>>> this email shows a possible implementation of a check_ptr() macro for
>>>> the unit test library. I'm wary of adding it though because I'm not sure
>>>> printing the pointer values is actually very useful most of the
>>>> time. I'm also concerned that the rules around pointer arithmetic and
>>>> comparisons mean that many pointer tests such as
>>>>
>>>>       check_ptr(pool->mp_block->next_free, <=, pool->mp_block->end);
>>>>
>>>> will be undefined if they fail.
>>>
>>> True, the compiler could legally emit mush when it finds out that the
>>> pointers are for different objects.  And the error being fixed produces
>>> such unrelated pointer pairs -- oops.
>>>
>>> This check is not important here, we can just drop it.
>>>
>>> mem_pool_contains() has the same problem, by the way.
>>>
>>> Restricting ourselves to only equality comparisons for pointers prevents
>>> some interesting sanity checks, though.  Casting to intptr_t or
>>> uintptr_t would allow arbitrary comparisons without risk of undefined
>>> behavior, though.  Perhaps that would make a check_ptr() macro viable
>>> and useful.
>>
>> That certainly helps and the check_ptr() macro in my previous email
>> casts the pointers to uintptr_t before comparing them. Maybe I'm
>> worrying too much, but my concern is that in a failing comparison it
>> is likely one of the pointers is invalid (for example it is the
>> result of some undefined pointer arithmetic) and the program is
>> undefined from the point the invalid pointer is created.
> 
> There are no restrictions on integer comparisons.  So comparing after
> casting to uintptr_t should not invoke undefined behavior.  If undefined
> behavior was involved in calculating the pointers in the first place
> then the compiler might still legally go crazy, but not due to the
> comparison.  Right?

Exactly, my worry is that if the comparison fails it is likely that
there will have been undefined behavior involved in calculating the
pointer before we get to the comparison, in which case casting to
uintptr_t in the comparison does not help.

> Whether the result of a uintptr_t-cast comparison of pointers to
> different objects is meaningful is a different question.  Hopefully
> range checks are possible.
> 
>> The
>> documentation for check_ptr() in my previous mail contains the
>> following example
>>
>>      For example if `start` and `end` are pointers to the beginning and
>>      end of an allocation and `offset` is an integer then
>>
>>          check_ptr(start + offset, <=, end)
>>
>>      is undefined when `offset` is larger than `end - start`. Rewriting
>>      the comparison as
>>
>>          check_uint(offset, <=, end - start)
>>
>>      avoids undefined behavior when offset is too large, but is still
>>      undefined if there is a bug that means `start` and `end` do not
>>      point to the same allocation.
> 
> True, but in such a unit test we'd need additional checks verifying
> that start and end belong to the same object.  Or perhaps use a
> numerical size instead of an end pointer.

Agreed, but I think the implication is that there will be cases we 
should be using check_uint() as in the second comparison above rather 
than check_ptr() as in the first comparison above. I'm not opposed to 
adding check_ptr() if we think it will be useful but I am worried it is 
easy to misuse it. If we do add check_ptr() we should have some 
guidelines about when it makes sense to use it.

Best Wishes

Phillip


* Re: [PATCH v2] mem-pool: fix big allocations
From: phillip.wood123 @ 2023-12-28 19:36 UTC (permalink / raw)
  To: René Scharfe, Git List
  Cc: Jameson Miller, Phillip Wood, Elijah Newren, Junio C Hamano
In-Reply-To: <1c39c0e7-05b2-4726-a90c-f78df4356a41@web.de>

Hi René

On 28/12/2023 19:19, René Scharfe wrote:
> Interdiff against v1:
>    diff --git a/t/unit-tests/t-mem-pool.c b/t/unit-tests/t-mem-pool.c
>    index 2295779b0b..a0d57df761 100644
>    --- a/t/unit-tests/t-mem-pool.c
>    +++ b/t/unit-tests/t-mem-pool.c
>    @@ -1,8 +1,6 @@
>     #include "test-lib.h"
>     #include "mem-pool.h"
> 
>    -#define check_ptr(a, op, b) check_int(((a) op (b)), ==, 1)
>    -
>     static void setup_static(void (*f)(struct mem_pool *), size_t block_alloc)
>     {
>     	struct mem_pool pool = { .block_alloc = block_alloc };
>    @@ -16,11 +14,10 @@ static void t_calloc_100(struct mem_pool *pool)
>     	char *buffer = mem_pool_calloc(pool, 1, size);
>     	for (size_t i = 0; i < size; i++)
>     		check_int(buffer[i], ==, 0);
>    -	if (!check_ptr(pool->mp_block, !=, NULL))
>    +	if (!check(pool->mp_block != NULL))
>     		return;
>    -	check_ptr(pool->mp_block->next_free, <=, pool->mp_block->end);
>    -	check_ptr(pool->mp_block->next_free, !=, NULL);
>    -	check_ptr(pool->mp_block->end, !=, NULL);
>    +	check(pool->mp_block->next_free != NULL);
>    +	check(pool->mp_block->end != NULL);
>     }

The changes to the unit tests look good to me (I haven't really looked 
at the actual bug fix in the mem_pool code).

Best Wishes

Phillip


* Re: [PATCH 0/6] worktree: initialize refdb via ref backends
From: Patrick Steinhardt @ 2023-12-28 19:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqedf6gpt8.fsf@gitster.g>

On Thu, Dec 28, 2023 at 10:11:31AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > when initializing worktrees we manually create the on-disk data
> > structures required for the ref backend in "worktree.c". This works just
> > fine right now where we only have a single user-exposed ref backend, but
> > it will become unwieldy once we have multiple ref backends. This patch
> > series thus refactors how we initialize worktrees so that we can use
> > `refs_init_db()` to initialize required files for us.
> >
> > This patch series conflicts with ps/refstorage-extension. The conflict
> > can be solved as shown below. I'm happy to defer this patch series
> > though until the topic has landed on `master` in case this causes
> > issues.
> 
> Resolution is not all that bad, but the change in function signature
> means comments/explanations near both the caller and the callee of
> the get_linked_worktree() function may need updating, I would think.
> For example, ...
> 
> > diff --git a/worktree.h b/worktree.h
> > index 8a75691eac..f14784a2ff 100644
> > --- a/worktree.h
> > +++ b/worktree.h
> > @@ -61,7 +61,8 @@ struct worktree *find_worktree(struct worktree **list,
> >   * Look up the worktree corresponding to `id`, or NULL of no such worktree
> >   * exists.
> >   */
> > -struct worktree *get_linked_worktree(const char *id);
> > +struct worktree *get_linked_worktree(const char *id,
> > +				     int skip_reading_head);
> 
> ... this now needs to help developers who may want to add new
> callers what to pass in "skip_reading_head" and why.
> 
> We may indeed want to build this on top of the refstorage-extension
> thing, as it seems to be relatively close to completion.

Fair enough. I'll wait for the refstorage extension topic to hit `next`
or `master` first so as to not build deep dependency chains when things
may still move around. I don't mind waiting another one or two weeks,
especially during holidays where things are moving slower anyway.

> Thanks (and a happy new year).

Thanks, the same to you, too.

Patrick


* Re: [PATCH 04/12] setup: start tracking ref storage format when
From: Patrick Steinhardt @ 2023-12-28 20:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqplyqgsem.fsf@gitster.g>

On Thu, Dec 28, 2023 at 09:15:29AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Makes me wonder whether we should then also add the following diff to
> > "setup: set repository's format on init" when both topics are being
> > merged together:
> >
> > diff --git a/setup.c b/setup.c
> > index 3d980814bc..3d35c78c68 100644
> > --- a/setup.c
> > +++ b/setup.c
> > @@ -2210,6 +2210,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
> >  	 * format we can update the repository's settings accordingly.
> >  	 */
> >  	repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
> > +	repo_set_compat_hash_algo(the_repository, repo_fmt.compat_hash_algo);
> >  	repo_set_ref_storage_format(the_repository, repo_fmt.ref_storage_format);
> >  
> >  	if (!(flags & INIT_DB_SKIP_REFDB))
> 
> Shouldn't that come from the series that wants .compat_hash_algo in
> the repo_fmt structure, whichever it is, not added by an evil merge?

Well, the above code is newly added by my series to ensure that
`init_db()` results in a properly initialized repo upon return. So the
compat hash algo series cannot yet call `repo_set_compat_hash_algo()`
because the code site doesn't exist, whereas my series cannot yet add
the call because there is no compat hash algo yet.

So depending on which series lands first we'll either have to adapt the
respective other series or do an evil merge.

Patrick


* Re: [PATCH v2 03/12] refs: refactor logic to look up storage backends
From: Patrick Steinhardt @ 2023-12-28 20:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Karthik Nayak
In-Reply-To: <xmqqjzoygrx8.fsf@gitster.g>

On Thu, Dec 28, 2023 at 09:25:55AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > In order to look up ref storage backends, we're currently using a linked
> > list of backends, where each backend is expected to set up its `next`
> > pointer to the next ref storage backend. This is kind of a weird setup
> > as backends need to be aware of other backends without much of a reason.
> >
> > Refactor the code so that the array of backends is centrally defined in
> > "refs.c", where each backend is now identified by an integer constant.
> > Expose functions to translate from those integer constants to the name
> > and vice versa, which will be required by subsequent patches.
> 
> A small question.  Does this have to be "int", or is "unsigned" (or
> even an enum, rewritten from the "REF_STORAGE_FORMAT_*" family of CPP
> macro constants) good enough?  I am only wondering what happens when
> you call find_ref_storage_backend() with a negative index.

No, it does not have to be an `int`, and handling a negative index would
be a bug. I tried to stick to what we have with `GIT_HASH_UNKNOWN`,
`GIT_HASH_SHA1` etc, which is exactly similar in spirit. Whether it's
the perfect way to handle this... probably not. Without the context I
would've used an `enum`, but instead I opted for consistency.

> For that matter, how REF_STORAGE_FORMAT_UNKNOWN (whose value is 0)
> is handled by the function also gets curious.  The caller may have
> to find that the backend hasn't been specified by receiving an
> element in the refs_backends[] array that corresponds to it, but the
> error behaviour of this function is also to return NULL, so it has
> to be prepared to handle both cases?

Yeah, we do not really discern those two cases for now and instead just
return `NULL` in both cases, i.e. for any unknown ref storage format. All callers know
to handle `NULL`, but the error handling will only report a generic
"unknown" backend error.

The easiest way to discern those cases would be to `BUG()` when being
passed an invalid ref storage format smaller than 0 or larger than the
number of known backends. Because ultimately it is just that, a bug that
shouldn't ever occur.

Not sure whether this is worth a reroll?

Patrick

> > +static const struct ref_storage_be *refs_backends[] = {
> > +	[REF_STORAGE_FORMAT_FILES] = &refs_be_files,
> > +};
> > ...
> > +static const struct ref_storage_be *find_ref_storage_backend(int ref_storage_format)
> >  {
> > +	if (ref_storage_format < ARRAY_SIZE(refs_backends))
> > +		return refs_backends[ref_storage_format];
> >  	return NULL;
> >  }


* Re: [PATCH v2 02/12] worktree: skip reading HEAD when repairing worktrees
From: Patrick Steinhardt @ 2023-12-28 20:18 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: git, Karthik Nayak, Junio C Hamano
In-Reply-To: <CAPig+cSKpzOCOzC_mtNoA4yYmHCtMxB-Ujsd7YYHK-SPJvgt8w@mail.gmail.com>

On Thu, Dec 28, 2023 at 01:13:04PM -0500, Eric Sunshine wrote:
> On Thu, Dec 28, 2023 at 1:08 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> > Having said all that, I'm not overly opposed to this patch, especially
> > since your main focus is on getting the reftable backend integrated,
> > and because the changes (and ugliness) introduced by this patch are
> > entirely self-contained and private to worktree.c, so are not a
> > show-stopper by any means. Rather, I wanted to get down to writing
> > what I think would be a better future approach if someone gets around
> > to tackling it. (There is no pressing need at the moment, and that
> > someone doesn't have to be you.)
> 
> I forgot to mention that, if you reroll for some reason, the
> get_worktrees()/get_worktrees_internal() dance might deserve an
> in-source NEEDSWORK comment explaining that get_worktrees_internal()
> exists to work around the shortcoming that a corruption-tolerant
> function for retrieving worktree metadata (for use by the "repair"
> function) does not yet exist.

Thanks for sharing your thoughts, will do.

Patrick


* Re: [PATCH v4] sideband.c: remove redundant 'NEEDSWORK' tag
From: Junio C Hamano @ 2023-12-28 20:33 UTC (permalink / raw)
  To: Chandra Pratap via GitGitGadget
  Cc: git, Torsten Bögershausen, Chandra Pratap, Chandra Pratap
In-Reply-To: <pull.1625.v4.git.1703750460527.gitgitgadget@gmail.com>

"Chandra Pratap via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Subject: Re: [PATCH v4] sideband.c: remove redundant 'NEEDSWORK' tag

The reason for removal is not that it was redundant and we said the
same thing elsewhere.  Rather, what it claimed to be necessary has
turned out to be unwanted.  So something like

    Subject: sideband.c: update stale NEEDSWORK comment

    If we really wanted to change the type of the parameter to this
    function to "size_t", we should also update its callers to hold
    the values they use to compute the parameter also in "size_t".

    But in this callchain, "int" is wide enough.  Avoid tempting
    future developers into wasting their time on using "size_t"
    around this function.

or along that line would be more appropriate, perhaps?

Thanks.

> From: Chandra Pratap <chandrapratap3519@gmail.com>
>
> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
> ---
>     sideband.c: replace int with size_t for clarity
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1625%2FChand-ra%2Fdusra-v4
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1625/Chand-ra/dusra-v4
> Pull-Request: https://github.com/gitgitgadget/git/pull/1625
>
> Range-diff vs v3:
>
>  1:  273415aa6a4 ! 1:  8c003256e5b sideband.c: remove redundant 'NEEDSWORK' tag
>      @@ sideband.c: void list_config_color_sideband_slots(struct string_list *list, cons
>         *
>       - * NEEDSWORK: use "size_t n" instead for clarity.
>       + * It is fine to use "int n" here instead of "size_t n" as all calls to this
>      -+ * function pass an 'int' parameter.
>      ++ * function pass an 'int' parameter. Additionally, the buffer involved in
>      ++ * storing these 'int' values takes input from a packet via the pkt-line
>      ++ * interface, which is capable of transferring only 64kB at a time.
>         */
>        static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n)
>        {
>
>
>  sideband.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/sideband.c b/sideband.c
> index 6cbfd391c47..266a67342be 100644
> --- a/sideband.c
> +++ b/sideband.c
> @@ -69,7 +69,10 @@ void list_config_color_sideband_slots(struct string_list *list, const char *pref
>   * of the line. This should be called for a single line only, which is
>   * passed as the first N characters of the SRC array.
>   *
> - * NEEDSWORK: use "size_t n" instead for clarity.
> + * It is fine to use "int n" here instead of "size_t n" as all calls to this
> + * function pass an 'int' parameter. Additionally, the buffer involved in
> + * storing these 'int' values takes input from a packet via the pkt-line
> + * interface, which is capable of transferring only 64kB at a time.
>   */
>  static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n)
>  {
>
> base-commit: 1a87c842ece327d03d08096395969aca5e0a6996


* Re: [PATCH v2 03/12] refs: refactor logic to look up storage backends
From: Junio C Hamano @ 2023-12-28 20:42 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak
In-Reply-To: <ZY3Wcua6dtzO2jG4@framework>

Patrick Steinhardt <ps@pks.im> writes:

> Yeah, we do not really discern those two cases for now and instead just
> return `NULL` in both cases, i.e. for any unknown ref storage format. All callers know
> to handle `NULL`, but the error handling will only report a generic
> "unknown" backend error.
>
> The easiest way to discern those cases would be to `BUG()` when being
> passed an invalid ref storage format smaller than 0 or larger than the
> number of known backends. Because ultimately it is just that, a bug that
> shouldn't ever occur.
>
> Not sure whether this is worth a reroll?

By using an unsigned type, you no longer have to worry about getting
handed a negative index, as the "must be smaller than ARRAY_SIZE()"
check will be sufficient to catch anybody who passes "-1" (cast to
unsigned by parameter passing).  So I would say that would be a good
enough reason to reroll, whether we differentiate 0 and an index
that is larger than refs_backends[] (or a negative one) with an
explicit BUG(), or just leave it to the caller by returning NULL.
As to the error handling, I suspect it is sufficient to return NULL
and let the caller handle it.
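
Something along these lines, as a standalone sketch only.  The toy_*
names and TOY_ARRAY_SIZE() are stand-ins made up for this illustration
so that it compiles outside of our tree; they are not the names used
in refs.c:

```c
#include <stddef.h>

#define TOY_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

struct toy_backend {
	const char *name;
};

static const struct toy_backend toy_files = { "files" };

/* Index 0 is deliberately left NULL, playing the role of "unknown". */
static const struct toy_backend *toy_backends[] = {
	[1] = &toy_files,
};

/*
 * With an unsigned parameter, a caller passing -1 hands us UINT_MAX,
 * which the single upper-bound check rejects.
 */
static const struct toy_backend *toy_find_backend(unsigned int format)
{
	if (format < TOY_ARRAY_SIZE(toy_backends))
		return toy_backends[format];
	return NULL;
}
```

Both the "unknown" slot and an out-of-range index yield NULL here, so
the caller still cannot tell them apart without an explicit BUG().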

Thanks.


>
> Patrick
>
>> > +static const struct ref_storage_be *refs_backends[] = {
>> > +	[REF_STORAGE_FORMAT_FILES] = &refs_be_files,
>> > +};
>> > ...
>> > +static const struct ref_storage_be *find_ref_storage_backend(int ref_storage_format)
>> >  {
>> > +	if (ref_storage_format < ARRAY_SIZE(refs_backends))
>> > +		return refs_backends[ref_storage_format];
>> >  	return NULL;
>> >  }


* Re: [PATCH v5 2/3] trailer: find the end of the log message
From: Linus Arver @ 2023-12-29  6:42 UTC (permalink / raw)
  To: Junio C Hamano, Linus Arver via GitGitGadget
  Cc: git, Glen Choo, Christian Couder, Phillip Wood, Jonathan Tan
In-Reply-To: <xmqqr0lpoue3.fsf@gitster.g>


TL;DR: I'm working on a new approach.

Junio C Hamano <gitster@pobox.com> writes:
> Other than that, I didn't find anything questionable in any of the
> patches in this round.  Looking good.

So actually, I'm now taking a much more aggressive approach to libifying
the trailer subsystem. Instead of incrementally simplifying/improving
things as in this series, I think I need to get to the root problem,
which is that the trailer.h API isn't rich enough to make it pleasant
for clients to use, including our own builtin/interpret-trailers.c
client. That is, the problem we have today is that the trailer subsystem
is not very ergonomic for internal use, much less external use (outside
of Git itself).

As an example, the current API exposes process_trailers() which does a
whole bunch of things that only builtin/interpret-trailers.c cares
about. Multiple other clients of trailer.h exist in our codebase (e.g.,
sequencer.c, pretty.c, ref-filter.c) but none of them use
process_trailers().

One really useful data structure is the trailer_iterator that was
introduced in f0939a0eb1 (trailer: add interface for iterating over
commit trailers, 2020-09-27). The only problem is that it is not generic
enough for interpret-trailers.c to use.

My new goal is to introduce a new API in trailer.h so that
interpret-trailers.c and everyone else can start using these new data
structures and associated functions (while preserving the
trailer_iterator interface). So the order of operations should be:

(1) enrich the trailer API (make trailer.h have simpler data structures
    and practical functions that clients can readily use), and
(2) make builtin/interpret-trailers.c, and other clients in the Git
    codebase use this new API.

This way when the unit test framework selection process is finalized we
can

(3) write unit tests for the functions in the (enriched) trailer API,

which is one of the major goals for my efforts around this area.

The work I've started locally for (1) does not depend on this series,
and I think it'll be cleaner (less churn) that way. So, feel free to
drop this series in favor of the forthcoming work described in this
message.

Thanks.


* Re: [PATCH] mem-pool: fix big allocations
From: René Scharfe @ 2023-12-29  6:53 UTC (permalink / raw)
  To: phillip.wood, Git List; +Cc: Jameson Miller
In-Reply-To: <48821d3f-2e30-4bce-b9e8-e4199c24e251@gmail.com>

Am 28.12.23 um 20:34 schrieb phillip.wood123@gmail.com:
>
> Exactly, my worry is that if the comparison fails it is likely that
> there will have been undefined behavior involved in calculating the
> pointer before we get to the comparison, in which case casting to
> uintptr_t in the comparison does not help.
If there's undefined behavior (UB) somewhere in the test or the tested
unit then the compiler could skip any checks and report success anyway.
Not adding UB in the test framework and tests is the least we can do.

Perhaps disabling link-time optimization would allow us to shield the
unit tests from UB in the tested code, in the sense that the compiler
would then not be able to skip checks.
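
For illustration, here is a minimal sketch of the integer-comparison idea
discussed earlier in this thread. ptr_le() and ptr_in_range() are
hypothetical names, not part of Git or of the unit test library, and the
sketch only makes the comparison itself well defined; it does nothing
about UB that happened while computing the pointers.

```c
#include <stdint.h>

/*
 * Compare two pointers as integers. Both operands are converted to
 * uintptr_t first, so the relational operator works on integers and
 * the comparison is defined even if a and b point into different
 * objects. The resulting order is implementation-defined.
 */
static int ptr_le(const void *a, const void *b)
{
	return (uintptr_t)a <= (uintptr_t)b;
}

/* Hypothetical helper: does p lie in the half-open range [start, end)? */
static int ptr_in_range(const void *p, const void *start, const void *end)
{
	return (uintptr_t)start <= (uintptr_t)p &&
	       (uintptr_t)p < (uintptr_t)end;
}
```

A check like the one on next_free and end could then be spelled as
ptr_le(next_free, end) without relying on pointer relational operators.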

René


* [PATCH v3 00/12] Introduce `refStorage` extension
From: Patrick Steinhardt @ 2023-12-29  7:26 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Eric Sunshine
In-Reply-To: <cover.1703067989.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 15642 bytes --]

Hi,

this is the third version of my patch series that introduces the new
`refStorage` extension. This extension will be used for the upcoming
reftable backend.

Changes compared to v2:

  - The `ref_storage_format` is now tracked as an `unsigned int`,
    proposed by Junio.

  - Reworded the commit message in patch 2, proposed by Eric.

  - Added a NEEDSWORK comment to `get_worktrees_internal()`, proposed by
    Eric.

Thanks for your reviews!

Patrick

Patrick Steinhardt (12):
  t: introduce DEFAULT_REPO_FORMAT prereq
  worktree: skip reading HEAD when repairing worktrees
  refs: refactor logic to look up storage backends
  setup: start tracking ref storage format
  setup: set repository's formats on init
  setup: introduce "extensions.refStorage" extension
  setup: introduce GIT_DEFAULT_REF_FORMAT envvar
  t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar
  builtin/rev-parse: introduce `--show-ref-format` flag
  builtin/init: introduce `--ref-format=` value flag
  builtin/clone: introduce `--ref-format=` value flag
  t9500: write "extensions.refstorage" into config

 Documentation/config/extensions.txt           | 11 +++
 Documentation/git-clone.txt                   |  6 ++
 Documentation/git-init.txt                    |  7 ++
 Documentation/git-rev-parse.txt               |  3 +
 Documentation/git.txt                         |  5 ++
 Documentation/ref-storage-format.txt          |  1 +
 .../technical/repository-version.txt          |  5 ++
 builtin/clone.c                               | 17 ++++-
 builtin/init-db.c                             | 15 +++-
 builtin/rev-parse.c                           |  4 ++
 refs.c                                        | 34 ++++++---
 refs.h                                        |  3 +
 refs/debug.c                                  |  1 -
 refs/files-backend.c                          |  1 -
 refs/packed-backend.c                         |  1 -
 refs/refs-internal.h                          |  1 -
 repository.c                                  |  6 ++
 repository.h                                  |  7 ++
 setup.c                                       | 66 +++++++++++++++--
 setup.h                                       | 10 ++-
 t/README                                      |  3 +
 t/t0001-init.sh                               | 70 +++++++++++++++++++
 t/t1500-rev-parse.sh                          | 17 +++++
 t/t3200-branch.sh                             |  2 +-
 t/t5601-clone.sh                              | 17 +++++
 t/t9500-gitweb-standalone-no-errors.sh        |  5 ++
 t/test-lib-functions.sh                       |  5 ++
 t/test-lib.sh                                 | 15 +++-
 worktree.c                                    | 31 +++++---
 29 files changed, 334 insertions(+), 35 deletions(-)
 create mode 100644 Documentation/ref-storage-format.txt

Range-diff against v2:
 1:  3613439cb7 =  1:  578deaabcf t: introduce DEFAULT_REPO_FORMAT prereq
 2:  ecf4f1ddee !  2:  77c7213c66 worktree: skip reading HEAD when repairing worktrees
    @@ Commit message
         is logic that we resolve their respective worktree HEADs, even though
         that information may not actually be needed in the end by all callers.
     
    -    In the context of git-init(1) this is about to become a problem, because
    -    we do not have a repository that was set up via `setup_git_directory()`
    -    or friends. Consequentially, it is not yet fully initialized at the time
    -    of calling `repair_worktrees()`, and properly setting up all parts of
    -    the repository in `init_db()` before we repair worktrees is not an easy
    -    thing to do. While this is okay right now where we only have a single
    -    reference backend in Git, once we gain a second one we would be trying
    -    to look up the worktree HEADs before we have figured out the reference
    -    format, which does not work.
    +    Although not a problem presently with the file-based reference backend,
    +    it will become a problem with the upcoming reftable backend. In the
    +    context of git-init(1) we do not have a fully-initialized repository set
    +    up via `setup_git_directory()` or friends. Consequently, we do not know
    +    about the repository format when `repair_worktrees()` is called, and
    +    properly setting up all parts of the repository in `init_db()` before we
    +    try to repair worktrees is not an easy task. With the introduction of
    +    the reftable backend, we would ultimately try to look up the worktree
    +    HEADs before we have figured out the reference format, which does not
    +    work.
     
         We do not require the worktree HEADs at all to repair worktrees. So
         let's fix this issue by skipping over the step that reads them.
    @@ worktree.c: static void mark_current_worktree(struct worktree **worktrees)
      }
      
     -struct worktree **get_worktrees(void)
    ++/*
    ++ * NEEDSWORK: This function exists so that we can look up metadata of a
    ++ * worktree without trying to access any of its internals like the refdb. It
    ++ * would be preferable to instead have a corruption-tolerant function for
    ++ * retrieving worktree metadata that could be used when the worktree is known
    ++ * to not be in a healthy state, e.g. when creating or repairing it.
    ++ */
     +static struct worktree **get_worktrees_internal(int skip_reading_head)
      {
      	struct worktree **list = NULL;
 3:  12329b99b7 !  3:  47649570bf refs: refactor logic to look up storage backends
    @@ refs.c
     +};
      
     -static struct ref_storage_be *find_ref_storage_backend(const char *name)
    -+static const struct ref_storage_be *find_ref_storage_backend(int ref_storage_format)
    ++static const struct ref_storage_be *find_ref_storage_backend(unsigned int ref_storage_format)
      {
     -	struct ref_storage_be *be;
     -	for (be = refs_backends; be; be = be->next)
    @@ refs.c
      	return NULL;
      }
      
    -+int ref_storage_format_by_name(const char *name)
    ++unsigned int ref_storage_format_by_name(const char *name)
     +{
    -+	for (int i = 0; i < ARRAY_SIZE(refs_backends); i++)
    ++	for (unsigned int i = 0; i < ARRAY_SIZE(refs_backends); i++)
     +		if (refs_backends[i] && !strcmp(refs_backends[i]->name, name))
     +			return i;
     +	return REF_STORAGE_FORMAT_UNKNOWN;
     +}
     +
    -+const char *ref_storage_format_to_name(int ref_storage_format)
    ++const char *ref_storage_format_to_name(unsigned int ref_storage_format)
     +{
     +	const struct ref_storage_be *be = find_ref_storage_backend(ref_storage_format);
     +	if (!be)
    @@ refs.c: static struct ref_store *ref_store_init(struct repository *repo,
      {
     -	const char *be_name = "files";
     -	struct ref_storage_be *be = find_ref_storage_backend(be_name);
    -+	int format = REF_STORAGE_FORMAT_FILES;
    ++	unsigned int format = REF_STORAGE_FORMAT_FILES;
     +	const struct ref_storage_be *be = find_ref_storage_backend(format);
      	struct ref_store *refs;
      
    @@ refs.h: struct string_list;
      struct string_list_item;
      struct worktree;
      
    -+int ref_storage_format_by_name(const char *name);
    -+const char *ref_storage_format_to_name(int ref_storage_format);
    ++unsigned int ref_storage_format_by_name(const char *name);
    ++const char *ref_storage_format_to_name(unsigned int ref_storage_format);
     +
      /*
       * Resolve a reference, recursively following symbolic refererences.
 4:  ddd099fbaf !  4:  837764d0b5 setup: start tracking ref storage format
    @@ refs.c: static struct ref_store *ref_store_init(struct repository *repo,
      					const char *gitdir,
      					unsigned int flags)
      {
    --	int format = REF_STORAGE_FORMAT_FILES;
    +-	unsigned int format = REF_STORAGE_FORMAT_FILES;
     -	const struct ref_storage_be *be = find_ref_storage_backend(format);
     +	const struct ref_storage_be *be;
      	struct ref_store *refs;
    @@ repository.c: void repo_set_hash_algo(struct repository *repo, int hash_algo)
      	repo->hash_algo = &hash_algos[hash_algo];
      }
      
    -+void repo_set_ref_storage_format(struct repository *repo, int format)
    ++void repo_set_ref_storage_format(struct repository *repo, unsigned int format)
     +{
     +	repo->ref_storage_format = format;
     +}
    @@ repository.h: struct repository {
      	const struct git_hash_algo *hash_algo;
      
     +	/* Repository's reference storage format, as serialized on disk. */
    -+	int ref_storage_format;
    ++	unsigned int ref_storage_format;
     +
      	/* A unique-id for tracing purposes. */
      	int trace2_repo_id;
    @@ repository.h: void repo_set_gitdir(struct repository *repo, const char *root,
      		     const struct set_gitdir_args *extra_args);
      void repo_set_worktree(struct repository *repo, const char *path);
      void repo_set_hash_algo(struct repository *repo, int algo);
    -+void repo_set_ref_storage_format(struct repository *repo, int format);
    ++void repo_set_ref_storage_format(struct repository *repo, unsigned int format);
      void initialize_the_repository(void);
      RESULT_MUST_BE_USED
      int repo_init(struct repository *r, const char *gitdir, const char *worktree);
    @@ setup.c: static int is_reinit(void)
      }
      
     -void create_reference_database(const char *initial_branch, int quiet)
    -+void create_reference_database(int ref_storage_format,
    ++void create_reference_database(unsigned int ref_storage_format,
     +			       const char *initial_branch, int quiet)
      {
      	struct strbuf err = STRBUF_INIT;
    @@ setup.c: static void validate_hash_algorithm(struct repository_format *repo_fmt,
      	}
      }
      
    -+static void validate_ref_storage_format(struct repository_format *repo_fmt, int format)
    ++static void validate_ref_storage_format(struct repository_format *repo_fmt,
    ++					unsigned int format)
     +{
     +	if (repo_fmt->version >= 0 &&
     +	    format != REF_STORAGE_FORMAT_UNKNOWN &&
    @@ setup.c: static void validate_hash_algorithm(struct repository_format *repo_fmt,
      int init_db(const char *git_dir, const char *real_git_dir,
     -	    const char *template_dir, int hash, const char *initial_branch,
     +	    const char *template_dir, int hash,
    -+	    int ref_storage_format, const char *initial_branch,
    ++	    unsigned int ref_storage_format,
    ++	    const char *initial_branch,
      	    int init_shared_repository, unsigned int flags)
      {
      	int reinit;
    @@ setup.h: struct repository_format {
      	int worktree_config;
      	int is_bare;
      	int hash_algo;
    -+	int ref_storage_format;
    ++	unsigned int ref_storage_format;
      	int sparse_index;
      	char *work_tree;
      	struct string_list unknown_extensions;
    @@ setup.h: void check_repository_format(struct repository_format *fmt);
      
      int init_db(const char *git_dir, const char *real_git_dir,
      	    const char *template_dir, int hash_algo,
    -+	    int ref_storage_format,
    ++	    unsigned int ref_storage_format,
      	    const char *initial_branch, int init_shared_repository,
      	    unsigned int flags);
      void initialize_repository_version(int hash_algo, int reinit);
     -void create_reference_database(const char *initial_branch, int quiet);
    -+void create_reference_database(int ref_storage_format,
    ++void create_reference_database(unsigned int ref_storage_format,
     +			       const char *initial_branch, int quiet);
      
      /*
 5:  01a1e58a97 =  5:  a51da56d9b setup: set repository's formats on init
 6:  0a586fa648 !  6:  a1e03e4392 setup: introduce "extensions.refStorage" extension
    @@ setup.c: static enum extension_result handle_extension(const char *var,
      		data->hash_algo = format;
      		return EXTENSION_OK;
     +	} else if (!strcmp(ext, "refstorage")) {
    -+		int format;
    ++		unsigned int format;
     +
     +		if (!value)
     +			return config_error_nonbool(var);
    @@ setup.c: static int needs_work_tree_config(const char *git_dir, const char *work
      }
      
     -void initialize_repository_version(int hash_algo, int reinit)
    -+void initialize_repository_version(int hash_algo, int ref_storage_format,
    ++void initialize_repository_version(int hash_algo,
    ++				   unsigned int ref_storage_format,
     +				   int reinit)
      {
      	char repo_version_string[10];
    @@ setup.c: static int create_default_files(const char *template_path,
     
      ## setup.h ##
     @@ setup.h: int init_db(const char *git_dir, const char *real_git_dir,
    - 	    int ref_storage_format,
    + 	    unsigned int ref_storage_format,
      	    const char *initial_branch, int init_shared_repository,
      	    unsigned int flags);
     -void initialize_repository_version(int hash_algo, int reinit);
    -+void initialize_repository_version(int hash_algo, int ref_storage_format,
    ++void initialize_repository_version(int hash_algo,
    ++				   unsigned int ref_storage_format,
     +				   int reinit);
    - void create_reference_database(int ref_storage_format,
    + void create_reference_database(unsigned int ref_storage_format,
      			       const char *initial_branch, int quiet);
      
     
 7:  6d8754f73a !  7:  5ffc70e9be setup: introduce GIT_DEFAULT_REF_FORMAT envvar
    @@ Documentation/git.txt: double-quotes and respecting backslash escapes. E.g., the
     
      ## setup.c ##
     @@ setup.c: static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
    - 
    - static void validate_ref_storage_format(struct repository_format *repo_fmt, int format)
    + static void validate_ref_storage_format(struct repository_format *repo_fmt,
    + 					unsigned int format)
      {
     +	const char *name = getenv("GIT_DEFAULT_REF_FORMAT");
     +
 8:  c645932f3d =  8:  13c074acdf t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar
 9:  761d647770 =  9:  4ee3c9a2d1 builtin/rev-parse: introduce `--show-ref-format` flag
10:  e382b5bf08 ! 10:  25773e3560 builtin/init: introduce `--ref-format=` value flag
    @@ builtin/init-db.c: int cmd_init_db(int argc, const char **argv, const char *pref
     +	const char *ref_format = NULL;
      	const char *initial_branch = NULL;
      	int hash_algo = GIT_HASH_UNKNOWN;
    -+	int ref_storage_format = REF_STORAGE_FORMAT_UNKNOWN;
    ++	unsigned int ref_storage_format = REF_STORAGE_FORMAT_UNKNOWN;
      	int init_shared_repository = -1;
      	const struct option init_db_options[] = {
      		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
11:  257233658d ! 11:  3f1cb6b9e5 builtin/clone: introduce `--ref-format=` value flag
    @@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
      	int submodule_progress;
      	int filter_submodules = 0;
      	int hash_algo;
    -+	int ref_storage_format = REF_STORAGE_FORMAT_UNKNOWN;
    ++	unsigned int ref_storage_format = REF_STORAGE_FORMAT_UNKNOWN;
      	const int do_not_override_repo_unix_permissions = -1;
      
      	struct transport_ls_refs_options transport_ls_refs_options =
12:  b8cd06ec53 = 12:  2e7682b2f3 t9500: write "extensions.refstorage" into config

base-commit: e79552d19784ee7f4bbce278fe25f93fbda196fa
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* [PATCH v3 01/12] t: introduce DEFAULT_REPO_FORMAT prereq
From: Patrick Steinhardt @ 2023-12-29  7:26 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Eric Sunshine
In-Reply-To: <cover.1703833818.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 1587 bytes --]

A limited number of tests require repositories to have the default
repository format or otherwise they would fail to run, e.g. because they
fail to detect the correct hash function. While the hash function is the
only extension right now that creates problems like this, we are about
to add a second extension for the ref format.

Introduce a new DEFAULT_REPO_FORMAT prereq that can easily be amended
whenever we add new format extensions. Besides making any such changes
easier on us, the prerequisite's name should also help to clarify the
intent better.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t3200-branch.sh | 2 +-
 t/test-lib.sh     | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
index 6a316f081e..de7d3014e4 100755
--- a/t/t3200-branch.sh
+++ b/t/t3200-branch.sh
@@ -519,7 +519,7 @@ EOF
 
 mv .git/config .git/config-saved
 
-test_expect_success SHA1 'git branch -m q q2 without config should succeed' '
+test_expect_success DEFAULT_REPO_FORMAT 'git branch -m q q2 without config should succeed' '
 	git branch -m q q2 &&
 	git branch -m q2 q
 '
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 876b99562a..dc03f06b8e 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1936,6 +1936,10 @@ test_lazy_prereq SHA1 '
 	esac
 '
 
+test_lazy_prereq DEFAULT_REPO_FORMAT '
+	test_have_prereq SHA1
+'
+
 # Ensure that no test accidentally triggers a Git command
 # that runs the actual maintenance scheduler, affecting a user's
 # system permanently.
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* [PATCH v3 02/12] worktree: skip reading HEAD when repairing worktrees
From: Patrick Steinhardt @ 2023-12-29  7:26 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Eric Sunshine
In-Reply-To: <cover.1703833818.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 4740 bytes --]

When calling `git init --separate-git-dir=<new-path>` on a preexisting
repository, we move the Git directory of that repository to the new path
specified by the user. If there are worktrees present in the repository,
we need to repair the worktrees so that their gitlinks point to the new
location of the repository.

This repair logic will load repositories via `get_worktrees()`, which
will enumerate and initialize all worktrees. Part of initialization
is logic that resolves their respective worktree HEADs, even though
that information may not actually be needed in the end by all callers.

Although not a problem presently with the file-based reference backend,
it will become a problem with the upcoming reftable backend. In the
context of git-init(1) we do not have a fully-initialized repository set
up via `setup_git_directory()` or friends. Consequently, we do not know
about the repository format when `repair_worktrees()` is called, and
properly setting up all parts of the repository in `init_db()` before we
try to repair worktrees is not an easy task. With the introduction of
the reftable backend, we would ultimately try to look up the worktree
HEADs before we have figured out the reference format, which does not
work.

We do not require the worktree HEADs at all to repair worktrees. So
let's fix this issue by skipping over the step that reads them.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 worktree.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/worktree.c b/worktree.c
index a56a6c2a3d..cc34a3419b 100644
--- a/worktree.c
+++ b/worktree.c
@@ -51,7 +51,7 @@ static void add_head_info(struct worktree *wt)
 /**
  * get the main worktree
  */
-static struct worktree *get_main_worktree(void)
+static struct worktree *get_main_worktree(int skip_reading_head)
 {
 	struct worktree *worktree = NULL;
 	struct strbuf worktree_path = STRBUF_INIT;
@@ -70,11 +70,13 @@ static struct worktree *get_main_worktree(void)
 	 */
 	worktree->is_bare = (is_bare_repository_cfg == 1) ||
 		is_bare_repository();
-	add_head_info(worktree);
+	if (!skip_reading_head)
+		add_head_info(worktree);
 	return worktree;
 }
 
-static struct worktree *get_linked_worktree(const char *id)
+static struct worktree *get_linked_worktree(const char *id,
+					    int skip_reading_head)
 {
 	struct worktree *worktree = NULL;
 	struct strbuf path = STRBUF_INIT;
@@ -93,7 +95,8 @@ static struct worktree *get_linked_worktree(const char *id)
 	CALLOC_ARRAY(worktree, 1);
 	worktree->path = strbuf_detach(&worktree_path, NULL);
 	worktree->id = xstrdup(id);
-	add_head_info(worktree);
+	if (!skip_reading_head)
+		add_head_info(worktree);
 
 done:
 	strbuf_release(&path);
@@ -118,7 +121,14 @@ static void mark_current_worktree(struct worktree **worktrees)
 	free(git_dir);
 }
 
-struct worktree **get_worktrees(void)
+/*
+ * NEEDSWORK: This function exists so that we can look up metadata of a
+ * worktree without trying to access any of its internals like the refdb. It
+ * would be preferable to instead have a corruption-tolerant function for
+ * retrieving worktree metadata that could be used when the worktree is known
+ * to not be in a healthy state, e.g. when creating or repairing it.
+ */
+static struct worktree **get_worktrees_internal(int skip_reading_head)
 {
 	struct worktree **list = NULL;
 	struct strbuf path = STRBUF_INIT;
@@ -128,7 +138,7 @@ struct worktree **get_worktrees(void)
 
 	ALLOC_ARRAY(list, alloc);
 
-	list[counter++] = get_main_worktree();
+	list[counter++] = get_main_worktree(skip_reading_head);
 
 	strbuf_addf(&path, "%s/worktrees", get_git_common_dir());
 	dir = opendir(path.buf);
@@ -137,7 +147,7 @@ struct worktree **get_worktrees(void)
 		while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) {
 			struct worktree *linked = NULL;
 
-			if ((linked = get_linked_worktree(d->d_name))) {
+			if ((linked = get_linked_worktree(d->d_name, skip_reading_head))) {
 				ALLOC_GROW(list, counter + 1, alloc);
 				list[counter++] = linked;
 			}
@@ -151,6 +161,11 @@ struct worktree **get_worktrees(void)
 	return list;
 }
 
+struct worktree **get_worktrees(void)
+{
+	return get_worktrees_internal(0);
+}
+
 const char *get_worktree_git_dir(const struct worktree *wt)
 {
 	if (!wt)
@@ -591,7 +606,7 @@ static void repair_noop(int iserr UNUSED,
 
 void repair_worktrees(worktree_repair_fn fn, void *cb_data)
 {
-	struct worktree **worktrees = get_worktrees();
+	struct worktree **worktrees = get_worktrees_internal(1);
 	struct worktree **wt = worktrees + 1; /* +1 skips main worktree */
 
 	if (!fn)
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* [PATCH v3 03/12] refs: refactor logic to look up storage backends
From: Patrick Steinhardt @ 2023-12-29  7:26 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Eric Sunshine
In-Reply-To: <cover.1703833818.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 5192 bytes --]

In order to look up ref storage backends, we're currently using a linked
list of backends, where each backend is expected to set up its `next`
pointer to the next ref storage backend. This is kind of a weird setup
as backends need to be aware of other backends without much of a reason.

Refactor the code so that the array of backends is centrally defined in
"refs.c", where each backend is now identified by an integer constant.
Expose functions to translate from those integer constants to the name
and vice versa, which will be required by subsequent patches.
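
As a standalone illustration of the lookup scheme described above, here
is a reduced sketch that compiles outside of Git. The names (struct
backend, FORMAT_FILES, find_backend() and friends) are simplified
stand-ins chosen for this example, not the identifiers used by the patch
below.

```c
#include <stddef.h>
#include <string.h>

#define FORMAT_UNKNOWN 0
#define FORMAT_FILES   1
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Stand-in for a storage backend; only the name matters here. */
struct backend {
	const char *name;
};

static const struct backend be_files = { "files" };

/* Central table indexed by format constant; slot 0 stays NULL. */
static const struct backend *backends[] = {
	[FORMAT_FILES] = &be_files,
};

static const struct backend *find_backend(unsigned int format)
{
	if (format < ARRAY_SIZE(backends))
		return backends[format];
	return NULL;
}

static unsigned int format_by_name(const char *name)
{
	for (unsigned int i = 0; i < ARRAY_SIZE(backends); i++)
		if (backends[i] && !strcmp(backends[i]->name, name))
			return i;
	return FORMAT_UNKNOWN;
}

static const char *format_to_name(unsigned int format)
{
	const struct backend *be = find_backend(format);
	return be ? be->name : "unknown";
}
```

Because 0 is reserved for the unknown format, the designated initializer
leaves that slot NULL and an out-of-range or unset format never resolves
to a backend.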

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c                | 34 +++++++++++++++++++++++++---------
 refs.h                |  3 +++
 refs/debug.c          |  1 -
 refs/files-backend.c  |  1 -
 refs/packed-backend.c |  1 -
 refs/refs-internal.h  |  1 -
 repository.h          |  3 +++
 7 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/refs.c b/refs.c
index 16bfa21df7..dea3d5c9a0 100644
--- a/refs.c
+++ b/refs.c
@@ -33,17 +33,33 @@
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static const struct ref_storage_be *refs_backends[] = {
+	[REF_STORAGE_FORMAT_FILES] = &refs_be_files,
+};
 
-static struct ref_storage_be *find_ref_storage_backend(const char *name)
+static const struct ref_storage_be *find_ref_storage_backend(unsigned int ref_storage_format)
 {
-	struct ref_storage_be *be;
-	for (be = refs_backends; be; be = be->next)
-		if (!strcmp(be->name, name))
-			return be;
+	if (ref_storage_format < ARRAY_SIZE(refs_backends))
+		return refs_backends[ref_storage_format];
 	return NULL;
 }
 
+unsigned int ref_storage_format_by_name(const char *name)
+{
+	for (unsigned int i = 0; i < ARRAY_SIZE(refs_backends); i++)
+		if (refs_backends[i] && !strcmp(refs_backends[i]->name, name))
+			return i;
+	return REF_STORAGE_FORMAT_UNKNOWN;
+}
+
+const char *ref_storage_format_to_name(unsigned int ref_storage_format)
+{
+	const struct ref_storage_be *be = find_ref_storage_backend(ref_storage_format);
+	if (!be)
+		return "unknown";
+	return be->name;
+}
+
 /*
  * How to handle various characters in refnames:
  * 0: An acceptable character for refs
@@ -2029,12 +2045,12 @@ static struct ref_store *ref_store_init(struct repository *repo,
 					const char *gitdir,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	unsigned int format = REF_STORAGE_FORMAT_FILES;
+	const struct ref_storage_be *be = find_ref_storage_backend(format);
 	struct ref_store *refs;
 
 	if (!be)
-		BUG("reference backend %s is unknown", be_name);
+		BUG("reference backend is unknown");
 
 	refs = be->init(repo, gitdir, flags);
 	return refs;
diff --git a/refs.h b/refs.h
index ff113bb12a..11b3b6ccea 100644
--- a/refs.h
+++ b/refs.h
@@ -11,6 +11,9 @@ struct string_list;
 struct string_list_item;
 struct worktree;
 
+unsigned int ref_storage_format_by_name(const char *name);
+const char *ref_storage_format_to_name(unsigned int ref_storage_format);
+
 /*
  * Resolve a reference, recursively following symbolic refererences.
  *
diff --git a/refs/debug.c b/refs/debug.c
index 83b7a0ba65..b9775f2c37 100644
--- a/refs/debug.c
+++ b/refs/debug.c
@@ -426,7 +426,6 @@ static int debug_reflog_expire(struct ref_store *ref_store, const char *refname,
 }
 
 struct ref_storage_be refs_be_debug = {
-	.next = NULL,
 	.name = "debug",
 	.init = NULL,
 	.init_db = debug_init_db,
diff --git a/refs/files-backend.c b/refs/files-backend.c
index ad8b1d143f..43fd0ac760 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3241,7 +3241,6 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err UNUSED)
 }
 
 struct ref_storage_be refs_be_files = {
-	.next = NULL,
 	.name = "files",
 	.init = files_ref_store_create,
 	.init_db = files_init_db,
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index b9fa097a29..8d1090e284 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1705,7 +1705,6 @@ static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_s
 }
 
 struct ref_storage_be refs_be_packed = {
-	.next = NULL,
 	.name = "packed",
 	.init = packed_ref_store_create,
 	.init_db = packed_init_db,
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 4af83bf9a5..8e9f04cc67 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -663,7 +663,6 @@ typedef int read_symbolic_ref_fn(struct ref_store *ref_store, const char *refnam
 				 struct strbuf *referent);
 
 struct ref_storage_be {
-	struct ref_storage_be *next;
 	const char *name;
 	ref_store_init_fn *init;
 	ref_init_db_fn *init_db;
diff --git a/repository.h b/repository.h
index 5f18486f64..ea4c488b81 100644
--- a/repository.h
+++ b/repository.h
@@ -24,6 +24,9 @@ enum fetch_negotiation_setting {
 	FETCH_NEGOTIATION_NOOP,
 };
 
+#define REF_STORAGE_FORMAT_UNKNOWN 0
+#define REF_STORAGE_FORMAT_FILES   1
+
 struct repo_settings {
 	int initialized;
 
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* [PATCH v3 04/12] setup: start tracking ref storage format
From: Patrick Steinhardt @ 2023-12-29  7:26 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Eric Sunshine
In-Reply-To: <cover.1703833818.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 10131 bytes --]

In order to discern which ref storage format a repository is supposed to
use we need to start setting up and/or discovering the format. This
needs to happen in two separate code paths.

  - The first path is when we create a repository via `init_db()`. When
    we are re-initializing a preexisting repository we need to retain
    the previously used ref storage format -- if the user asked for a
    different format then this indicates an error and we error out.
    Otherwise we either initialize the repository with the format asked
    for by the user or the default format, which currently is the
    "files" backend.

  - The second path is when discovering repositories, where we need to
    read the config of that repository. There is not yet any way to
    configure something other than the "files" backend, so we can just
    blindly set the ref storage format to this backend.
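
The rule for the first code path can be sketched roughly as below. This
is only an illustration under simplifying assumptions: resolve_format()
and the FORMAT_* constants are hypothetical names, and the real logic in
setup.c takes more context into account.

```c
#define FORMAT_UNKNOWN 0
#define FORMAT_FILES   1

/*
 * Decide which ref storage format to use.
 *
 * existing:  format of a preexisting repository, or FORMAT_UNKNOWN
 *            when a fresh repository is being created.
 * requested: format asked for by the user, or FORMAT_UNKNOWN if the
 *            user did not ask for one.
 *
 * Returns the format to use, or -1 if the request conflicts with the
 * format of the preexisting repository.
 */
static int resolve_format(unsigned int existing, unsigned int requested)
{
	if (existing != FORMAT_UNKNOWN) {
		/* Re-initialization must retain the existing format. */
		if (requested != FORMAT_UNKNOWN && requested != existing)
			return -1;
		return (int)existing;
	}
	if (requested != FORMAT_UNKNOWN)
		return (int)requested;
	/* Nothing requested: fall back to the default "files" format. */
	return FORMAT_FILES;
}
```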

Wire up this logic so that we have the ref storage format always readily
available when needed. As there is only a single backend and because it
is not configurable we cannot yet verify that this tracking works as
expected via tests, but tests will be added in subsequent commits. To
compensate for this omission for now, raise a BUG() in case the ref
storage format is not set up properly in `ref_store_init()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/clone.c   |  5 +++--
 builtin/init-db.c |  4 +++-
 refs.c            |  4 ++--
 repository.c      |  6 ++++++
 repository.h      |  4 ++++
 setup.c           | 28 +++++++++++++++++++++++++---
 setup.h           |  6 +++++-
 7 files changed, 48 insertions(+), 9 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 343f536cf8..48aeb1b90b 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1107,7 +1107,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	 * repository, and reference backends may persist that information into
 	 * their on-disk data structures.
 	 */
-	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, NULL,
+	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
+		REF_STORAGE_FORMAT_UNKNOWN, NULL,
 		do_not_override_repo_unix_permissions, INIT_DB_QUIET | INIT_DB_SKIP_REFDB);
 
 	if (real_git_dir) {
@@ -1292,7 +1293,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
 	initialize_repository_version(hash_algo, 1);
 	repo_set_hash_algo(the_repository, hash_algo);
-	create_reference_database(NULL, 1);
+	create_reference_database(the_repository->ref_storage_format, NULL, 1);
 
 	/*
 	 * Before fetching from the remote, download and install bundle
diff --git a/builtin/init-db.c b/builtin/init-db.c
index cb727c826f..b6e80feab6 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -11,6 +11,7 @@
 #include "object-file.h"
 #include "parse-options.h"
 #include "path.h"
+#include "refs.h"
 #include "setup.h"
 #include "strbuf.h"
 
@@ -236,5 +237,6 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 
 	flags |= INIT_DB_EXIST_OK;
 	return init_db(git_dir, real_git_dir, template_dir, hash_algo,
-		       initial_branch, init_shared_repository, flags);
+		       REF_STORAGE_FORMAT_UNKNOWN, initial_branch,
+		       init_shared_repository, flags);
 }
diff --git a/refs.c b/refs.c
index dea3d5c9a0..fdbf5f4cb1 100644
--- a/refs.c
+++ b/refs.c
@@ -2045,10 +2045,10 @@ static struct ref_store *ref_store_init(struct repository *repo,
 					const char *gitdir,
 					unsigned int flags)
 {
-	unsigned int format = REF_STORAGE_FORMAT_FILES;
-	const struct ref_storage_be *be = find_ref_storage_backend(format);
+	const struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(repo->ref_storage_format);
 	if (!be)
 		BUG("reference backend is unknown");
 
diff --git a/repository.c b/repository.c
index a7679ceeaa..d7d24d416a 100644
--- a/repository.c
+++ b/repository.c
@@ -104,6 +104,11 @@ void repo_set_hash_algo(struct repository *repo, int hash_algo)
 	repo->hash_algo = &hash_algos[hash_algo];
 }
 
+void repo_set_ref_storage_format(struct repository *repo, unsigned int format)
+{
+	repo->ref_storage_format = format;
+}
+
 /*
  * Attempt to resolve and set the provided 'gitdir' for repository 'repo'.
  * Return 0 upon success and a non-zero value upon failure.
@@ -184,6 +189,7 @@ int repo_init(struct repository *repo,
 		goto error;
 
 	repo_set_hash_algo(repo, format.hash_algo);
+	repo_set_ref_storage_format(repo, format.ref_storage_format);
 	repo->repository_format_worktree_config = format.worktree_config;
 
 	/* take ownership of format.partial_clone */
diff --git a/repository.h b/repository.h
index ea4c488b81..f5269b3730 100644
--- a/repository.h
+++ b/repository.h
@@ -163,6 +163,9 @@ struct repository {
 	/* Repository's current hash algorithm, as serialized on disk. */
 	const struct git_hash_algo *hash_algo;
 
+	/* Repository's reference storage format, as serialized on disk. */
+	unsigned int ref_storage_format;
+
 	/* A unique-id for tracing purposes. */
 	int trace2_repo_id;
 
@@ -202,6 +205,7 @@ void repo_set_gitdir(struct repository *repo, const char *root,
 		     const struct set_gitdir_args *extra_args);
 void repo_set_worktree(struct repository *repo, const char *path);
 void repo_set_hash_algo(struct repository *repo, int algo);
+void repo_set_ref_storage_format(struct repository *repo, unsigned int format);
 void initialize_the_repository(void);
 RESULT_MUST_BE_USED
 int repo_init(struct repository *r, const char *gitdir, const char *worktree);
diff --git a/setup.c b/setup.c
index bc90bbd033..9c9a167f52 100644
--- a/setup.c
+++ b/setup.c
@@ -1566,6 +1566,8 @@ const char *setup_git_directory_gently(int *nongit_ok)
 		}
 		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			repo_set_ref_storage_format(the_repository,
+						    repo_fmt.ref_storage_format);
 			the_repository->repository_format_worktree_config =
 				repo_fmt.worktree_config;
 			/* take ownership of repo_fmt.partial_clone */
@@ -1659,6 +1661,8 @@ void check_repository_format(struct repository_format *fmt)
 	check_repository_format_gently(get_git_dir(), fmt, NULL);
 	startup_info->have_repository = 1;
 	repo_set_hash_algo(the_repository, fmt->hash_algo);
+	repo_set_ref_storage_format(the_repository,
+				    fmt->ref_storage_format);
 	the_repository->repository_format_worktree_config =
 		fmt->worktree_config;
 	the_repository->repository_format_partial_clone =
@@ -1899,7 +1903,8 @@ static int is_reinit(void)
 	return ret;
 }
 
-void create_reference_database(const char *initial_branch, int quiet)
+void create_reference_database(unsigned int ref_storage_format,
+			       const char *initial_branch, int quiet)
 {
 	struct strbuf err = STRBUF_INIT;
 	int reinit = is_reinit();
@@ -1919,6 +1924,7 @@ void create_reference_database(const char *initial_branch, int quiet)
 	safe_create_dir(git_path("refs"), 1);
 	adjust_shared_perm(git_path("refs"));
 
+	repo_set_ref_storage_format(the_repository, ref_storage_format);
 	if (refs_init_db(&err))
 		die("failed to set up refs db: %s", err.buf);
 
@@ -2137,8 +2143,22 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
 	}
 }
 
+static void validate_ref_storage_format(struct repository_format *repo_fmt,
+					unsigned int format)
+{
+	if (repo_fmt->version >= 0 &&
+	    format != REF_STORAGE_FORMAT_UNKNOWN &&
+	    format != repo_fmt->ref_storage_format) {
+		die(_("attempt to reinitialize repository with different reference storage format"));
+	} else if (format != REF_STORAGE_FORMAT_UNKNOWN) {
+		repo_fmt->ref_storage_format = format;
+	}
+}
+
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, int hash, const char *initial_branch,
+	    const char *template_dir, int hash,
+	    unsigned int ref_storage_format,
+	    const char *initial_branch,
 	    int init_shared_repository, unsigned int flags)
 {
 	int reinit;
@@ -2181,13 +2201,15 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	check_repository_format(&repo_fmt);
 
 	validate_hash_algorithm(&repo_fmt, hash);
+	validate_ref_storage_format(&repo_fmt, ref_storage_format);
 
 	reinit = create_default_files(template_dir, original_git_dir,
 				      &repo_fmt, prev_bare_repository,
 				      init_shared_repository);
 
 	if (!(flags & INIT_DB_SKIP_REFDB))
-		create_reference_database(initial_branch, flags & INIT_DB_QUIET);
+		create_reference_database(repo_fmt.ref_storage_format,
+					  initial_branch, flags & INIT_DB_QUIET);
 	create_object_directory();
 
 	if (get_shared_repository()) {
diff --git a/setup.h b/setup.h
index 3f0f17c351..3d3eda7967 100644
--- a/setup.h
+++ b/setup.h
@@ -115,6 +115,7 @@ struct repository_format {
 	int worktree_config;
 	int is_bare;
 	int hash_algo;
+	unsigned int ref_storage_format;
 	int sparse_index;
 	char *work_tree;
 	struct string_list unknown_extensions;
@@ -131,6 +132,7 @@ struct repository_format {
 	.version = -1, \
 	.is_bare = -1, \
 	.hash_algo = GIT_HASH_SHA1, \
+	.ref_storage_format = REF_STORAGE_FORMAT_FILES, \
 	.unknown_extensions = STRING_LIST_INIT_DUP, \
 	.v1_only_extensions = STRING_LIST_INIT_DUP, \
 }
@@ -175,10 +177,12 @@ void check_repository_format(struct repository_format *fmt);
 
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
+	    unsigned int ref_storage_format,
 	    const char *initial_branch, int init_shared_repository,
 	    unsigned int flags);
 void initialize_repository_version(int hash_algo, int reinit);
-void create_reference_database(const char *initial_branch, int quiet);
+void create_reference_database(unsigned int ref_storage_format,
+			       const char *initial_branch, int quiet);
 
 /*
  * NOTE NOTE NOTE!!
-- 
2.43.GIT



* [PATCH v3 05/12] setup: set repository's formats on init
From: Patrick Steinhardt @ 2023-12-29  7:26 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Eric Sunshine
In-Reply-To: <cover.1703833818.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2231 bytes --]

The proper hash algorithm and ref storage format that will be used for a
newly initialized repository will be figured out in `init_db()` via
`validate_hash_algorithm()` and `validate_ref_storage_format()`. Until
now though, we never set up the hash algorithm or ref storage format of
`the_repository` accordingly.

There are only two callsites of `init_db()`, one in git-init(1) and one
in git-clone(1). The former doesn't care whether the formats are set up
properly because it never accesses the repository after calling the
function in the first place.

For git-clone(1) it's a different story though, as we call `init_db()`
before listing remote refs. While we do indeed have the wrong hash
function in `the_repository` when `init_db()` sets up a non-default
object format for the repository, it never mattered because we adjust
the hash after learning about the remote's hash function via the listed
refs.

So the current state is correct for the hash algo, but it's not for the
ref storage format because git-clone(1) wouldn't know to set it up
properly. But instead of adjusting only the `ref_storage_format`, set
both the hash algo and the ref storage format so that `the_repository`
is in the correct state when `init_db()` exits. This is fine as we will
adjust the hash later on anyway and makes it easier to reason about the
end state of `the_repository`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 setup.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/setup.c b/setup.c
index 9c9a167f52..49570e6b3a 100644
--- a/setup.c
+++ b/setup.c
@@ -2207,6 +2207,13 @@ int init_db(const char *git_dir, const char *real_git_dir,
 				      &repo_fmt, prev_bare_repository,
 				      init_shared_repository);
 
+	/*
+	 * Now that we have set up both the hash algorithm and the ref storage
+	 * format we can update the repository's settings accordingly.
+	 */
+	repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+	repo_set_ref_storage_format(the_repository, repo_fmt.ref_storage_format);
+
 	if (!(flags & INIT_DB_SKIP_REFDB))
 		create_reference_database(repo_fmt.ref_storage_format,
 					  initial_branch, flags & INIT_DB_QUIET);
-- 
2.43.GIT



* [PATCH v3 06/12] setup: introduce "extensions.refStorage" extension
From: Patrick Steinhardt @ 2023-12-29  7:26 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Eric Sunshine
In-Reply-To: <cover.1703833818.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 9436 bytes --]

Introduce a new "extensions.refStorage" extension that allows us to
specify the ref storage format used by a repository. For now, the only
supported format is the "files" format, but this list will likely soon
be extended to also support the upcoming "reftable" format.

There have been discussions on the Git mailing list in the past about
what exactly this extension should look like. One alternative [1] that
was discussed was whether it would make sense to model the extension in
such a way that backends are arbitrarily stackable. This would allow for
a combined value of e.g. "loose,packed-refs" or "loose,reftable", which
indicates that new refs would be written via "loose" files backend and
compressed into "packed-refs" or "reftable" backends, respectively.

It is arguable though whether this flexibility and the complexity that
it brings with it is really required for now. It is not foreseeable that
there will be a proliferation of backends in the near-term future, and
the current set of existing formats and formats which are on the horizon
can easily be configured with the much simpler proposal where we have a
single value, only.

Furthermore, if we ever see that we indeed want to gain the ability to
arbitrarily stack the ref formats, then we can adapt the current
extension rather easily. Given that Git clients refuse any unknown value
for the "extensions.refStorage" extension, older clients would also know
to refuse a stacked value like "loose,packed-refs" in the future.

So let's stick with the easy proposal for the time being and wire up the
extension.

[1]: <pull.1408.git.1667846164.gitgitgadget@gmail.com>

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
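As a reading aid, and not part of the patch itself: the single-value
design boils down to a name lookup plus a rule for when the repository
format version has to be bumped. The following is a minimal standalone
sketch. The table and all identifiers in it are hypothetical, and it is
not how `ref_storage_format_by_name()` or
`ref_storage_format_to_name()` are actually implemented. It also
assumes that GIT_REPO_VERSION is 0 and GIT_REPO_VERSION_READ is 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for Git's REF_STORAGE_FORMAT_* constants. */
enum ref_format {
	REF_FORMAT_UNKNOWN = 0,
	REF_FORMAT_FILES = 1,
};

/* One entry per supported backend; new backends would be added here. */
static const struct {
	const char *name;
	enum ref_format format;
} known_formats[] = {
	{ "files", REF_FORMAT_FILES },
};

#define N_KNOWN (sizeof(known_formats) / sizeof(known_formats[0]))

/* Map a config value to a format; anything unrecognized is UNKNOWN. */
static enum ref_format format_by_name(const char *name)
{
	size_t i;

	assert(name != NULL); /* callers reject missing values beforehand */
	for (i = 0; i < N_KNOWN; i++)
		if (!strcmp(known_formats[i].name, name))
			return known_formats[i].format;
	return REF_FORMAT_UNKNOWN;
}

/* Inverse mapping; returns NULL for formats without a name. */
static const char *format_to_name(enum ref_format format)
{
	size_t i;

	for (i = 0; i < N_KNOWN; i++)
		if (known_formats[i].format == format)
			return known_formats[i].name;
	return NULL;
}

/*
 * Extensions are only honored with repository format version 1, so any
 * non-default hash or ref format forces the higher version (assumed
 * values: 0 for the default, 1 when extensions are needed).
 */
static int repo_version_for(int hash_is_sha1, enum ref_format format)
{
	return (!hash_is_sha1 || format != REF_FORMAT_FILES) ? 1 : 0;
}
```

A stacked value such as "loose,packed-refs" would simply fail the lookup
in this sketch, which matches the argument above that unknown values are
refused.
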
 Documentation/config/extensions.txt           | 11 ++++++++
 Documentation/ref-storage-format.txt          |  1 +
 .../technical/repository-version.txt          |  5 ++++
 builtin/clone.c                               |  2 +-
 setup.c                                       | 24 ++++++++++++++---
 setup.h                                       |  4 ++-
 t/t0001-init.sh                               | 26 +++++++++++++++++++
 t/test-lib.sh                                 |  2 +-
 8 files changed, 69 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/ref-storage-format.txt

diff --git a/Documentation/config/extensions.txt b/Documentation/config/extensions.txt
index bccaec7a96..66db0e15da 100644
--- a/Documentation/config/extensions.txt
+++ b/Documentation/config/extensions.txt
@@ -7,6 +7,17 @@ Note that this setting should only be set by linkgit:git-init[1] or
 linkgit:git-clone[1].  Trying to change it after initialization will not
 work and will produce hard-to-diagnose issues.
 
+extensions.refStorage::
+	Specify the ref storage format to use. The acceptable values are:
++
+include::../ref-storage-format.txt[]
++
+It is an error to specify this key unless `core.repositoryFormatVersion` is 1.
++
+Note that this setting should only be set by linkgit:git-init[1] or
+linkgit:git-clone[1]. Trying to change it after initialization will not
+work and will produce hard-to-diagnose issues.
+
 extensions.worktreeConfig::
 	If enabled, then worktrees will load config settings from the
 	`$GIT_DIR/config.worktree` file in addition to the
diff --git a/Documentation/ref-storage-format.txt b/Documentation/ref-storage-format.txt
new file mode 100644
index 0000000000..1a65cac468
--- /dev/null
+++ b/Documentation/ref-storage-format.txt
@@ -0,0 +1 @@
+* `files` for loose files with packed-refs. This is the default.
diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 045a76756f..27be3741e6 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -100,3 +100,8 @@ If set, by default "git config" reads from both "config" and
 multiple working directory mode, "config" file is shared while
 "config.worktree" is per-working directory (i.e., it's in
 GIT_COMMON_DIR/worktrees/<id>/config.worktree)
+
+==== `refStorage`
+
+Specifies the file format for the ref database. The only valid value
+is `files` (loose references with a packed-refs file).
diff --git a/builtin/clone.c b/builtin/clone.c
index 48aeb1b90b..0fb3816d0c 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1291,7 +1291,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	 * ours to the same thing.
 	 */
 	hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
-	initialize_repository_version(hash_algo, 1);
+	initialize_repository_version(hash_algo, the_repository->ref_storage_format, 1);
 	repo_set_hash_algo(the_repository, hash_algo);
 	create_reference_database(the_repository->ref_storage_format, NULL, 1);
 
diff --git a/setup.c b/setup.c
index 49570e6b3a..fb1413cabd 100644
--- a/setup.c
+++ b/setup.c
@@ -592,6 +592,17 @@ static enum extension_result handle_extension(const char *var,
 				     "extensions.objectformat", value);
 		data->hash_algo = format;
 		return EXTENSION_OK;
+	} else if (!strcmp(ext, "refstorage")) {
+		unsigned int format;
+
+		if (!value)
+			return config_error_nonbool(var);
+		format = ref_storage_format_by_name(value);
+		if (format == REF_STORAGE_FORMAT_UNKNOWN)
+			return error(_("invalid value for '%s': '%s'"),
+				     "extensions.refstorage", value);
+		data->ref_storage_format = format;
+		return EXTENSION_OK;
 	}
 	return EXTENSION_UNKNOWN;
 }
@@ -1871,12 +1882,15 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
-void initialize_repository_version(int hash_algo, int reinit)
+void initialize_repository_version(int hash_algo,
+				   unsigned int ref_storage_format,
+				   int reinit)
 {
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
 
-	if (hash_algo != GIT_HASH_SHA1)
+	if (hash_algo != GIT_HASH_SHA1 ||
+	    ref_storage_format != REF_STORAGE_FORMAT_FILES)
 		repo_version = GIT_REPO_VERSION_READ;
 
 	/* This forces creation of new config file */
@@ -1889,6 +1903,10 @@ void initialize_repository_version(int hash_algo, int reinit)
 			       hash_algos[hash_algo].name);
 	else if (reinit)
 		git_config_set_gently("extensions.objectformat", NULL);
+
+	if (ref_storage_format != REF_STORAGE_FORMAT_FILES)
+		git_config_set("extensions.refstorage",
+			       ref_storage_format_to_name(ref_storage_format));
 }
 
 static int is_reinit(void)
@@ -2030,7 +2048,7 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
-	initialize_repository_version(fmt->hash_algo, 0);
+	initialize_repository_version(fmt->hash_algo, fmt->ref_storage_format, 0);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
diff --git a/setup.h b/setup.h
index 3d3eda7967..3599aec93c 100644
--- a/setup.h
+++ b/setup.h
@@ -180,7 +180,9 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	    unsigned int ref_storage_format,
 	    const char *initial_branch, int init_shared_repository,
 	    unsigned int flags);
-void initialize_repository_version(int hash_algo, int reinit);
+void initialize_repository_version(int hash_algo,
+				   unsigned int ref_storage_format,
+				   int reinit);
 void create_reference_database(unsigned int ref_storage_format,
 			       const char *initial_branch, int quiet);
 
diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index 2b78e3be47..38b3e4c39e 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -532,6 +532,32 @@ test_expect_success 'init rejects attempts to initialize with different hash' '
 	test_must_fail git -C sha256 init --object-format=sha1
 '
 
+test_expect_success DEFAULT_REPO_FORMAT 'extensions.refStorage is not allowed with repo version 0' '
+	test_when_finished "rm -rf refstorage" &&
+	git init refstorage &&
+	git -C refstorage config extensions.refStorage files &&
+	test_must_fail git -C refstorage rev-parse 2>err &&
+	grep "repo version is 0, but v1-only extension found" err
+'
+
+test_expect_success DEFAULT_REPO_FORMAT 'extensions.refStorage with files backend' '
+	test_when_finished "rm -rf refstorage" &&
+	git init refstorage &&
+	git -C refstorage config core.repositoryformatversion 1 &&
+	git -C refstorage config extensions.refStorage files &&
+	test_commit -C refstorage A &&
+	git -C refstorage rev-parse --verify HEAD
+'
+
+test_expect_success DEFAULT_REPO_FORMAT 'extensions.refStorage with unknown backend' '
+	test_when_finished "rm -rf refstorage" &&
+	git init refstorage &&
+	git -C refstorage config core.repositoryformatversion 1 &&
+	git -C refstorage config extensions.refStorage garbage &&
+	test_must_fail git -C refstorage rev-parse 2>err &&
+	grep "invalid value for ${SQ}extensions.refstorage${SQ}: ${SQ}garbage${SQ}" err
+'
+
 test_expect_success MINGW 'core.hidedotfiles = false' '
 	git config --global core.hidedotfiles false &&
 	rm -rf newdir &&
diff --git a/t/test-lib.sh b/t/test-lib.sh
index dc03f06b8e..4685cc3d48 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1937,7 +1937,7 @@ test_lazy_prereq SHA1 '
 '
 
 test_lazy_prereq DEFAULT_REPO_FORMAT '
-	test_have_prereq SHA1
+	test_have_prereq SHA1,REFFILES
 '
 
 # Ensure that no test accidentally triggers a Git command
-- 
2.43.GIT



* [PATCH v3 07/12] setup: introduce GIT_DEFAULT_REF_FORMAT envvar
From: Patrick Steinhardt @ 2023-12-29  7:26 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Eric Sunshine
In-Reply-To: <cover.1703833818.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 3143 bytes --]

Introduce a new GIT_DEFAULT_REF_FORMAT environment variable that lets
users control the default ref format used by both git-init(1) and
git-clone(1). This is modeled after GIT_DEFAULT_OBJECT_FORMAT, which
does the same thing for the repository's object format.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
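As a reading aid, and not part of the patch itself: the resulting
precedence is "explicit request, then environment, then built-in
default". A minimal standalone sketch of that order follows. All
identifiers are hypothetical, the environment value is passed in as a
plain string so that the function is easy to exercise, and the
reinitialization check done by `validate_ref_storage_format()` is left
out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for Git's REF_STORAGE_FORMAT_* constants. */
enum ref_format {
	REF_FORMAT_UNKNOWN = 0,
	REF_FORMAT_FILES = 1,
};

/* Simplified lookup: "files" is the only name known to this sketch. */
static enum ref_format format_by_name(const char *name)
{
	return !strcmp(name, "files") ? REF_FORMAT_FILES : REF_FORMAT_UNKNOWN;
}

/*
 * Pick the ref format for a new repository.
 *
 * `requested` is an explicit choice or REF_FORMAT_UNKNOWN; `env_value`
 * is the value of the environment variable or NULL when it is unset.
 *
 * Returns 0 and stores the result in `*out`, or -1 when the
 * environment names a format that is not known.
 */
static int pick_ref_format(enum ref_format requested,
			   const char *env_value,
			   enum ref_format *out)
{
	assert(out != NULL);

	if (requested != REF_FORMAT_UNKNOWN) {
		/* An explicit request wins; the environment is not consulted. */
		*out = requested;
		return 0;
	}

	if (env_value) {
		enum ref_format format = format_by_name(env_value);

		if (format == REF_FORMAT_UNKNOWN)
			return -1;
		*out = format;
		return 0;
	}

	*out = REF_FORMAT_FILES; /* built-in default */
	return 0;
}
```

A caller would pass `getenv("GIT_DEFAULT_REF_FORMAT")` as `env_value`.
Note that an invalid value in the environment only results in an error
when there is no explicit request, as the environment is not looked at
otherwise.
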
 Documentation/git.txt |  5 +++++
 setup.c               |  7 +++++++
 t/t0001-init.sh       | 18 ++++++++++++++++++
 3 files changed, 30 insertions(+)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index bf9e6af695..88e4ed4bd6 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -556,6 +556,11 @@ double-quotes and respecting backslash escapes. E.g., the value
 	is always used. The default is "sha1".
 	See `--object-format` in linkgit:git-init[1].
 
+`GIT_DEFAULT_REF_FORMAT`::
+	If this variable is set, the default reference backend format for new
+	repositories will be set to this value. The default is "files".
+	See `--ref-format` in linkgit:git-init[1].
+
 Git Commits
 ~~~~~~~~~~~
 `GIT_AUTHOR_NAME`::
diff --git a/setup.c b/setup.c
index fb1413cabd..1ab1a66bcb 100644
--- a/setup.c
+++ b/setup.c
@@ -2164,12 +2164,19 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
 static void validate_ref_storage_format(struct repository_format *repo_fmt,
 					unsigned int format)
 {
+	const char *name = getenv("GIT_DEFAULT_REF_FORMAT");
+
 	if (repo_fmt->version >= 0 &&
 	    format != REF_STORAGE_FORMAT_UNKNOWN &&
 	    format != repo_fmt->ref_storage_format) {
 		die(_("attempt to reinitialize repository with different reference storage format"));
 	} else if (format != REF_STORAGE_FORMAT_UNKNOWN) {
 		repo_fmt->ref_storage_format = format;
+	} else if (name) {
+		format = ref_storage_format_by_name(name);
+		if (format == REF_STORAGE_FORMAT_UNKNOWN)
+			die(_("unknown ref storage format '%s'"), name);
+		repo_fmt->ref_storage_format = format;
 	}
 }
 
diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index 38b3e4c39e..30ce752cc1 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -558,6 +558,24 @@ test_expect_success DEFAULT_REPO_FORMAT 'extensions.refStorage with unknown back
 	grep "invalid value for ${SQ}extensions.refstorage${SQ}: ${SQ}garbage${SQ}" err
 '
 
+test_expect_success DEFAULT_REPO_FORMAT 'init with GIT_DEFAULT_REF_FORMAT=files' '
+	test_when_finished "rm -rf refformat" &&
+	GIT_DEFAULT_REF_FORMAT=files git init refformat &&
+	echo 0 >expect &&
+	git -C refformat config core.repositoryformatversion >actual &&
+	test_cmp expect actual &&
+	test_must_fail git -C refformat config extensions.refstorage
+'
+
+test_expect_success 'init with GIT_DEFAULT_REF_FORMAT=garbage' '
+	test_when_finished "rm -rf refformat" &&
+	cat >expect <<-EOF &&
+	fatal: unknown ref storage format ${SQ}garbage${SQ}
+	EOF
+	test_must_fail env GIT_DEFAULT_REF_FORMAT=garbage git init refformat 2>err &&
+	test_cmp expect err
+'
+
 test_expect_success MINGW 'core.hidedotfiles = false' '
 	git config --global core.hidedotfiles false &&
 	rm -rf newdir &&
-- 
2.43.GIT



* [PATCH v3 08/12] t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar
From: Patrick Steinhardt @ 2023-12-29  7:26 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Eric Sunshine
In-Reply-To: <cover.1703833818.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2531 bytes --]

Introduce a new GIT_TEST_DEFAULT_REF_FORMAT environment variable that
lets developers run the test suite with a different default ref format
without impacting the ref format used by non-test Git invocations. This
is modeled after GIT_TEST_DEFAULT_OBJECT_FORMAT, which does the same
thing for the repository's object format.

Adapt the setup of the `REFFILES` test prerequisite to be conditionally
set based on the default ref format.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/README                |  3 +++
 t/test-lib-functions.sh |  5 +++++
 t/test-lib.sh           | 11 ++++++++++-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/t/README b/t/README
index 36463d0742..621d3b8c09 100644
--- a/t/README
+++ b/t/README
@@ -479,6 +479,9 @@ GIT_TEST_DEFAULT_HASH=<hash-algo> specifies which hash algorithm to
 use in the test scripts. Recognized values for <hash-algo> are "sha1"
 and "sha256".
 
+GIT_TEST_DEFAULT_REF_FORMAT=<format> specifies which ref storage format
+to use in the test scripts. Recognized values for <format> are "files".
+
 GIT_TEST_NO_WRITE_REV_INDEX=<boolean>, when true disables the
 'pack.writeReverseIndex' setting.
 
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 5eb57914ab..a3a51ea9e8 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1659,6 +1659,11 @@ test_detect_hash () {
 	test_hash_algo="${GIT_TEST_DEFAULT_HASH:-sha1}"
 }
 
+# Detect the ref storage format in use.
+test_detect_ref_format () {
+	echo "${GIT_TEST_DEFAULT_REF_FORMAT:-files}"
+}
+
 # Load common hash metadata and common placeholder object IDs for use with
 # test_oid.
 test_oid_init () {
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 4685cc3d48..fc93aa57e6 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -542,6 +542,8 @@ export EDITOR
 
 GIT_DEFAULT_HASH="${GIT_TEST_DEFAULT_HASH:-sha1}"
 export GIT_DEFAULT_HASH
+GIT_DEFAULT_REF_FORMAT="${GIT_TEST_DEFAULT_REF_FORMAT:-files}"
+export GIT_DEFAULT_REF_FORMAT
 GIT_TEST_MERGE_ALGORITHM="${GIT_TEST_MERGE_ALGORITHM:-ort}"
 export GIT_TEST_MERGE_ALGORITHM
 
@@ -1745,7 +1747,14 @@ parisc* | hppa*)
 	;;
 esac
 
-test_set_prereq REFFILES
+case "$GIT_DEFAULT_REF_FORMAT" in
+files)
+	test_set_prereq REFFILES;;
+*)
+	echo >&2 "error: unknown ref format $GIT_DEFAULT_REF_FORMAT"
+	exit 1
+	;;
+esac
 
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_CURL" && test_set_prereq LIBCURL
-- 
2.43.GIT



* [PATCH v3 09/12] builtin/rev-parse: introduce `--show-ref-format` flag
From: Patrick Steinhardt @ 2023-12-29  7:27 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Eric Sunshine
In-Reply-To: <cover.1703833818.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2428 bytes --]

Introduce a new `--show-ref-format` flag to git-rev-parse(1) that causes
it to print the ref storage format used by a repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/git-rev-parse.txt |  3 +++
 builtin/rev-parse.c             |  4 ++++
 t/t1500-rev-parse.sh            | 17 +++++++++++++++++
 3 files changed, 24 insertions(+)

diff --git a/Documentation/git-rev-parse.txt b/Documentation/git-rev-parse.txt
index 912fab9f5e..546faf9017 100644
--- a/Documentation/git-rev-parse.txt
+++ b/Documentation/git-rev-parse.txt
@@ -307,6 +307,9 @@ The following options are unaffected by `--path-format`:
 	input, multiple algorithms may be printed, space-separated.
 	If not specified, the default is "storage".
 
+--show-ref-format::
+	Show the reference storage format used for the repository.
+
 
 Other Options
 ~~~~~~~~~~~~~
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 917f122440..d08987646a 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -1062,6 +1062,10 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 				puts(the_hash_algo->name);
 				continue;
 			}
+			if (!strcmp(arg, "--show-ref-format")) {
+				puts(ref_storage_format_to_name(the_repository->ref_storage_format));
+				continue;
+			}
 			if (!strcmp(arg, "--end-of-options")) {
 				seen_end_of_options = 1;
 				if (filter & (DO_FLAGS | DO_REVS))
diff --git a/t/t1500-rev-parse.sh b/t/t1500-rev-parse.sh
index 3f9e7f62e4..a669e592f1 100755
--- a/t/t1500-rev-parse.sh
+++ b/t/t1500-rev-parse.sh
@@ -208,6 +208,23 @@ test_expect_success 'rev-parse --show-object-format in repo' '
 	grep "unknown mode for --show-object-format: squeamish-ossifrage" err
 '
 
+test_expect_success 'rev-parse --show-ref-format' '
+	test_detect_ref_format >expect &&
+	git rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'rev-parse --show-ref-format with invalid storage' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git config extensions.refstorage broken &&
+		test_must_fail git rev-parse --show-ref-format 2>err &&
+		grep "error: invalid value for ${SQ}extensions.refstorage${SQ}: ${SQ}broken${SQ}" err
+	)
+'
+
 test_expect_success '--show-toplevel from subdir of working tree' '
 	pwd >expect &&
 	git -C sub/dir rev-parse --show-toplevel >actual &&
-- 
2.43.GIT



* [PATCH v3 10/12] builtin/init: introduce `--ref-format=` value flag
From: Patrick Steinhardt @ 2023-12-29  7:27 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Eric Sunshine
In-Reply-To: <cover.1703833818.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 4861 bytes --]

Introduce a new `--ref-format` value flag for git-init(1) that allows
the user to specify the ref format that is to be used for a newly
initialized repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/git-init.txt |  7 +++++++
 builtin/init-db.c          | 13 ++++++++++++-
 t/t0001-init.sh            | 26 ++++++++++++++++++++++++++
 3 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-init.txt b/Documentation/git-init.txt
index 6f0d2973bf..e8dc645bb5 100644
--- a/Documentation/git-init.txt
+++ b/Documentation/git-init.txt
@@ -11,6 +11,7 @@ SYNOPSIS
 [verse]
 'git init' [-q | --quiet] [--bare] [--template=<template-directory>]
 	  [--separate-git-dir <git-dir>] [--object-format=<format>]
+	  [--ref-format=<format>]
 	  [-b <branch-name> | --initial-branch=<branch-name>]
 	  [--shared[=<permissions>]] [<directory>]
 
@@ -57,6 +58,12 @@ values are 'sha1' and (if enabled) 'sha256'.  'sha1' is the default.
 +
 include::object-format-disclaimer.txt[]
 
+--ref-format=<format>::
+
+Specify the given ref storage format for the repository. The valid values are:
++
+include::ref-storage-format.txt[]
+
 --template=<template-directory>::
 
 Specify the directory from which templates will be used.  (See the "TEMPLATE
diff --git a/builtin/init-db.c b/builtin/init-db.c
index b6e80feab6..a4f81e2af5 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -58,6 +58,7 @@ static int shared_callback(const struct option *opt, const char *arg, int unset)
 static const char *const init_db_usage[] = {
 	N_("git init [-q | --quiet] [--bare] [--template=<template-directory>]\n"
 	   "         [--separate-git-dir <git-dir>] [--object-format=<format>]\n"
+	   "         [--ref-format=<format>]\n"
 	   "         [-b <branch-name> | --initial-branch=<branch-name>]\n"
 	   "         [--shared[=<permissions>]] [<directory>]"),
 	NULL
@@ -77,8 +78,10 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	const char *template_dir = NULL;
 	unsigned int flags = 0;
 	const char *object_format = NULL;
+	const char *ref_format = NULL;
 	const char *initial_branch = NULL;
 	int hash_algo = GIT_HASH_UNKNOWN;
+	unsigned int ref_storage_format = REF_STORAGE_FORMAT_UNKNOWN;
 	int init_shared_repository = -1;
 	const struct option init_db_options[] = {
 		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
@@ -96,6 +99,8 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 			   N_("override the name of the initial branch")),
 		OPT_STRING(0, "object-format", &object_format, N_("hash"),
 			   N_("specify the hash algorithm to use")),
+		OPT_STRING(0, "ref-format", &ref_format, N_("format"),
+			   N_("specify the reference format to use")),
 		OPT_END()
 	};
 
@@ -159,6 +164,12 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 			die(_("unknown hash algorithm '%s'"), object_format);
 	}
 
+	if (ref_format) {
+		ref_storage_format = ref_storage_format_by_name(ref_format);
+		if (ref_storage_format == REF_STORAGE_FORMAT_UNKNOWN)
+			die(_("unknown ref storage format '%s'"), ref_format);
+	}
+
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
 
@@ -237,6 +248,6 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 
 	flags |= INIT_DB_EXIST_OK;
 	return init_db(git_dir, real_git_dir, template_dir, hash_algo,
-		       REF_STORAGE_FORMAT_UNKNOWN, initial_branch,
+		       ref_storage_format, initial_branch,
 		       init_shared_repository, flags);
 }
diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index 30ce752cc1..b131d665db 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -576,6 +576,32 @@ test_expect_success 'init with GIT_DEFAULT_REF_FORMAT=garbage' '
 	test_cmp expect err
 '
 
+test_expect_success 'init with --ref-format=files' '
+	test_when_finished "rm -rf refformat" &&
+	git init --ref-format=files refformat &&
+	echo files >expect &&
+	git -C refformat rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 're-init with same format' '
+	test_when_finished "rm -rf refformat" &&
+	git init --ref-format=files refformat &&
+	git init --ref-format=files refformat &&
+	echo files >expect &&
+	git -C refformat rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'init with --ref-format=garbage' '
+	test_when_finished "rm -rf refformat" &&
+	cat >expect <<-EOF &&
+	fatal: unknown ref storage format ${SQ}garbage${SQ}
+	EOF
+	test_must_fail git init --ref-format=garbage refformat 2>err &&
+	test_cmp expect err
+'
+
 test_expect_success MINGW 'core.hidedotfiles = false' '
 	git config --global core.hidedotfiles false &&
 	rm -rf newdir &&
-- 
2.43.GIT



* [PATCH v3 11/12] builtin/clone: introduce `--ref-format=` value flag
From: Patrick Steinhardt @ 2023-12-29  7:27 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Eric Sunshine
In-Reply-To: <cover.1703833818.git.ps@pks.im>

Introduce a new `--ref-format` value flag for git-clone(1) that allows
the user to specify the ref format that is to be used for the newly
cloned repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/git-clone.txt |  6 ++++++
 builtin/clone.c             | 12 +++++++++++-
 t/t5601-clone.sh            | 17 +++++++++++++++++
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index c37c4a37f7..6e43eb9c20 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -311,6 +311,12 @@ or `--mirror` is given)
 	The result is Git repository can be separated from working
 	tree.
 
+--ref-format=<ref-format>::
+
+Specify the given ref storage format for the repository. The valid values are:
++
+include::ref-storage-format.txt[]
+
 -j <n>::
 --jobs <n>::
 	The number of submodules fetched at the same time.
diff --git a/builtin/clone.c b/builtin/clone.c
index 0fb3816d0c..f1635e0e8c 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -72,6 +72,7 @@ static char *remote_name = NULL;
 static char *option_branch = NULL;
 static struct string_list option_not = STRING_LIST_INIT_NODUP;
 static const char *real_git_dir;
+static const char *ref_format;
 static char *option_upload_pack = "git-upload-pack";
 static int option_verbosity;
 static int option_progress = -1;
@@ -157,6 +158,8 @@ static struct option builtin_clone_options[] = {
 		    N_("any cloned submodules will be shallow")),
 	OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 		   N_("separate git dir from working tree")),
+	OPT_STRING(0, "ref-format", &ref_format, N_("format"),
+		   N_("specify the reference format to use")),
 	OPT_STRING_LIST('c', "config", &option_config, N_("key=value"),
 			N_("set config inside the new repository")),
 	OPT_STRING_LIST(0, "server-option", &server_options,
@@ -932,6 +935,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	int submodule_progress;
 	int filter_submodules = 0;
 	int hash_algo;
+	unsigned int ref_storage_format = REF_STORAGE_FORMAT_UNKNOWN;
 	const int do_not_override_repo_unix_permissions = -1;
 
 	struct transport_ls_refs_options transport_ls_refs_options =
@@ -957,6 +961,12 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (option_single_branch == -1)
 		option_single_branch = deepen ? 1 : 0;
 
+	if (ref_format) {
+		ref_storage_format = ref_storage_format_by_name(ref_format);
+		if (ref_storage_format == REF_STORAGE_FORMAT_UNKNOWN)
+			die(_("unknown ref storage format '%s'"), ref_format);
+	}
+
 	if (option_mirror)
 		option_bare = 1;
 
@@ -1108,7 +1118,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	 * their on-disk data structures.
 	 */
 	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
-		REF_STORAGE_FORMAT_UNKNOWN, NULL,
+		ref_storage_format, NULL,
 		do_not_override_repo_unix_permissions, INIT_DB_QUIET | INIT_DB_SKIP_REFDB);
 
 	if (real_git_dir) {
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 47eae641f0..fb1b9c686d 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -157,6 +157,23 @@ test_expect_success 'clone --mirror does not repeat tags' '
 
 '
 
+test_expect_success 'clone with files ref format' '
+	test_when_finished "rm -rf ref-storage" &&
+	git clone --ref-format=files --mirror src ref-storage &&
+	echo files >expect &&
+	git -C ref-storage rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone with garbage ref format' '
+	cat >expect <<-EOF &&
+	fatal: unknown ref storage format ${SQ}garbage${SQ}
+	EOF
+	test_must_fail git clone --ref-format=garbage --mirror src ref-storage 2>err &&
+	test_cmp expect err &&
+	test_path_is_missing ref-storage
+'
+
 test_expect_success 'clone to destination with trailing /' '
 
 	git clone src target-1/ &&
-- 
2.43.GIT

