Git development
 help / color / mirror / Atom feed
* Re: [PATCH] mem-pool: fix big allocations
From: René Scharfe @ 2023-12-28 18:56 UTC (permalink / raw)
  To: phillip.wood, Git List; +Cc: Jameson Miller
In-Reply-To: <e1e43a6c-3e06-4453-88a3-f00476132bcd@gmail.com>

Am 28.12.23 um 17:48 schrieb phillip.wood123@gmail.com:
> On 28/12/2023 16:05, René Scharfe wrote:
>> Am 28.12.23 um 16:10 schrieb Phillip Wood:
>>> The diff at the end of
>>> this email shows a possible implementation of a check_ptr() macro for
>>> the unit test library. I'm wary of adding it though because I'm not sure
>>> printing the pointer values is actually very useful most of the
>>> time. I'm also concerned that the rules around pointer arithmetic and
>>> comparisons mean that many pointer tests such as
>>>
>>>      check_ptr(pool->mp_block->next_free, <=, pool->mp_block->end);
>>>
>>> will be undefined if they fail.
>>
>> True, the compiler could legally emit mush when it finds out that the
>> pointers are for different objects.  And the error being fixed produces
>> such unrelated pointer pairs -- oops.
>>
>> This check is not important here, we can just drop it.
>>
>> mem_pool_contains() has the same problem, by the way.
>>
>> Restricting ourselves to only equality comparisons for pointers prevents
>> some interesting sanity checks, though.  Casting to intptr_t or
>> uintptr_t would allow arbitrary comparisons without risk of undefined
>> behavior, though.  Perhaps that would make a check_ptr() macro viable
>> and useful.
>
> That certainly helps and the check_ptr() macro in my previous email
> casts the pointers to uintptr_t before comparing them. Maybe I'm
> worrying too much, but my concern is that in a failing comparison it
> is likely one of the pointers is invalid (for example it is the
> result of some undefined pointer arithmetic) and the program is
> undefined from the point the invalid pointer is created.

There are no restrictions on integer comparisons.  So comparing after
casting to uintptr_t should not invoke undefined behavior.  If undefined
behavior was involved in calculating the pointers in the first place
then the compiler might still legally go crazy, but not due to the
comparison.  Right?

Whether the result of a uintptr_t-cast comparison of pointers to
different objects is meaningful is a different question.  Hopefully
range checks are possible.

> The
> documentation for check_ptr() in my previous mail contains the
> following example
>
>     For example if `start` and `end` are pointers to the beginning and
>     end of an allocation and `offset` is an integer then
>
>         check_ptr(start + offset, <=, end)
>
>     is undefined when `offset` is larger than `end - start`. Rewriting
>     the comparison as
>
>         check_uint(offset, <=, end - start)
>
>     avoids undefined behavior when offset is too large, but is still
>     undefined if there is a bug that means `start` and `end` do not
>     point to the same allocation.

True, but in such a unit test we'd need additional checks verifying
that start and end belong to the same object.  Or perhaps use a
numerical size instead of an end pointer.

René

^ permalink raw reply

* Re: [PATCH v2 02/12] worktree: skip reading HEAD when repairing worktrees
From: Eric Sunshine @ 2023-12-28 18:13 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano
In-Reply-To: <CAPig+cT6mRyJijL1qo2g56Yny-JxkDYjjmGpAncyS_4Hcpaz6Q@mail.gmail.com>

On Thu, Dec 28, 2023 at 1:08 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> Having said all that, I'm not overly opposed to this patch, especially
> since your main focus is on getting the reftable backend integrated,
> and because the changes (and ugliness) introduced by this patch are
> entirely self-contained and private to worktree.c, so are not a
> show-stopper by any means. Rather, I wanted to get down to writing
> what I think would be a better future approach if someone gets around
> to tackling it. (There is no pressing need at the moment, and that
> someone doesn't have to be you.)

I forgot to mention that, if you reroll for some reason, the
get_worktrees()/get_worktrees_internal() dance might deserve an
in-source NEEDSWORK comment explaining that get_worktrees_internal()
exists to work around the shortcoming that a corruption-tolerant
function for retrieving worktree metadata (for use by the "repair"
function) does not yet exist.

^ permalink raw reply

* Re: [PATCH 0/6] worktree: initialize refdb via ref backends
From: Junio C Hamano @ 2023-12-28 18:11 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <cover.1703754513.git.ps@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> when initializing worktrees we manually create the on-disk data
> structures required for the ref backend in "worktree.c". This works just
> fine right now where we only have a single user-exposed ref backend, but
> it will become unwieldy once we have multiple ref backends. This patch
> series thus refactors how we initialize worktrees so that we can use
> `refs_init_db()` to initialize required files for us.
>
> This patch series conflicts with ps/refstorage-extension. The conflict
> can be solved as shown below. I'm happy to defer this patch series
> though until the topic has landed on `master` in case this causes
> issues.

Resolution is not all that bad, but the change in function signature
means comments/explanations near both the caller and the callee of
the get_linked_worktree() function may need updating, I would think.
For example, ...

> diff --git a/worktree.h b/worktree.h
> index 8a75691eac..f14784a2ff 100644
> --- a/worktree.h
> +++ b/worktree.h
> @@ -61,7 +61,8 @@ struct worktree *find_worktree(struct worktree **list,
>   * Look up the worktree corresponding to `id`, or NULL of no such worktree
>   * exists.
>   */
> -struct worktree *get_linked_worktree(const char *id);
> +struct worktree *get_linked_worktree(const char *id,
> +				     int skip_reading_head);

... this now needs to help developers who may want to add new
callers what to pass in "skip_reading_head" and why.

We may indeed want to build this on top of the refstorage-extansion
thing, as it seems to be relatively close to completion.

Thanks (and a happy new year).

^ permalink raw reply

* Re: [PATCH v2 02/12] worktree: skip reading HEAD when repairing worktrees
From: Eric Sunshine @ 2023-12-28 18:08 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak, Junio C Hamano
In-Reply-To: <ecf4f1ddee36643f0ff7e3d40b9aa7c7e6e6ce43.1703753910.git.ps@pks.im>

On Thu, Dec 28, 2023 at 4:57 AM Patrick Steinhardt <ps@pks.im> wrote:
> When calling `git init --separate-git-dir=<new-path>` on a preexisting
> repository, we move the Git directory of that repository to the new path
> specified by the user. If there are worktrees present in the repository,
> we need to repair the worktrees so that their gitlinks point to the new
> location of the repository.
>
> This repair logic will load repositories via `get_worktrees()`, which
> will enumerate up and initialize all worktrees. Part of initialization
> is logic that we resolve their respective worktree HEADs, even though
> that information may not actually be needed in the end by all callers.
>
> In the context of git-init(1) this is about to become a problem, because
> we do not have a repository that was set up via `setup_git_directory()`
> or friends. Consequentially, it is not yet fully initialized at the time
> of calling `repair_worktrees()`, and properly setting up all parts of
> the repository in `init_db()` before we repair worktrees is not an easy
> thing to do. While this is okay right now where we only have a single
> reference backend in Git, once we gain a second one we would be trying
> to look up the worktree HEADs before we have figured out the reference
> format, which does not work.

s/Consequentially/Consequently/

I found it difficult to digest this paragraph with its foreshadowing
phrase "about to become a problem" since it wasn't apparent until the
very final sentence in the paragraph what the actual problem would be.
Perhaps if you mention early on that the reftable backend will have
trouble with the current code, it would be easier to grasp. Maybe
something like this:

    Although not a problem presently with the file-based reference
    backend, it will become a problem with the upcoming reftable
    backend.  In the context of git-init(1) a fully-materialized
    repository set up via `setup_git_directory()` or friends is not
    yet present.  Consequently, it is not yet fully initialized at the
    time `repair_worktrees()` is called, and properly setting up all
    parts of the repository in `init_db()` before we repair worktrees
    is not an easy task.  With introduction of the reftable backend,
    it would try to look up the worktree HEADs before we have figured
    out the reference format, thus would not work.

> We do not require the worktree HEADs at all to repair worktrees. So
> let's fix this issue by skipping over the step that reads them.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> diff --git a/worktree.c b/worktree.c
> @@ -51,7 +51,7 @@ static void add_head_info(struct worktree *wt)
> -static struct worktree *get_main_worktree(void)
> +static struct worktree *get_main_worktree(int skip_reading_head)
>  {
> -       add_head_info(worktree);
> +       if (!skip_reading_head)
> +               add_head_info(worktree);

This is so special-case that it feels more than a little dirty.

> @@ -591,7 +599,7 @@ static void repair_noop(int iserr UNUSED,
>  void repair_worktrees(worktree_repair_fn fn, void *cb_data)
>  {
> -       struct worktree **worktrees = get_worktrees();
> +       struct worktree **worktrees = get_worktrees_internal(1);

In an ideal world, a repair function should not be calling
get_worktrees() at all since get_worktrees() is not tolerant of
corruption of the worktree administrative files. (Plus, as you note,
it does more work than necessary for the current set of repairs
performed by `git worktree repair`.)

Even as I was implementing the worktree repair code, I wavered back
and forth multiple times between calling get_worktrees() and writing a
custom corruption-tolerant function to retrieve worktree
administrative information. In the end, I opted for get_worktrees()
for the pragmatic reason that it allowed me to narrow the scope of the
patches to the types of repairs which were the current focus without
getting mired down in the involved details of writing a
corruption-tolerant function for retrieving worktree metadata.
However, that decision was made with the understanding that the
pragmatic choice of the moment would not rule out the possibility of
returning later and implementing the more correct approach of having a
corruption-tolerant function for retrieving worktree metadata.

The special-case ugliness of this patch suggests strongly in favor of
implementing the earlier-envisioned corruption-tolerant function for
retrieving worktree metadata rather than the band-aid approach taken
by this patch. The generic name get_worktrees_internal() isn't helpful
either; it doesn't do a good job of conveying any particular meaning
to the reader.

Having said all that, I'm not overly opposed to this patch, especially
since your main focus is on getting the reftable backend integrated,
and because the changes (and ugliness) introduced by this patch are
entirely self-contained and private to worktree.c, so are not a
show-stopper by any means. Rather, I wanted to get down to writing
what I think would be a better future approach if someone gets around
to tackling it. (There is no pressing need at the moment, and that
someone doesn't have to be you.)

^ permalink raw reply

* Re: [PATCH v2 03/12] refs: refactor logic to look up storage backends
From: Junio C Hamano @ 2023-12-28 17:25 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Karthik Nayak
In-Reply-To: <12329b99b753f79fe93fe017e71b08227d213c1e.1703753910.git.ps@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> In order to look up ref storage backends, we're currently using a linked
> list of backends, where each backend is expected to set up its `next`
> pointer to the next ref storage backend. This is kind of a weird setup
> as backends need to be aware of other backends without much of a reason.
>
> Refactor the code so that the array of backends is centrally defined in
> "refs.c", where each backend is now identified by an integer constant.
> Expose functions to translate from those integer constants to the name
> and vice versa, which will be required by subsequent patches.

A small question.  Does this have to be "int", or is "unsigned" (or
even an enum, rewrittenfrom the "REF_STORAGE_FORMAT_*" family of CPP
macro constants) good enough?  I am only wondering what happens when
you clal find_ref_storage_backend() with a negative index.

For that matter, how REF_STORAGE_FORMAT_UNKNOWN (whose value is 0)
is handled by the function also gets curious.  The caller may have
to find that the backend hasn't been specified by receiving an
element in the refs_backends[] array that corresponds to it, but the
error behaviour of this function is also to return NULL, so it has
to be prepared to handle both cases?

> +static const struct ref_storage_be *refs_backends[] = {
> +	[REF_STORAGE_FORMAT_FILES] = &refs_be_files,
> +};
> ...
> +static const struct ref_storage_be *find_ref_storage_backend(int ref_storage_format)
>  {
> +	if (ref_storage_format < ARRAY_SIZE(refs_backends))
> +		return refs_backends[ref_storage_format];
>  	return NULL;
>  }

^ permalink raw reply

* Re: [PATCH 04/12] setup: start tracking ref storage format when
From: Junio C Hamano @ 2023-12-28 17:15 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <ZY04OlTNUEZs5T-T@tanuki>

Patrick Steinhardt <ps@pks.im> writes:

> Makes me wonder whether we should then also add the following diff to
> "setup: set repository's format on init" when both topics are being
> merged together:
>
> diff --git a/setup.c b/setup.c
> index 3d980814bc..3d35c78c68 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -2210,6 +2210,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
>  	 * format we can update the repository's settings accordingly.
>  	 */
>  	repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
> +	repo_set_compat_hash_algo(the_repository, repo_fmt.compat_hash_algo);
>  	repo_set_ref_storage_format(the_repository, repo_fmt.ref_storage_format);
>  
>  	if (!(flags & INIT_DB_SKIP_REFDB))

Shouldn't that come from the series that wants .compat_hash_algo in
the repo_fmt structure, whichever it is, not added by an evil merge?

^ permalink raw reply

* Re: [PATCH v2 8/8] reftable/merged: transfer ownership of records when iterating
From: Junio C Hamano @ 2023-12-28 17:04 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys
In-Reply-To: <25a3919e583a9d13403b8add92260529e19e08fb.1703743174.git.ps@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> When iterating over recods with the merged iterator we put the records

"records"?

I commented on a few patches in this iteration that I found a bit
harder to follow than necessary; aside from these small nits,
everything in the series looked quite well explained and executed.

Thanks.


^ permalink raw reply

* Re: [PATCH v2 5/8] reftable/record: store "val1" hashes as static arrays
From: Junio C Hamano @ 2023-12-28 17:03 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys
In-Reply-To: <46ca3a37f805cd36faa26927220c2793d4cdd561.1703743174.git.ps@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> When reading ref records of type "val1" we store its object ID in an

I'd find it easier to follow if we had a comma before "we store",
but perhaps I am old fashioned.

> allocated array. This results in an additional allocation for every
> single ref record we read, which is rather inefficient especially when
> iterating over refs.
>
> Refactor the code to instead use a static array of `GIT_MAX_RAWSZ`

"a static" -> "an embedded", perhaps?  The struct as the whole may
or may not be static but the point of this patch is that the array
is embedded in it.

> bytes. While this means that `struct ref_record` is bigger now, we
> typically do not store all refs in an array anyway and instead only
> handle a limited number of records at the same point in time.

Nicely explained.

> Using `git show-ref --quiet` in a repository with ~350k refs this leads
> to a significant drop in allocations. Before:
>
>     HEAP SUMMARY:
>         in use at exit: 21,098 bytes in 192 blocks
>       total heap usage: 2,116,683 allocs, 2,116,491 frees, 76,098,060 bytes allocated
>
> After:
>
>     HEAP SUMMARY:
>         in use at exit: 21,098 bytes in 192 blocks
>       total heap usage: 1,419,031 allocs, 1,418,839 frees, 62,145,036 bytes allocated
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  reftable/block_test.c      |  4 +---
>  reftable/merged_test.c     | 16 ++++++----------
>  reftable/readwrite_test.c  | 14 ++++----------
>  reftable/record.c          |  3 ---
>  reftable/record_test.c     |  1 -
>  reftable/reftable-record.h |  3 ++-
>  reftable/stack_test.c      |  2 --
>  7 files changed, 13 insertions(+), 30 deletions(-)
>
> diff --git a/reftable/block_test.c b/reftable/block_test.c
> index c00bbc8aed..dedb05c7d8 100644
> --- a/reftable/block_test.c
> +++ b/reftable/block_test.c
> @@ -49,13 +49,11 @@ static void test_block_read_write(void)
>  
>  	for (i = 0; i < N; i++) {
>  		char name[100];
> -		uint8_t hash[GIT_SHA1_RAWSZ];
>  		snprintf(name, sizeof(name), "branch%02d", i);
> -		memset(hash, i, sizeof(hash));
>  
>  		rec.u.ref.refname = name;
>  		rec.u.ref.value_type = REFTABLE_REF_VAL1;
> -		rec.u.ref.value.val1 = hash;
> +		memset(rec.u.ref.value.val1, i, GIT_SHA1_RAWSZ);
>  
>  		names[i] = xstrdup(name);
>  		n = block_writer_add(&bw, &rec);
> diff --git a/reftable/merged_test.c b/reftable/merged_test.c
> index d08c16abef..b3927a5d73 100644
> --- a/reftable/merged_test.c
> +++ b/reftable/merged_test.c
> @@ -123,13 +123,11 @@ static void readers_destroy(struct reftable_reader **readers, size_t n)
>  
>  static void test_merged_between(void)
>  {
> -	uint8_t hash1[GIT_SHA1_RAWSZ] = { 1, 2, 3, 0 };
> -
>  	struct reftable_ref_record r1[] = { {
>  		.refname = "b",
>  		.update_index = 1,
>  		.value_type = REFTABLE_REF_VAL1,
> -		.value.val1 = hash1,
> +		.value.val1 = { 1, 2, 3, 0 },
>  	} };
>  	struct reftable_ref_record r2[] = { {
>  		.refname = "a",
> @@ -165,26 +163,24 @@ static void test_merged_between(void)
>  
>  static void test_merged(void)
>  {
> -	uint8_t hash1[GIT_SHA1_RAWSZ] = { 1 };
> -	uint8_t hash2[GIT_SHA1_RAWSZ] = { 2 };
>  	struct reftable_ref_record r1[] = {
>  		{
>  			.refname = "a",
>  			.update_index = 1,
>  			.value_type = REFTABLE_REF_VAL1,
> -			.value.val1 = hash1,
> +			.value.val1 = { 1 },
>  		},
>  		{
>  			.refname = "b",
>  			.update_index = 1,
>  			.value_type = REFTABLE_REF_VAL1,
> -			.value.val1 = hash1,
> +			.value.val1 = { 1 },
>  		},
>  		{
>  			.refname = "c",
>  			.update_index = 1,
>  			.value_type = REFTABLE_REF_VAL1,
> -			.value.val1 = hash1,
> +			.value.val1 = { 1 },
>  		}
>  	};
>  	struct reftable_ref_record r2[] = { {
> @@ -197,13 +193,13 @@ static void test_merged(void)
>  			.refname = "c",
>  			.update_index = 3,
>  			.value_type = REFTABLE_REF_VAL1,
> -			.value.val1 = hash2,
> +			.value.val1 = { 2 },
>  		},
>  		{
>  			.refname = "d",
>  			.update_index = 3,
>  			.value_type = REFTABLE_REF_VAL1,
> -			.value.val1 = hash1,
> +			.value.val1 = { 1 },
>  		},
>  	};
>  
> diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
> index 9c16e0504e..87b238105c 100644
> --- a/reftable/readwrite_test.c
> +++ b/reftable/readwrite_test.c
> @@ -60,18 +60,15 @@ static void write_table(char ***names, struct strbuf *buf, int N,
>  	*names = reftable_calloc(sizeof(char *) * (N + 1));
>  	reftable_writer_set_limits(w, update_index, update_index);
>  	for (i = 0; i < N; i++) {
> -		uint8_t hash[GIT_SHA256_RAWSZ] = { 0 };
>  		char name[100];
>  		int n;
>  
> -		set_test_hash(hash, i);
> -
>  		snprintf(name, sizeof(name), "refs/heads/branch%02d", i);
>  
>  		ref.refname = name;
>  		ref.update_index = update_index;
>  		ref.value_type = REFTABLE_REF_VAL1;
> -		ref.value.val1 = hash;
> +		set_test_hash(ref.value.val1, i);
>  		(*names)[i] = xstrdup(name);
>  
>  		n = reftable_writer_add_ref(w, &ref);
> @@ -675,11 +672,10 @@ static void test_write_object_id_min_length(void)
>  	struct strbuf buf = STRBUF_INIT;
>  	struct reftable_writer *w =
>  		reftable_new_writer(&strbuf_add_void, &buf, &opts);
> -	uint8_t hash[GIT_SHA1_RAWSZ] = {42};
>  	struct reftable_ref_record ref = {
>  		.update_index = 1,
>  		.value_type = REFTABLE_REF_VAL1,
> -		.value.val1 = hash,
> +		.value.val1 = {42},
>  	};
>  	int err;
>  	int i;
> @@ -711,11 +707,10 @@ static void test_write_object_id_length(void)
>  	struct strbuf buf = STRBUF_INIT;
>  	struct reftable_writer *w =
>  		reftable_new_writer(&strbuf_add_void, &buf, &opts);
> -	uint8_t hash[GIT_SHA1_RAWSZ] = {42};
>  	struct reftable_ref_record ref = {
>  		.update_index = 1,
>  		.value_type = REFTABLE_REF_VAL1,
> -		.value.val1 = hash,
> +		.value.val1 = {42},
>  	};
>  	int err;
>  	int i;
> @@ -814,11 +809,10 @@ static void test_write_multiple_indices(void)
>  	writer = reftable_new_writer(&strbuf_add_void, &writer_buf, &opts);
>  	reftable_writer_set_limits(writer, 1, 1);
>  	for (i = 0; i < 100; i++) {
> -		unsigned char hash[GIT_SHA1_RAWSZ] = {i};
>  		struct reftable_ref_record ref = {
>  			.update_index = 1,
>  			.value_type = REFTABLE_REF_VAL1,
> -			.value.val1 = hash,
> +			.value.val1 = {i},
>  		};
>  
>  		strbuf_reset(&buf);
> diff --git a/reftable/record.c b/reftable/record.c
> index 5e258c734b..a67a6b4d8a 100644
> --- a/reftable/record.c
> +++ b/reftable/record.c
> @@ -219,7 +219,6 @@ static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
>  	case REFTABLE_REF_DELETION:
>  		break;
>  	case REFTABLE_REF_VAL1:
> -		ref->value.val1 = reftable_malloc(hash_size);
>  		memcpy(ref->value.val1, src->value.val1, hash_size);
>  		break;
>  	case REFTABLE_REF_VAL2:
> @@ -303,7 +302,6 @@ void reftable_ref_record_release(struct reftable_ref_record *ref)
>  		reftable_free(ref->value.val2.value);
>  		break;
>  	case REFTABLE_REF_VAL1:
> -		reftable_free(ref->value.val1);
>  		break;
>  	case REFTABLE_REF_DELETION:
>  		break;
> @@ -394,7 +392,6 @@ static int reftable_ref_record_decode(void *rec, struct strbuf key,
>  			return -1;
>  		}
>  
> -		r->value.val1 = reftable_malloc(hash_size);
>  		memcpy(r->value.val1, in.buf, hash_size);
>  		string_view_consume(&in, hash_size);
>  		break;
> diff --git a/reftable/record_test.c b/reftable/record_test.c
> index 70ae78feca..5c94d26e35 100644
> --- a/reftable/record_test.c
> +++ b/reftable/record_test.c
> @@ -119,7 +119,6 @@ static void test_reftable_ref_record_roundtrip(void)
>  		case REFTABLE_REF_DELETION:
>  			break;
>  		case REFTABLE_REF_VAL1:
> -			in.u.ref.value.val1 = reftable_malloc(GIT_SHA1_RAWSZ);
>  			set_hash(in.u.ref.value.val1, 1);
>  			break;
>  		case REFTABLE_REF_VAL2:
> diff --git a/reftable/reftable-record.h b/reftable/reftable-record.h
> index f7eb2d6015..7f3a0df635 100644
> --- a/reftable/reftable-record.h
> +++ b/reftable/reftable-record.h
> @@ -9,6 +9,7 @@ license that can be found in the LICENSE file or at
>  #ifndef REFTABLE_RECORD_H
>  #define REFTABLE_RECORD_H
>  
> +#include "hash-ll.h"
>  #include <stdint.h>
>  
>  /*
> @@ -38,7 +39,7 @@ struct reftable_ref_record {
>  #define REFTABLE_NR_REF_VALUETYPES 4
>  	} value_type;
>  	union {
> -		uint8_t *val1; /* malloced hash. */
> +		unsigned char val1[GIT_MAX_RAWSZ];
>  		struct {
>  			uint8_t *value; /* first value, malloced hash  */
>  			uint8_t *target_value; /* second value, malloced hash */
> diff --git a/reftable/stack_test.c b/reftable/stack_test.c
> index 14a3fc11ee..feab49d7f7 100644
> --- a/reftable/stack_test.c
> +++ b/reftable/stack_test.c
> @@ -463,7 +463,6 @@ static void test_reftable_stack_add(void)
>  		refs[i].refname = xstrdup(buf);
>  		refs[i].update_index = i + 1;
>  		refs[i].value_type = REFTABLE_REF_VAL1;
> -		refs[i].value.val1 = reftable_malloc(GIT_SHA1_RAWSZ);
>  		set_test_hash(refs[i].value.val1, i);
>  
>  		logs[i].refname = xstrdup(buf);
> @@ -600,7 +599,6 @@ static void test_reftable_stack_tombstone(void)
>  		refs[i].update_index = i + 1;
>  		if (i % 2 == 0) {
>  			refs[i].value_type = REFTABLE_REF_VAL1;
> -			refs[i].value.val1 = reftable_malloc(GIT_SHA1_RAWSZ);
>  			set_test_hash(refs[i].value.val1, i);
>  		}

^ permalink raw reply

* Re: [PATCH v2 7/8] reftable/merged: really reuse buffers to compute record keys
From: Junio C Hamano @ 2023-12-28 17:03 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys
In-Reply-To: <6313f8affdc136b183c1bd411d481efe5c676aee.1703743174.git.ps@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> In 829231dc20 (reftable/merged: reuse buffer to compute record keys,
> 2023-12-11), we have refactored the merged iterator to reuse a set of
> buffers that it would otherwise have to reallocate on every single
> iteration. Unfortunately, there was a brown-paper-bag-style bug here as
> we continue to release these buffers after the iteration, and thus we
> have essentially gained nothing.

s/-style// perhaps.  It took me more than just reading of the above
but I needed to see the code before noticed that you are talking
about strbuf_release().  Only after that I think I understood what
you meant as a bug.

    With the change, instead of using a new "entry_key" strbuf for
    each iteration, the code now passes mi->entry_key to
    reftable_record_key(), which will reuse the existing .buf member
    of the strbuf to avoid reallcation.  But releasing the strbuf in
    each iteration defeats such optimization.

I suspect that a Git developer who will be reading "git log" output
in 6 months and finds the above paragraph understands the problem
and its fix better if the description hinted strbuf_reset() near
where it mentions "release", something like:

    ... to reuse a pair of long-living strbufs by relying on the
    fact that reftable_record_key() tries to reuse its already
    allocated .buf member by calling strbuf_reset(), which should
    give us significantly fewer reallocation compared to the old
    code that used on-stack strbufs that are allocated for each and
    every iteration.  Unfortunately, we called strbuf_release() on
    these long-living strbufs that we meant to reuse, defeating the
    optimization.

or along that line, perhaps?

Other than that, a very reasonable fix.  Thanks for a pleasant read.

> Fix this performance issue by not releasing those buffers on iteration
> anymore, where we instead rely on `merged_iter_close()` to release the
> buffers for us.
>
> Using `git show-ref --quiet` in a repository with ~350k refs this leads
> to a significant drop in allocations. Before:
>
>     HEAP SUMMARY:
>         in use at exit: 21,163 bytes in 193 blocks
>       total heap usage: 1,410,148 allocs, 1,409,955 frees, 61,976,068 bytes allocated
>
> After:
>
>     HEAP SUMMARY:
>         in use at exit: 21,163 bytes in 193 blocks
>       total heap usage: 708,058 allocs, 707,865 frees, 36,783,255 bytes allocated
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  reftable/merged.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/reftable/merged.c b/reftable/merged.c
> index 556bb5c556..a28bb99aaf 100644
> --- a/reftable/merged.c
> +++ b/reftable/merged.c
> @@ -128,8 +128,6 @@ static int merged_iter_next_entry(struct merged_iter *mi,
>  
>  done:
>  	reftable_record_release(&entry.rec);
> -	strbuf_release(&mi->entry_key);
> -	strbuf_release(&mi->key);
>  	return err;
>  }

^ permalink raw reply

* Re: [PATCH] mem-pool: fix big allocations
From: phillip.wood123 @ 2023-12-28 16:48 UTC (permalink / raw)
  To: René Scharfe, phillip.wood, Git List; +Cc: Jameson Miller
In-Reply-To: <c5d35735-10e2-4b71-8fc7-6218e7002549@web.de>

On 28/12/2023 16:05, René Scharfe wrote:
> Am 28.12.23 um 16:10 schrieb Phillip Wood:
>> The diff at the end of
>> this email shows a possible implementation of a check_ptr() macro for
>> the unit test library. I'm wary of adding it though because I'm not sure
>> printing the pointer values is actually very useful most of the
>> time. I'm also concerned that the rules around pointer arithmetic and
>> comparisons mean that many pointer tests such as
>>
>>      check_ptr(pool->mp_block->next_free, <=, pool->mp_block->end);
>>
>> will be undefined if they fail.
> 
> True, the compiler could legally emit mush when it finds out that the
> pointers are for different objects.  And the error being fixed produces
> such unrelated pointer pairs -- oops.
> 
> This check is not important here, we can just drop it.
> 
> mem_pool_contains() has the same problem, by the way.
> 
> Restricting ourselves to only equality comparisons for pointers prevents
> some interesting sanity checks, though.  Casting to intptr_t or
> uintptr_t would allow arbitrary comparisons without risk of undefined
> behavior, though.  Perhaps that would make a check_ptr() macro viable
> and useful.

That certainly helps and the check_ptr() macro in my previous email 
casts the pointers to uintptr_t before comparing them. Maybe I'm 
worrying too much, but my concern is that in a failing comparison it is 
likely one of the pointers is invalid (for example it is the result of 
some undefined pointer arithmetic) and the program is undefined from the 
point the invalid pointer is created. The documentation for check_ptr() 
in my previous mail contains the following example

     For example if `start` and `end` are pointers to the beginning and
     end of an allocation and `offset` is an integer then

         check_ptr(start + offset, <=, end)

     is undefined when `offset` is larger than `end - start`. Rewriting
     the comparison as

         check_uint(offset, <=, end - start)

     avoids undefined behavior when offset is too large, but is still
     undefined if there is a bug that means `start` and `end` do not
     point to the same allocation.

I agree it would be nice to allow arbitrary pointer comparisons but it 
would be good to do it in a way that does not expose us to undefined 
behavior. I'm not sure what the right balance is here.

Best Wishes

Phillip

^ permalink raw reply

* Re: [PATCH] mem-pool: fix big allocations
From: René Scharfe @ 2023-12-28 16:05 UTC (permalink / raw)
  To: phillip.wood, Git List; +Cc: Jameson Miller
In-Reply-To: <9aad15c8-8d3b-475b-bd44-5d24121cb793@gmail.com>

Am 28.12.23 um 16:10 schrieb Phillip Wood:
> Hi René
>
> On 21/12/2023 23:13, René Scharfe wrote:
>> +#define check_ptr(a, op, b) check_int(((a) op (b)), ==, 1)
>> [...]
>> +static void t_calloc_100(struct mem_pool *pool)
>> +{
>> +    size_t size = 100;
>> +    char *buffer = mem_pool_calloc(pool, 1, size);
>> +    for (size_t i = 0; i < size; i++)
>> +        check_int(buffer[i], ==, 0);
>> +    if (!check_ptr(pool->mp_block, !=, NULL))
>> +        return;
>> +    check_ptr(pool->mp_block->next_free, <=, pool->mp_block->end);
>> +    check_ptr(pool->mp_block->next_free, !=, NULL);
>> +    check_ptr(pool->mp_block->end, !=, NULL);
>> +}
>
> It's great to see the unit test framework being used here. I wonder
> though if it would be simpler just to use
>
>     check(ptr != NULL)

Yes, that's better.

> as I'm not sure what the check_ptr() macro adds. The diff at the end of
> this email shows a possible implementation of a check_ptr() macro for
> the unit test library. I'm wary of adding it though because I'm not sure
> printing the pointer values is actually very useful most of the
> time. I'm also concerned that the rules around pointer arithmetic and
> comparisons mean that many pointer tests such as
>
>     check_ptr(pool->mp_block->next_free, <=, pool->mp_block->end);
>
> will be undefined if they fail.

True, the compiler could legally emit mush when it finds out that the
pointers are for different objects.  And the error being fixed produces
such unrelated pointer pairs -- oops.

This check is not important here, we can just drop it.

mem_pool_contains() has the same problem, by the way.

Restricting ourselves to only equality comparisons for pointers prevents
some interesting sanity checks, though.  Casting to intptr_t or
uintptr_t would allow arbitrary comparisons without risk of undefined
behavior, though.  Perhaps that would make a check_ptr() macro viable
and useful.

René

^ permalink raw reply

* Re: [PATCH] Port helper/test-ctype.c to unit-tests/t-ctype.c
From: René Scharfe @ 2023-12-28 16:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Christian Couder, Achu Luma, git, Christian Couder, Phillip Wood,
	Josh Steadmon
In-Reply-To: <xmqqcyurky00.fsf@gitster.g>

Am 28.12.23 um 00:48 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>>> Also it might not be a big issue here, but when the new unit test
>>> framework was proposed, I commented on the fact that "left" and
>>> "right" were perhaps a bit less explicit than "actual" and "expected".
>>
>> True.
>> ...
>> The added repetition is a bit grating.  With a bit of setup, loop
>> unrolling and stringification you can retain the property of only having
>> to mention the class name once.  Demo patch below.
>
> Nice.
>
> This (and your mempool thing) being one of the early efforts to
> adopt the unit-test framework outside the initial set of sample
> tests, it is understandable that we might find what framework offers
> is still lacking.  But at the same time, while the macro tricks
> demonstrated here are all amusing to read and admire, it feels a bit
> too much to expect that the test writers are willing to invent
> something like these every time they want to test.
>
> Being a relatively faithful conversion of the original ctype tests,
> with its thorough enumeration of test samples and expected output,
> is what makes this test program require these macro tricks, and it
> does not have much to do with the features (or lack thereof) of the
> framework, I guess.

*nod*

>
>> +struct ctype {
>> +	const char *name;
>> +	const char *expect;
>> +	int actual[256];
>> +};
>> +
>> +static void test_ctype(const struct ctype *class)
>> +{
>> +	for (int i = 0; i < 256; i++) {
>> +		int expect = is_in(class->expect, i);
>> +		int actual = class->actual[i];
>> +		int res = test_assert(TEST_LOCATION(), class->name,
>> +				      actual == expect);
>> +		if (!res)
>> +			test_msg("%s classifies char %d (0x%02x) wrongly",
>> +				 class->name, i, i);
>> +	}
>>  }
>
> Somehow, the "test_assert" does not seem to be adding much value
> here (i.e. we can do "res = (actual == expect)" there).  Is this
> because we want to be able to report success, too?
>
>     ... goes and looks at test_assert() ...
>
> Ah, is it because we want to be able to "skip" (which pretends that
> the assert() was satisified).  OK, but then the error reporting from
> it is redundant with our own test_msg().

True, the test_msg() emits the old message here, but it doesn't have to
report that the check failed anymore, because test_assert() already
covers that part.  It would only have to report the misclassified
character and perhaps the expected result.

René

^ permalink raw reply

* Re: Git Rename Detection Bug
From: Philip Oakley @ 2023-12-28 15:33 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Jeremy Pridmore, git@vger.kernel.org, Paul Baumgartner
In-Reply-To: <CABPp-BEdSGBt7DCrJCmOtG+RgZ2F3fNZQJ91PjZQxNa-ShKf8g@mail.gmail.com>

Hi Elijah,
Many thanks.. personal notes in-line.

On 24/12/2023 07:46, Elijah Newren wrote:
> Hi Philip,
> 
> Sorry for the late reply; I somehow missed this earlier.
> 
> On Wed, Nov 15, 2023 at 8:51 AM Philip Oakley <philipoakley@iee.email> wrote:
>>
>> Hi Elijah,
>>
>> On 11/11/2023 05:46, Elijah Newren wrote:
>>> * filename similarity is extraordinarily expensive compared to exact
>>> renames, and if not carefully handled, can sometimes rival the cost of
>>> file content similarity computations given our spanhash
>>> representations.
>>
>> I've not heard of spanhash representation before. Any references or
>> further reading?
> 
> You can find more in diffcore-delta.c, especially the big comment near
> the top of the file.

+1

>         But here's a short explanation of spanhashes:
>   * Split files into chunks delimited either by LF or 64 bytes,
> whichever comes first.

neat


>   * Hash every chunk into an integer between 0 and 107926

as per the comment, this is 1 less than a nice prime 107927 that fits
17bits.
Some discussions at
https://lore.kernel.org/git/7vwtezt202.fsf@assigned-by-dhcp.cox.net/ and
surrounding  messages.

The hash is very similar to a CRC, a rotating 64bit value, using 7 bit
shifts and a 8bit char addition, then reduced to a hash computed at ~#L157

>   * Keep a character count for each of those integers as well (thus if
> a line has N characters, but appears twice in the file, the associated
> count for that integer will be 2N).
>   * A "spanhash" is the combination of the integer that a chunk (or
> span) hashes to, plus the count associated with it.
>   * The list/array of spanhashes for a file (i.e. the list/array of
> integers and character counts) is used to compare one file to another.

I was surprised to see that I'd been in the area at #L162 ;-)

Thank you for the useful summary.


> 
> Now, why do I claim that comparison of filenames can rival cost of
> file content similarity?  Well, in a monorepo of interest, the median
> sized file is named something close to
> "modules/client-resources/src/main/resources/buttons/smallTriangleBlackRight.png"
> and is 2875 bytes.  As a png, all its chunks are probably the full 64
> characters, which works out to about 45 chunks (assuming the 64-byte
> chunks are different from each other).  The filename is 79 characters.
> So, for this case, 45 pairs of integers vs 79 characters.  So, the
> comparison cost is roughly the same order of magnitude.
> (Yes, creating the spanhashes is a heavy overhead; however, we only
> initialize it once and then do N comparisons of each spanhash to the
> other spanhashes.  And we'd be doing N comparisons of each filename to
> other filenames, so the overhead of creating the spanhashes can be
> overlooked if your merge has enough files modified on both sides of
> history.)

Nice point about the hashes only being computed once.

> 
> Yes, this particular repository is a case I randomly picked that you
> can argue is special.  But rather than look at the particular example,
> I think it's interesting to check how the spanhash size vs. filename
> size scale with repository size.  From my experience: (1) I don't
> think the median-sized file varies all that much between small and big
> repositories; every time I check a repo the median size seems to be
> order of a few thousand bytes, regardless of whether the repository
> I'm looking at is tiny or huge, (2) while small repositories often
> have much shorter filenames, big repositories often will have
> filenames even longer than my example; length of filename tends to
> grow with repository size from deep directory nestings.  So, between
> these two facts, I'd expect the filename comparison costs to grow
> relative to file content comparison costs, when considering only
> median-sized files being modified.  And since it's common to have
> merges or rebases or diffs where only approximately-median-sized files
> are involved, I think this is relevant to look at.  Finally, since I
> already had an example that showed the cost likely roughly comparable
> for a random repository of interest, and it's not even all that big a
> repository compared to many out there, I think the combination
> motivates pretty well my claim that filename similarity costs _could_
> rival file content similarity costs if one wasn't careful.
> 
> I don't have a rigorous proof here.  And, in fact, I ended up doing
> this rough back-of-the-envelope analysis _after_ implementing some
> filename similarity comparison ideas and seeing performance degrade
> badly, and wondering why it made such a difference.  I don't know if I
> ever got exact numbers, but I certainly didn't record them.  This
> rough analysis, though, was what made me realize that I needed to be
> careful with any such added filename comparisons, though, and is why
> I'm leery of adding more.

Thanks again.

^ permalink raw reply

* Re: [PATCH] mem-pool: fix big allocations
From: Phillip Wood @ 2023-12-28 15:10 UTC (permalink / raw)
  To: René Scharfe, Git List; +Cc: Jameson Miller
In-Reply-To: <fa89d269-1a23-4ed6-bebc-30c0b629f444@web.de>

Hi René

On 21/12/2023 23:13, René Scharfe wrote:
> +#define check_ptr(a, op, b) check_int(((a) op (b)), ==, 1)
> [...]
> +static void t_calloc_100(struct mem_pool *pool)
> +{
> +	size_t size = 100;
> +	char *buffer = mem_pool_calloc(pool, 1, size);
> +	for (size_t i = 0; i < size; i++)
> +		check_int(buffer[i], ==, 0);
> +	if (!check_ptr(pool->mp_block, !=, NULL))
> +		return;
> +	check_ptr(pool->mp_block->next_free, <=, pool->mp_block->end);
> +	check_ptr(pool->mp_block->next_free, !=, NULL);
> +	check_ptr(pool->mp_block->end, !=, NULL);
> +}

It's great to see the unit test framework being used here. I wonder
though if it would be simpler just to use

	check(ptr != NULL)

as I'm not sure what the check_ptr() macro adds. The diff at the end of
this email shows a possible implementation of a check_ptr() macro for
the unit test library. I'm wary of adding it though because I'm not sure
printing the pointer values is actually very useful most of the
time. I'm also concerned that the rules around pointer arithmetic and
comparisons mean that many pointer tests such as

     check_ptr(pool->mp_block->next_free, <=, pool->mp_block->end);

will be undefined if they fail. The documentation for check_ptr() below
tries to illustrate that concern. If the compiler can prove that a check
is undefined when that check fails it is at liberty to hard code the
test as passing. In practice I think most failing pointer comparisons
would fall into the category of "this is undefined but the compiler
can't prove it" but that doesn't really make me any happier.

Best Wishes

Phillip

---- >8 ----
diff --git a/t/unit-tests/test-lib.h b/t/unit-tests/test-lib.h
index a8f07ae0b7..ecd1fce17d 100644
--- a/t/unit-tests/test-lib.h
+++ b/t/unit-tests/test-lib.h
@@ -99,6 +99,39 @@ int check_int_loc(const char *loc, const char *check, int ok,
  int check_uint_loc(const char *loc, const char *check, int ok,
  		   uintmax_t a, uintmax_t b);
  
+/*
+ * Compare two pointers. Prints a message with the two values if the
+ * comparison fails. NB this is not thread safe.
+ *
+ * Use this with care. The rules around pointer arithmetic and comparison
+ * in C are quite strict and violating them results in undefined behavior
+ * To avoid a failing comparison resulting undefined behavior we compare
+ * the integer value of the pointers. While this avoids undefined
+ * behavior in the comparison in many cases a failing test will be the
+ * result of creating an invalid pointer in a way that violates the
+ * rules on pointer arithmetic. For example if `start` and `end` are
+ * pointers to the beginning and end of an allocation and `offset` is an
+ * integer then
+ *
+ *     check_ptr(start + offset, <=, end)
+ *
+ * is undefined when `offset` is larger than `end - start`. Rewriting the
+ * comparison as
+ *
+ *     check_uint(offset, <=, end - start)
+ *
+ * avoids undefined behavior when offset is too large, but is still
+ * undefined if there is a bug that means `start` and `end` do not point
+ * to the same allocation.
+ */
+#define check_ptr(a, op, b)						\
+	(test__tmp[0].p = (a), test__tmp[1].p = (b),			\
+	 check_ptr_loc(TEST_LOCATION(), #a" "#op" "#b,			\
+		       (uintptr_t)test__tmp[0].p op (uintptr_t)test__tmp[1].p,	\
+			test__tmp[0].p, test__tmp[1].p))
+
+int check_ptr_loc(const char *loc, const char *check, int ok, void *a, void *b);
+
  /*
   * Compare two chars. Prints a message with the two values if the
   * comparison fails. NB this is not thread safe.
@@ -133,6 +166,7 @@ int check_str_loc(const char *loc, const char *check,
  #define TEST__MAKE_LOCATION(line) __FILE__ ":" TEST__STR(line)
  
  union test__tmp {
+	void *p;
  	intmax_t i;
  	uintmax_t u;
  	char c;
diff --git a/t/unit-tests/test-lib.c b/t/unit-tests/test-lib.c
index 7bf9dfdb95..cb757edbd8 100644
--- a/t/unit-tests/test-lib.c
+++ b/t/unit-tests/test-lib.c
@@ -311,6 +311,18 @@ int check_uint_loc(const char *loc, const char *check, int ok,
  	return ret;
  }
  
+int check_ptr_loc(const char *loc, const char *check, int ok, void *a, void *b)
+{
+	int ret = test_assert(loc, check, ok);
+
+	if (!ret) {
+		test_msg("   left: %p", a);
+		test_msg("  right: %p", b);
+	}
+
+	return ret;
+}
+
  static void print_one_char(char ch, char quote)
  {
  	if ((unsigned char)ch < 0x20u || ch == 0x7f) {

^ permalink raw reply related

* Re: Git mirror at gitlab
From: Phillip Wood @ 2023-12-28 14:50 UTC (permalink / raw)
  To: Patrick Steinhardt, Olliver Schinagl
  Cc: git, gitster, Ævar Arnfjörð Bjarmason, psteinhardt
In-Reply-To: <ZY1YYtoIw4dIpZ6Q@tanuki>

On 28/12/2023 11:13, Patrick Steinhardt wrote:
> On Fri, Dec 22, 2023 at 03:06:08PM +0100, Olliver Schinagl wrote:
>> Hey Patrick,
>>
>> On December 21, 2023 12:48:12 p.m. GMT+01:00, Patrick Steinhardt <ps@pks.im> wrote:
>>> On Thu, Dec 21, 2023 at 12:30:02PM +0100, Olliver Schinagl wrote:
>>> Not to throw a wrench into this, but are you aware of the official
>>> GitLab mirror at https://gitlab.com/git-vcs/git? I myself wasn't aware
>>> of this mirror for a rather long time.
>>
>> Not a wrench at all, and no, I didn't know. How old is it though :p
>> could be that git-vcs was created cause I owned gitscm :)
>>
>> I had chosen gitscm to match the official site, git-scm.org. the
>> hyphen I left out because afaik it wasn't allowed on docker hub.
> 
> No idea when exactly this mirror was created. The first mention of it in
> the "What's cooking" report is in December 2020, 489058b80d (What's
> cooking (2023/12 #05), 2023-12-27).

I think Ævar set it up towards the end of 2018 [1]. I think he used it 
to do some testing using the gcc compile farm [2]

Best Wishes

Phillip

[1] https://lore.kernel.org/git/87k1mecj96.fsf@evledraar.gmail.com/
[2] https://lore.kernel.org/git/875zwm15k2.fsf@evledraar.gmail.com/

^ permalink raw reply

* [PATCH 1/1] Replace SID with domain/username
From: Sören Krecker @ 2023-12-28 13:28 UTC (permalink / raw)
  To: git; +Cc: soekkle
In-Reply-To: <20231228132844.4240-1-soekkle@freenet.de>

From: soekkle <soekkle@freenet.de>

Replace SID with domain/username in erromessage, if owner of repository
and user are not equal on windows systems.

Signed-off-by: Sören Krecker <soekkle@freenet.de>
---
 compat/mingw.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 42053c1f65..7318f0e20e 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -2684,6 +2684,25 @@ static PSID get_current_user_sid(void)
 	return result;
 }
 
+BOOL user_sid_to_string(PSID sid, LPSTR* str)
+{
+	SID_NAME_USE peUse;
+	DWORD lenName = { 0 }, lenDomain = { 0 };
+	LookupAccountSidA(NULL, sid, NULL, &lenName, NULL,
+					&lenDomain, &peUse); // returns only FALSE, because the string pointers are NULL
+	ALLOC_ARRAY((*str), (size_t)lenDomain + (size_t)lenName); // Alloc neded Space of the strings
+	BOOL retVal = LookupAccountSidA(NULL, sid, (*str) + lenDomain, &lenName,
+				       *str,
+					&lenDomain, &peUse);
+	*(*str + lenDomain) = '/';
+	if (retVal == FALSE)
+	{
+		free(*str);
+		*str = NULL;
+	}
+	return retVal;
+}
+
 static int acls_supported(const char *path)
 {
 	size_t offset = offset_1st_component(path);
@@ -2767,7 +2786,7 @@ int is_path_owned_by_current_sid(const char *path, struct strbuf *report)
 		} else if (report) {
 			LPSTR str1, str2, to_free1 = NULL, to_free2 = NULL;
 
-			if (ConvertSidToStringSidA(sid, &str1))
+			if (user_sid_to_string(sid, &str1))
 				to_free1 = str1;
 			else
 				str1 = "(inconvertible)";
@@ -2776,7 +2795,7 @@ int is_path_owned_by_current_sid(const char *path, struct strbuf *report)
 				str2 = "(none)";
 			else if (!IsValidSid(current_user_sid))
 				str2 = "(invalid)";
-			else if (ConvertSidToStringSidA(current_user_sid, &str2))
+			else if (user_sid_to_string(current_user_sid, &str2))
 				to_free2 = str2;
 			else
 				str2 = "(inconvertible)";
@@ -2784,8 +2803,8 @@ int is_path_owned_by_current_sid(const char *path, struct strbuf *report)
 				    "'%s' is owned by:\n"
 				    "\t'%s'\nbut the current user is:\n"
 				    "\t'%s'\n", path, str1, str2);
-			LocalFree(to_free1);
-			LocalFree(to_free2);
+			free(to_free1);
+			free(to_free2);
 		}
 	}
 
-- 
2.39.2


^ permalink raw reply related

* [PATCH 0/1] Replace SID with domain/username on Windows
From: Sören Krecker @ 2023-12-28 13:28 UTC (permalink / raw)
  To: git; +Cc: Sören Krecker

*** BLURB HERE ***

soekkle (1):
  Replace SID with domain/username

 compat/mingw.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)


base-commit: e79552d19784ee7f4bbce278fe25f93fbda196fa
-- 
2.39.2


^ permalink raw reply

* Re: [PATCH] sparse-checkout: be consistent with end of options markers
From: Jeff King @ 2023-12-28 11:43 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren via GitGitGadget, git, Randall S. Becker,
	Elijah Newren
In-Reply-To: <xmqqfrzoj31l.fsf@gitster.g>

On Tue, Dec 26, 2023 at 09:18:14AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Sun, Dec 24, 2023 at 01:00:11AM +0000, Elijah Newren via GitGitGadget wrote:
> >
> >> Remove the erroneous PARSE_OPT_KEEP_UNKNOWN_OPT flag now to fix this
> >> bug.  Note that this does mean that anyone who might have been using
> >> [...]
> >
> > Thanks for wrapping this up in patch form. It looks good to me,
> > including the reasoning. You didn't add any tests, but I find it rather
> > unlikely that we'd later regress here.
> 
> Surely.  I am certainly OK with just dropping KEEP_UNKNOWN but I
> would strongly prefer to document what we "fixed" (your "misspelt
> option name" example) and what (mis|ab)use the people may have been
> relying on we have "broken" (the same "misspelt" behaviour that can
> be intentional is now forbidden, and we document that this change in
> behaviour is intentional) with a new test.

Yeah, thank you for talking some sense into me. I do not foresee us
regressing "--sikp-checks", but certainly covering --end-of-options in
more places is worthwhile. As it's handled centrally, it can have
unexpected consequences for various commands.

Elijah's latest version looks good to me.

-Peff

^ permalink raw reply

* Re: Git mirror at gitlab
From: Patrick Steinhardt @ 2023-12-28 11:13 UTC (permalink / raw)
  To: Olliver Schinagl
  Cc: git, gitster, Ævar Arnfjörð Bjarmason, psteinhardt
In-Reply-To: <1D1CDDF3-E0D2-4059-9C9A-796145AB6E24@schinagl.nl>

[-- Attachment #1: Type: text/plain, Size: 3629 bytes --]

On Fri, Dec 22, 2023 at 03:06:08PM +0100, Olliver Schinagl wrote:
> Hey Patrick,
> 
> On December 21, 2023 12:48:12 p.m. GMT+01:00, Patrick Steinhardt <ps@pks.im> wrote:
> >On Thu, Dec 21, 2023 at 12:30:02PM +0100, Olliver Schinagl wrote:
> >> Hey list,
> >> 
> >> For years, I wanted (tried, but time) to run the mirror for git on gitlab.
> >> Actually, the original idea was to run a docker container ([0] 10k+ pulls
> >> :p)
> >> 
> >> I initially set this up via docker build containers, where docker hub would
> >> pull my mirror of the git repo. My mirror, because I added a Dockerfile
> >> which was enough for docker to do its trick. I was planning (time ..) on
> >> submitting this upstream to the list, but never did. Because of me not doing
> >> that, I had to manually (I was even too lazy to script it) rebase the
> >> branch. Docker then did some changes to their business, where the docker
> >> builds where not possible anymore.
> >> 
> >> So then I figured, I'll do the same on gitlab and push it to the docker hub.
> >> Thus I setup a mirror on gitlab [1], with the idea to work there on it.
> >> 
> >> Again, I never got around to finalize this work, mostly because the docker
> >> container 'just worked' for pretty much everything. After all, git is very
> >> stable overal.
> >> 
> >> So very interestingly, last month commit 0e3b67e2aa25edb ("ci: add support
> >> for GitLab CI") landed, which started to trigger pipeline jobs!
> >> 
> >> Sadly, this only worked for 3 builds, as that's when the minutes ran out :)
> >> 
> >> So one, I would very much like to offer the registered names (cause they are
> >> pretty nice in name) to here, so people can use and find it.
> >
> >Not to throw a wrench into this, but are you aware of the official
> >GitLab mirror at https://gitlab.com/git-vcs/git? I myself wasn't aware
> >of this mirror for a rather long time.
> 
> Not a wrench at all, and no, I didn't know. How old is it though :p
> could be that git-vcs was created cause I owned gitscm :)
> 
> I had chosen gitscm to match the official site, git-scm.org. the
> hyphen I left out because afaik it wasn't allowed on docker hub.

No idea when exactly this mirror was created. The first mention of it in
the "What's cooking" report is in December 2020, 489058b80d (What's
cooking (2023/12 #05), 2023-12-27).

> >I also wondered whether we want to have https://gitlab.com/git/git as we
> >do on GitHub. I don't think anybody registered it, but it is blocked
> >from being registered as far as I can tell. Maybe we block the namespace
> >out of caution, I dunno. I can certainly check in with our SREs in case
> >it is something the Git project would like to own.
> 
> Yeah couldn't figure out who it was either ... hence gitscm.
> 
> Sadly gitlab doesn't support aliases :) I'm more then happy to hand
> over the space. Whatever name is decided to be best.

I don't particularly mind the name we ultimately settle with. The most
pragmatic choice would likely be to stick with what we already have.
Ideally, all the others would then be created as forks of the official
one so that they point back to it. Like this, users can learn about the
canonical location of the official GitLab-hosted mirror more easily.

In any case, regardless of which location it is going to be, I certainly
agree that having CI jobs running for it would be great. Folks here at
GitLab are mostly on vacation though, but I'll take care of this once
they are back and once Ævar has responded so that I can get access to
the current official mirror.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH] ci: add job performing static analysis on GitLab CI
From: Patrick Steinhardt @ 2023-12-28 11:02 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 2504 bytes --]

Our GitHub Workflows definitions have a static analysis job that
runs the following tasks:

  - Coccinelle to check for suggested refactorings.

  - `make hdr-check` to check for missing includes or forward
    declarations in our header files.

  - `make check-pot` to check our translations for issues.

  - `./ci/check-directional-formatting.bash` to check whether our
    sources contain any Unicode directional formatting code points.

Add an equivalent job to our GitLab CI definitions.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---

This addition is inspired by my reftable patch series where I forgot to
add a "hash-ll.h" include. This would've been catched by our GitLab CI
pipeline if we already had the static-analysis job added in this patch.

These changes are part of [1], which thus also contains an example
pipeline run.

[1]: https://gitlab.com/gitlab-org/git/-/merge_requests/79

 .gitlab-ci.yml                    | 10 ++++++++++
 ci/install-docker-dependencies.sh |  7 ++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index cd98bcb18a..793243421c 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -51,3 +51,13 @@ test:
     paths:
       - t/failed-test-artifacts
     when: on_failure
+
+static-analysis:
+  image: ubuntu:22.04
+  variables:
+    jobname: StaticAnalysis
+  before_script:
+    - ./ci/install-docker-dependencies.sh
+  script:
+    - ./ci/run-static-analysis.sh
+    - ./ci/check-directional-formatting.bash
diff --git a/ci/install-docker-dependencies.sh b/ci/install-docker-dependencies.sh
index 48c43f0f90..eb2c9e1eca 100755
--- a/ci/install-docker-dependencies.sh
+++ b/ci/install-docker-dependencies.sh
@@ -21,7 +21,7 @@ linux-musl)
 		apache2 apache2-http2 apache2-proxy apache2-ssl apache2-webdav apr-util-dbd_sqlite3 \
 		bash cvs gnupg perl-cgi perl-dbd-sqlite >/dev/null
 	;;
-linux-*)
+linux-*|StaticAnalysis)
 	# Required so that apt doesn't wait for user input on certain packages.
 	export DEBIAN_FRONTEND=noninteractive
 
@@ -31,6 +31,11 @@ linux-*)
 		perl-modules liberror-perl libauthen-sasl-perl libemail-valid-perl \
 		libdbd-sqlite3-perl libio-socket-ssl-perl libnet-smtp-ssl-perl ${CC_PACKAGE:-${CC:-gcc}} \
 		apache2 cvs cvsps gnupg libcgi-pm-perl subversion
+
+	if test "$jobname" = StaticAnalysis
+	then
+		apt install -q -y coccinelle
+	fi
 	;;
 pedantic)
 	dnf -yq update >/dev/null &&
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* Re: [PATCH 2/2] ref-filter: support filtering of operational refs
From: Patrick Steinhardt @ 2023-12-28 10:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Karthik Nayak, git, christian.couder
In-Reply-To: <xmqqzfy3l270.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 3250 bytes --]

On Thu, Dec 21, 2023 at 12:40:03PM -0800, Junio C Hamano wrote:
> Karthik Nayak <karthik.188@gmail.com> writes:
> > With the upcoming introduction of the reftable backend, it becomes ever
> > so important to provide the necessary tooling for printing all refs
> > associated with a repository.
> 
> We have pseudoref (those all caps files outside the refs/ hierarchy)
> as an official term defined in the glossary, and Patrick's reftable
> work based on Han-Wen's work revealed the need to treat FETCH_HEAD
> and MERGE_HEAD as "even more pecurilar than pseudorefs" that need
> different term (tentatively called "special refs").  Please avoid
> coming up with yet another random name "operational" without
> discussing.
> 
> With a quick look at the table in this patch, "pseudorefs" appears
> to be the closest word that people are already familiar with, I
> think.

Agreed, this thought also crossed my mind while reading through the
patches.

> A lot more reasonable thing to do may be to scan the
> $GIT_DIR for files whose name satisfy refs.c:is_pseudoref_syntax()
> and list them, instead of having a hardcoded list of these special
> refs.

Agreed, as well. Despite the reasons mentioned below, the chance for
such a hardcoded list to grow stale is also quite high. And while it
certainly feels very hacky to iterate over the files one by one and
check for each of them whether it could be a pseudo ref, it is the best
we can do to dynamically detect any such reference.

One interesting question is how we should treat files that look like a
pseudoref, but which really aren't. I'm not aware of any such files
written by Git itself, but it could certainly be that a user wrote such
a file into the repository manually. But given that we're adding new
behaviour that will be opt-in (e.g. via a new switch) I'd rather err on
the side of caution and mark any such file as broken instead of silently
ignoring them.

> In addition, when reftable and other backends that can
> natively store things outside refs/ hierarchy is in use, they ought
> to know what they have so enumerating these would not be an issue
> for them without having such a hardcoded table of names.

Yup, for the reftable we don't have the issue of "How do we detect refs
dynamically" at all. So I would love for there to be a way to print all
refs in the refdb, regardless of whether they start with `refs/` or look
like a pseudoref or whatever else. Otherwise it wouldn't be possible for
a user to delete anything stored in the refdb that may have a funny
name, be it intentionally, by accident or due to a bug.

In the reftable backend, the ref iterator's `_advance()` function has a
hardcoded `starts_with(refname, "refs/")` check. If false, then we'd
skip the ref in order to retain the same behaviour that the files
backend has. So maybe what we should be doing is to introduce a new flag
`DO_FOR_EACH_ALL_REFS` and expose it via git-for-each-ref(1) or
git-show-ref(1). So:

  - For the reftable backend we'd skip the `starts_with()` check and
    simply return all refs.

  - For the files backend we'd also iterate through all files in
    $GIT_DIR and check whether they are pseudoref-like.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH 6/6] builtin/worktree: create refdb via ref backend
From: Patrick Steinhardt @ 2023-12-28 10:00 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1703754513.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 4397 bytes --]

When creating a worktree we create the worktree's ref database manually
by first writing a "HEAD" file so that the directory is recognized as a
Git repository by other commands, and then running git-update-ref(1) or
git-symbolic-ref(1) to write the actual value. But while this is fine
for the files backend, this logic simply assumes too much about how the
ref backend works and will leave behind an invalid ref database once any
other ref backend lands.

Refactor the code to instead use `refs_init_db()` to initialize the ref
database so that git-worktree(1) itself does not need to know about how
to initialize it. This will allow future ref backends to customize how
the per-worktree ref database is set up. Furthermore, as we now already
have a worktree ref store around, we can also avoid spawning external
commands to write the HEAD reference and instead use the refs API to do
so.

Note that we do not have an equivalent to passing the `--quiet` flag to
git-symbolic-ref(1) as we did before. This flag does not have an effect
anyway though, as git-symbolic-ref(1) only honors it when reading a
symref, but never when writing one.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/worktree.c | 48 +++++++++++++++++++++-------------------------
 1 file changed, 22 insertions(+), 26 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 58937a2a68..9d935bee84 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -416,7 +416,6 @@ static int add_worktree(const char *path, const char *refname,
 	struct strbuf sb_git = STRBUF_INIT, sb_repo = STRBUF_INIT;
 	struct strbuf sb = STRBUF_INIT, realpath = STRBUF_INIT;
 	const char *name;
-	struct child_process cp = CHILD_PROCESS_INIT;
 	struct strvec child_env = STRVEC_INIT;
 	unsigned int counter = 0;
 	int len, ret;
@@ -424,7 +423,8 @@ static int add_worktree(const char *path, const char *refname,
 	struct commit *commit = NULL;
 	int is_branch = 0;
 	struct strbuf sb_name = STRBUF_INIT;
-	struct worktree **worktrees;
+	struct worktree **worktrees, *wt = NULL;
+	struct ref_store *wt_refs;
 
 	worktrees = get_worktrees();
 	check_candidate_path(path, opts->force, worktrees, "add");
@@ -500,15 +500,26 @@ static int add_worktree(const char *path, const char *refname,
 	write_file(sb.buf, "../..");
 
 	/*
-	 * This is to keep resolve_ref() happy. We need a valid HEAD
-	 * or is_git_directory() will reject the directory. Any value which
-	 * looks like an object ID will do since it will be immediately
-	 * replaced by the symbolic-ref or update-ref invocation in the new
-	 * worktree.
+	 * Set up the ref store of the worktree and create the HEAD reference.
 	 */
-	strbuf_reset(&sb);
-	strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
-	write_file(sb.buf, "%s", oid_to_hex(null_oid()));
+	wt = get_linked_worktree(name);
+	if (!wt) {
+		ret = error(_("could not find created worktree '%s'"), name);
+		goto done;
+	}
+	wt_refs = get_worktree_ref_store(wt);
+
+	ret = refs_init_db(wt_refs, REFS_INIT_DB_IS_WORKTREE, &sb);
+	if (ret)
+		goto done;
+
+	if (!is_branch && commit)
+		ret = refs_update_ref(wt_refs, NULL, "HEAD", &commit->object.oid,
+				      NULL, 0, UPDATE_REFS_MSG_ON_ERR);
+	else
+		ret = refs_create_symref(wt_refs, "HEAD", symref.buf, NULL);
+	if (ret)
+		goto done;
 
 	/*
 	 * If the current worktree has sparse-checkout enabled, then copy
@@ -527,22 +538,6 @@ static int add_worktree(const char *path, const char *refname,
 
 	strvec_pushf(&child_env, "%s=%s", GIT_DIR_ENVIRONMENT, sb_git.buf);
 	strvec_pushf(&child_env, "%s=%s", GIT_WORK_TREE_ENVIRONMENT, path);
-	cp.git_cmd = 1;
-
-	if (!is_branch && commit) {
-		strvec_pushl(&cp.args, "update-ref", "HEAD",
-			     oid_to_hex(&commit->object.oid), NULL);
-	} else {
-		strvec_pushl(&cp.args, "symbolic-ref", "HEAD",
-			     symref.buf, NULL);
-		if (opts->quiet)
-			strvec_push(&cp.args, "--quiet");
-	}
-
-	strvec_pushv(&cp.env, child_env.v);
-	ret = run_command(&cp);
-	if (ret)
-		goto done;
 
 	if (opts->orphan &&
 	    (ret = make_worktree_orphan(refname, opts, &child_env)))
@@ -588,6 +583,7 @@ static int add_worktree(const char *path, const char *refname,
 	strbuf_release(&sb_git);
 	strbuf_release(&sb_name);
 	strbuf_release(&realpath);
+	free_worktree(wt);
 	return ret;
 }
 
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 5/6] worktree: expose interface to look up worktree by name
From: Patrick Steinhardt @ 2023-12-28 10:00 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1703754513.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2396 bytes --]

Our worktree interfaces do not provide a way to look up a worktree by
its name. Expose `get_linked_worktree()` to allow for this usecase. As
callers are responsible for freeing this worktree, introduce a new
function `free_worktree()` that does so.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 worktree.c | 25 +++++++++++++++----------
 worktree.h | 11 +++++++++++
 2 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/worktree.c b/worktree.c
index a56a6c2a3d..085f2cc41a 100644
--- a/worktree.c
+++ b/worktree.c
@@ -12,18 +12,23 @@
 #include "wt-status.h"
 #include "config.h"
 
+void free_worktree(struct worktree *worktree)
+{
+	if (!worktree)
+		return;
+	free(worktree->path);
+	free(worktree->id);
+	free(worktree->head_ref);
+	free(worktree->lock_reason);
+	free(worktree->prune_reason);
+	free(worktree);
+}
+
 void free_worktrees(struct worktree **worktrees)
 {
 	int i = 0;
-
-	for (i = 0; worktrees[i]; i++) {
-		free(worktrees[i]->path);
-		free(worktrees[i]->id);
-		free(worktrees[i]->head_ref);
-		free(worktrees[i]->lock_reason);
-		free(worktrees[i]->prune_reason);
-		free(worktrees[i]);
-	}
+	for (i = 0; worktrees[i]; i++)
+		free_worktree(worktrees[i]);
 	free (worktrees);
 }
 
@@ -74,7 +79,7 @@ static struct worktree *get_main_worktree(void)
 	return worktree;
 }
 
-static struct worktree *get_linked_worktree(const char *id)
+struct worktree *get_linked_worktree(const char *id)
 {
 	struct worktree *worktree = NULL;
 	struct strbuf path = STRBUF_INIT;
diff --git a/worktree.h b/worktree.h
index ce45b66de9..8a75691eac 100644
--- a/worktree.h
+++ b/worktree.h
@@ -57,6 +57,12 @@ struct worktree *find_worktree(struct worktree **list,
 			       const char *prefix,
 			       const char *arg);
 
+/*
+ * Look up the worktree corresponding to `id`, or NULL of no such worktree
+ * exists.
+ */
+struct worktree *get_linked_worktree(const char *id);
+
 /*
  * Return the worktree corresponding to `path`, or NULL if no such worktree
  * exists.
@@ -134,6 +140,11 @@ void repair_worktrees(worktree_repair_fn, void *cb_data);
  */
 void repair_worktree_at_path(const char *, worktree_repair_fn, void *cb_data);
 
+/*
+ * Free up the memory for a worktree.
+ */
+void free_worktree(struct worktree *);
+
 /*
  * Free up the memory for worktree(s)
  */
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 4/6] builtin/worktree: move setup of commondir file earlier
From: Patrick Steinhardt @ 2023-12-28 10:00 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1703754513.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 1533 bytes --]

Shuffle around how we create supporting worktree files so that we first
ensure that the worktree has all link files ("gitdir", "commondir")
before we try to initialize the ref database by writing "HEAD". This
will be required by a subsequent commit where we start to initialize the
ref database via `refs_init_db()`, which will require an initialized
`struct worktree *`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/worktree.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 4ac1621541..58937a2a68 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -495,6 +495,10 @@ static int add_worktree(const char *path, const char *refname,
 	strbuf_realpath(&realpath, get_git_common_dir(), 1);
 	write_file(sb_git.buf, "gitdir: %s/worktrees/%s",
 		   realpath.buf, name);
+	strbuf_reset(&sb);
+	strbuf_addf(&sb, "%s/commondir", sb_repo.buf);
+	write_file(sb.buf, "../..");
+
 	/*
 	 * This is to keep resolve_ref() happy. We need a valid HEAD
 	 * or is_git_directory() will reject the directory. Any value which
@@ -505,9 +509,6 @@ static int add_worktree(const char *path, const char *refname,
 	strbuf_reset(&sb);
 	strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
 	write_file(sb.buf, "%s", oid_to_hex(null_oid()));
-	strbuf_reset(&sb);
-	strbuf_addf(&sb, "%s/commondir", sb_repo.buf);
-	write_file(sb.buf, "../..");
 
 	/*
 	 * If the current worktree has sparse-checkout enabled, then copy
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 3/6] refs/files: skip creation of "refs/{heads,tags}" for worktrees
From: Patrick Steinhardt @ 2023-12-28 10:00 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1703754513.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2384 bytes --]

The files ref backend will create both "refs/heads" and "refs/tags" in
the Git directory. While this logic makes sense for normal repositories,
it does not fo worktrees because those refs are "common" refs that would
always be contained in the main repository's ref database.

Introduce a new flag telling the backend that it is expected to create a
per-worktree ref database and skip creation of these dirs in the files
backend when the flag is set. No other backends (currently) need
worktree-specific logic, so this is the only required change to start
creating per-worktree ref databases via `refs_init_db()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.h               |  2 ++
 refs/files-backend.c | 22 ++++++++++++++--------
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/refs.h b/refs.h
index 6090e92578..8ed890841b 100644
--- a/refs.h
+++ b/refs.h
@@ -123,6 +123,8 @@ int should_autocreate_reflog(const char *refname);
 
 int is_branch(const char *refname);
 
+#define REFS_INIT_DB_IS_WORKTREE (1 << 0)
+
 int refs_init_db(struct ref_store *refs, int flags, struct strbuf *err);
 
 /*
diff --git a/refs/files-backend.c b/refs/files-backend.c
index ed47c5dc08..a82d6c453f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3221,7 +3221,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
 }
 
 static int files_init_db(struct ref_store *ref_store,
-			 int flags UNUSED,
+			 int flags,
 			 struct strbuf *err UNUSED)
 {
 	struct files_ref_store *refs =
@@ -3245,15 +3245,21 @@ static int files_init_db(struct ref_store *ref_store,
 	adjust_shared_perm(sb.buf);
 
 	/*
-	 * Create .git/refs/{heads,tags}
+	 * There is no need to create directories for common refs when creating
+	 * a worktree ref store.
 	 */
-	strbuf_reset(&sb);
-	files_ref_path(refs, &sb, "refs/heads");
-	safe_create_dir(sb.buf, 1);
+	if (!(flags & REFS_INIT_DB_IS_WORKTREE)) {
+		/*
+		 * Create .git/refs/{heads,tags}
+		 */
+		strbuf_reset(&sb);
+		files_ref_path(refs, &sb, "refs/heads");
+		safe_create_dir(sb.buf, 1);
 
-	strbuf_reset(&sb);
-	files_ref_path(refs, &sb, "refs/tags");
-	safe_create_dir(sb.buf, 1);
+		strbuf_reset(&sb);
+		files_ref_path(refs, &sb, "refs/tags");
+		safe_create_dir(sb.buf, 1);
+	}
 
 	strbuf_release(&sb);
 	return 0;
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox