* [PATCH v3 8/8] reftable/merged: transfer ownership of records when iterating
From: Patrick Steinhardt @ 2024-01-03 6:22 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1704262787.git.ps@pks.im>
[-- Attachment #1: Type: text/plain, Size: 2156 bytes --]
When iterating over records with the merged iterator we put the records
into a priority queue before yielding them to the caller. This means
that we need to allocate the contents of these records before we can
pass them over to the caller.
The handover to the caller is quite inefficient though because we first
deallocate the record passed in by the caller and then copy over the new
record, which requires us to reallocate memory.
Refactor the code to instead transfer ownership of the new record to the
caller. So instead of reallocating all contents, we now release the old
record and then copy contents of the new record into place.
The following benchmark of `git show-ref --quiet` in a repository with
around 350k refs shows a clear improvement. Before:
HEAP SUMMARY:
in use at exit: 21,163 bytes in 193 blocks
total heap usage: 708,058 allocs, 707,865 frees, 36,783,255 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 21,163 bytes in 193 blocks
total heap usage: 357,007 allocs, 356,814 frees, 24,193,602 bytes allocated
This shows that we now have roundabout a single allocation per record
that we're yielding from the iterator. Ideally, we'd also get rid of
this allocation so that the number of allocations doesn't scale with the
number of refs anymore. This would require some larger surgery though
because the memory is owned by the priority queue before transferring it
over to the caller.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/merged.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/reftable/merged.c b/reftable/merged.c
index a28bb99aaf..a52914d667 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -124,10 +124,12 @@ static int merged_iter_next_entry(struct merged_iter *mi,
reftable_record_release(&top.rec);
}
- reftable_record_copy_from(rec, &entry.rec, hash_size(mi->hash_id));
+ reftable_record_release(rec);
+ *rec = entry.rec;
done:
- reftable_record_release(&entry.rec);
+ if (err)
+ reftable_record_release(&entry.rec);
return err;
}
--
2.43.GIT
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related
* [PATCH v3 7/8] reftable/merged: really reuse buffers to compute record keys
From: Patrick Steinhardt @ 2024-01-03 6:22 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1704262787.git.ps@pks.im>
[-- Attachment #1: Type: text/plain, Size: 1689 bytes --]
In 829231dc20 (reftable/merged: reuse buffer to compute record keys,
2023-12-11), we have refactored the merged iterator to reuse a pair of
long-living strbufs by relying on the fact that `reftable_record_key()`
tries to reuse already allocated strbufs by calling `strbuf_reset()`,
which should give us significantly fewer reallocations compared to the
old code that used on-stack strbufs that are allocated for each and
every iteration. Unfortunately, we called `strbuf_release()` on these
long-living strbufs that we meant to reuse on each iteration, defeating
the optimization.
Fix this performance issue by not releasing those buffers on iteration
anymore, where we instead rely on `merged_iter_close()` to release the
buffers for us.
Using `git show-ref --quiet` in a repository with ~350k refs this leads
to a significant drop in allocations. Before:
HEAP SUMMARY:
in use at exit: 21,163 bytes in 193 blocks
total heap usage: 1,410,148 allocs, 1,409,955 frees, 61,976,068 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 21,163 bytes in 193 blocks
total heap usage: 708,058 allocs, 707,865 frees, 36,783,255 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/merged.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/reftable/merged.c b/reftable/merged.c
index 556bb5c556..a28bb99aaf 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -128,8 +128,6 @@ static int merged_iter_next_entry(struct merged_iter *mi,
done:
reftable_record_release(&entry.rec);
- strbuf_release(&mi->entry_key);
- strbuf_release(&mi->key);
return err;
}
--
2.43.GIT
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related
* [PATCH v3 6/8] reftable/record: store "val2" hashes as static arrays
From: Patrick Steinhardt @ 2024-01-03 6:22 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1704262787.git.ps@pks.im>
[-- Attachment #1: Type: text/plain, Size: 5214 bytes --]
Similar to the preceding commit, convert ref records of type "val2" to
store their object IDs in static arrays instead of allocating them for
every single record.
We're using the same benchmark as in the preceding commit, with `git
show-ref --quiet` in a repository with ~350k refs. This time around
though the effects aren't this huge. Before:
HEAP SUMMARY:
in use at exit: 21,163 bytes in 193 blocks
total heap usage: 1,419,040 allocs, 1,418,847 frees, 62,153,868 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 21,163 bytes in 193 blocks
total heap usage: 1,410,148 allocs, 1,409,955 frees, 61,976,068 bytes allocated
This is because "val2"-type records are typically only stored for peeled
tags, and the number of annotated tags in the benchmark repository is
rather low. Still, it can be seen that this change leads to a reduction
of allocations overall, even if only a small one.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/readwrite_test.c | 12 ++++--------
reftable/record.c | 6 ------
reftable/record_test.c | 4 ----
reftable/reftable-record.h | 4 ++--
4 files changed, 6 insertions(+), 20 deletions(-)
diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index 87b238105c..178766dfa8 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -547,8 +547,6 @@ static void test_table_refs_for(int indexed)
uint8_t hash[GIT_SHA1_RAWSZ];
char fill[51] = { 0 };
char name[100];
- uint8_t hash1[GIT_SHA1_RAWSZ];
- uint8_t hash2[GIT_SHA1_RAWSZ];
struct reftable_ref_record ref = { NULL };
memset(hash, i, sizeof(hash));
@@ -558,11 +556,9 @@ static void test_table_refs_for(int indexed)
name[40] = 0;
ref.refname = name;
- set_test_hash(hash1, i / 4);
- set_test_hash(hash2, 3 + i / 4);
ref.value_type = REFTABLE_REF_VAL2;
- ref.value.val2.value = hash1;
- ref.value.val2.target_value = hash2;
+ set_test_hash(ref.value.val2.value, i / 4);
+ set_test_hash(ref.value.val2.target_value, 3 + i / 4);
/* 80 bytes / entry, so 3 entries per block. Yields 17
*/
@@ -570,8 +566,8 @@ static void test_table_refs_for(int indexed)
n = reftable_writer_add_ref(w, &ref);
EXPECT(n == 0);
- if (!memcmp(hash1, want_hash, GIT_SHA1_RAWSZ) ||
- !memcmp(hash2, want_hash, GIT_SHA1_RAWSZ)) {
+ if (!memcmp(ref.value.val2.value, want_hash, GIT_SHA1_RAWSZ) ||
+ !memcmp(ref.value.val2.target_value, want_hash, GIT_SHA1_RAWSZ)) {
want_names[want_names_len++] = xstrdup(name);
}
}
diff --git a/reftable/record.c b/reftable/record.c
index a67a6b4d8a..5c3fbb7b2a 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -222,9 +222,7 @@ static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
memcpy(ref->value.val1, src->value.val1, hash_size);
break;
case REFTABLE_REF_VAL2:
- ref->value.val2.value = reftable_malloc(hash_size);
memcpy(ref->value.val2.value, src->value.val2.value, hash_size);
- ref->value.val2.target_value = reftable_malloc(hash_size);
memcpy(ref->value.val2.target_value,
src->value.val2.target_value, hash_size);
break;
@@ -298,8 +296,6 @@ void reftable_ref_record_release(struct reftable_ref_record *ref)
reftable_free(ref->value.symref);
break;
case REFTABLE_REF_VAL2:
- reftable_free(ref->value.val2.target_value);
- reftable_free(ref->value.val2.value);
break;
case REFTABLE_REF_VAL1:
break;
@@ -401,11 +397,9 @@ static int reftable_ref_record_decode(void *rec, struct strbuf key,
return -1;
}
- r->value.val2.value = reftable_malloc(hash_size);
memcpy(r->value.val2.value, in.buf, hash_size);
string_view_consume(&in, hash_size);
- r->value.val2.target_value = reftable_malloc(hash_size);
memcpy(r->value.val2.target_value, in.buf, hash_size);
string_view_consume(&in, hash_size);
break;
diff --git a/reftable/record_test.c b/reftable/record_test.c
index 5c94d26e35..2876db7d27 100644
--- a/reftable/record_test.c
+++ b/reftable/record_test.c
@@ -122,11 +122,7 @@ static void test_reftable_ref_record_roundtrip(void)
set_hash(in.u.ref.value.val1, 1);
break;
case REFTABLE_REF_VAL2:
- in.u.ref.value.val2.value =
- reftable_malloc(GIT_SHA1_RAWSZ);
set_hash(in.u.ref.value.val2.value, 1);
- in.u.ref.value.val2.target_value =
- reftable_malloc(GIT_SHA1_RAWSZ);
set_hash(in.u.ref.value.val2.target_value, 2);
break;
case REFTABLE_REF_SYMREF:
diff --git a/reftable/reftable-record.h b/reftable/reftable-record.h
index 7f3a0df635..bb6e99acd3 100644
--- a/reftable/reftable-record.h
+++ b/reftable/reftable-record.h
@@ -41,8 +41,8 @@ struct reftable_ref_record {
union {
unsigned char val1[GIT_MAX_RAWSZ];
struct {
- uint8_t *value; /* first value, malloced hash */
- uint8_t *target_value; /* second value, malloced hash */
+ unsigned char value[GIT_MAX_RAWSZ]; /* first hash */
+ unsigned char target_value[GIT_MAX_RAWSZ]; /* second hash */
} val2;
char *symref; /* referent, malloced 0-terminated string */
} value;
--
2.43.GIT
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related
* [PATCH v3 5/8] reftable/record: store "val1" hashes as static arrays
From: Patrick Steinhardt @ 2024-01-03 6:22 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1704262787.git.ps@pks.im>
[-- Attachment #1: Type: text/plain, Size: 8589 bytes --]
When reading ref records of type "val1", we store its object ID in an
allocated array. This results in an additional allocation for every
single ref record we read, which is rather inefficient especially when
iterating over refs.
Refactor the code to instead use an embedded array of `GIT_MAX_RAWSZ`
bytes. While this means that `struct ref_record` is bigger now, we
typically do not store all refs in an array anyway and instead only
handle a limited number of records at the same point in time.
Using `git show-ref --quiet` in a repository with ~350k refs this leads
to a significant drop in allocations. Before:
HEAP SUMMARY:
in use at exit: 21,098 bytes in 192 blocks
total heap usage: 2,116,683 allocs, 2,116,491 frees, 76,098,060 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 21,098 bytes in 192 blocks
total heap usage: 1,419,031 allocs, 1,418,839 frees, 62,145,036 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/block_test.c | 4 +---
reftable/merged_test.c | 16 ++++++----------
reftable/readwrite_test.c | 14 ++++----------
reftable/record.c | 3 ---
reftable/record_test.c | 1 -
reftable/reftable-record.h | 3 ++-
reftable/stack_test.c | 2 --
7 files changed, 13 insertions(+), 30 deletions(-)
diff --git a/reftable/block_test.c b/reftable/block_test.c
index c00bbc8aed..dedb05c7d8 100644
--- a/reftable/block_test.c
+++ b/reftable/block_test.c
@@ -49,13 +49,11 @@ static void test_block_read_write(void)
for (i = 0; i < N; i++) {
char name[100];
- uint8_t hash[GIT_SHA1_RAWSZ];
snprintf(name, sizeof(name), "branch%02d", i);
- memset(hash, i, sizeof(hash));
rec.u.ref.refname = name;
rec.u.ref.value_type = REFTABLE_REF_VAL1;
- rec.u.ref.value.val1 = hash;
+ memset(rec.u.ref.value.val1, i, GIT_SHA1_RAWSZ);
names[i] = xstrdup(name);
n = block_writer_add(&bw, &rec);
diff --git a/reftable/merged_test.c b/reftable/merged_test.c
index d08c16abef..b3927a5d73 100644
--- a/reftable/merged_test.c
+++ b/reftable/merged_test.c
@@ -123,13 +123,11 @@ static void readers_destroy(struct reftable_reader **readers, size_t n)
static void test_merged_between(void)
{
- uint8_t hash1[GIT_SHA1_RAWSZ] = { 1, 2, 3, 0 };
-
struct reftable_ref_record r1[] = { {
.refname = "b",
.update_index = 1,
.value_type = REFTABLE_REF_VAL1,
- .value.val1 = hash1,
+ .value.val1 = { 1, 2, 3, 0 },
} };
struct reftable_ref_record r2[] = { {
.refname = "a",
@@ -165,26 +163,24 @@ static void test_merged_between(void)
static void test_merged(void)
{
- uint8_t hash1[GIT_SHA1_RAWSZ] = { 1 };
- uint8_t hash2[GIT_SHA1_RAWSZ] = { 2 };
struct reftable_ref_record r1[] = {
{
.refname = "a",
.update_index = 1,
.value_type = REFTABLE_REF_VAL1,
- .value.val1 = hash1,
+ .value.val1 = { 1 },
},
{
.refname = "b",
.update_index = 1,
.value_type = REFTABLE_REF_VAL1,
- .value.val1 = hash1,
+ .value.val1 = { 1 },
},
{
.refname = "c",
.update_index = 1,
.value_type = REFTABLE_REF_VAL1,
- .value.val1 = hash1,
+ .value.val1 = { 1 },
}
};
struct reftable_ref_record r2[] = { {
@@ -197,13 +193,13 @@ static void test_merged(void)
.refname = "c",
.update_index = 3,
.value_type = REFTABLE_REF_VAL1,
- .value.val1 = hash2,
+ .value.val1 = { 2 },
},
{
.refname = "d",
.update_index = 3,
.value_type = REFTABLE_REF_VAL1,
- .value.val1 = hash1,
+ .value.val1 = { 1 },
},
};
diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index 9c16e0504e..87b238105c 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -60,18 +60,15 @@ static void write_table(char ***names, struct strbuf *buf, int N,
*names = reftable_calloc(sizeof(char *) * (N + 1));
reftable_writer_set_limits(w, update_index, update_index);
for (i = 0; i < N; i++) {
- uint8_t hash[GIT_SHA256_RAWSZ] = { 0 };
char name[100];
int n;
- set_test_hash(hash, i);
-
snprintf(name, sizeof(name), "refs/heads/branch%02d", i);
ref.refname = name;
ref.update_index = update_index;
ref.value_type = REFTABLE_REF_VAL1;
- ref.value.val1 = hash;
+ set_test_hash(ref.value.val1, i);
(*names)[i] = xstrdup(name);
n = reftable_writer_add_ref(w, &ref);
@@ -675,11 +672,10 @@ static void test_write_object_id_min_length(void)
struct strbuf buf = STRBUF_INIT;
struct reftable_writer *w =
reftable_new_writer(&strbuf_add_void, &buf, &opts);
- uint8_t hash[GIT_SHA1_RAWSZ] = {42};
struct reftable_ref_record ref = {
.update_index = 1,
.value_type = REFTABLE_REF_VAL1,
- .value.val1 = hash,
+ .value.val1 = {42},
};
int err;
int i;
@@ -711,11 +707,10 @@ static void test_write_object_id_length(void)
struct strbuf buf = STRBUF_INIT;
struct reftable_writer *w =
reftable_new_writer(&strbuf_add_void, &buf, &opts);
- uint8_t hash[GIT_SHA1_RAWSZ] = {42};
struct reftable_ref_record ref = {
.update_index = 1,
.value_type = REFTABLE_REF_VAL1,
- .value.val1 = hash,
+ .value.val1 = {42},
};
int err;
int i;
@@ -814,11 +809,10 @@ static void test_write_multiple_indices(void)
writer = reftable_new_writer(&strbuf_add_void, &writer_buf, &opts);
reftable_writer_set_limits(writer, 1, 1);
for (i = 0; i < 100; i++) {
- unsigned char hash[GIT_SHA1_RAWSZ] = {i};
struct reftable_ref_record ref = {
.update_index = 1,
.value_type = REFTABLE_REF_VAL1,
- .value.val1 = hash,
+ .value.val1 = {i},
};
strbuf_reset(&buf);
diff --git a/reftable/record.c b/reftable/record.c
index 5e258c734b..a67a6b4d8a 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -219,7 +219,6 @@ static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
case REFTABLE_REF_DELETION:
break;
case REFTABLE_REF_VAL1:
- ref->value.val1 = reftable_malloc(hash_size);
memcpy(ref->value.val1, src->value.val1, hash_size);
break;
case REFTABLE_REF_VAL2:
@@ -303,7 +302,6 @@ void reftable_ref_record_release(struct reftable_ref_record *ref)
reftable_free(ref->value.val2.value);
break;
case REFTABLE_REF_VAL1:
- reftable_free(ref->value.val1);
break;
case REFTABLE_REF_DELETION:
break;
@@ -394,7 +392,6 @@ static int reftable_ref_record_decode(void *rec, struct strbuf key,
return -1;
}
- r->value.val1 = reftable_malloc(hash_size);
memcpy(r->value.val1, in.buf, hash_size);
string_view_consume(&in, hash_size);
break;
diff --git a/reftable/record_test.c b/reftable/record_test.c
index 70ae78feca..5c94d26e35 100644
--- a/reftable/record_test.c
+++ b/reftable/record_test.c
@@ -119,7 +119,6 @@ static void test_reftable_ref_record_roundtrip(void)
case REFTABLE_REF_DELETION:
break;
case REFTABLE_REF_VAL1:
- in.u.ref.value.val1 = reftable_malloc(GIT_SHA1_RAWSZ);
set_hash(in.u.ref.value.val1, 1);
break;
case REFTABLE_REF_VAL2:
diff --git a/reftable/reftable-record.h b/reftable/reftable-record.h
index f7eb2d6015..7f3a0df635 100644
--- a/reftable/reftable-record.h
+++ b/reftable/reftable-record.h
@@ -9,6 +9,7 @@ license that can be found in the LICENSE file or at
#ifndef REFTABLE_RECORD_H
#define REFTABLE_RECORD_H
+#include "hash-ll.h"
#include <stdint.h>
/*
@@ -38,7 +39,7 @@ struct reftable_ref_record {
#define REFTABLE_NR_REF_VALUETYPES 4
} value_type;
union {
- uint8_t *val1; /* malloced hash. */
+ unsigned char val1[GIT_MAX_RAWSZ];
struct {
uint8_t *value; /* first value, malloced hash */
uint8_t *target_value; /* second value, malloced hash */
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 14a3fc11ee..feab49d7f7 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -463,7 +463,6 @@ static void test_reftable_stack_add(void)
refs[i].refname = xstrdup(buf);
refs[i].update_index = i + 1;
refs[i].value_type = REFTABLE_REF_VAL1;
- refs[i].value.val1 = reftable_malloc(GIT_SHA1_RAWSZ);
set_test_hash(refs[i].value.val1, i);
logs[i].refname = xstrdup(buf);
@@ -600,7 +599,6 @@ static void test_reftable_stack_tombstone(void)
refs[i].update_index = i + 1;
if (i % 2 == 0) {
refs[i].value_type = REFTABLE_REF_VAL1;
- refs[i].value.val1 = reftable_malloc(GIT_SHA1_RAWSZ);
set_test_hash(refs[i].value.val1, i);
}
--
2.43.GIT
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related
* [PATCH v3 4/8] reftable/record: constify some parts of the interface
From: Patrick Steinhardt @ 2024-01-03 6:22 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1704262787.git.ps@pks.im>
[-- Attachment #1: Type: text/plain, Size: 2714 bytes --]
We're about to convert reftable records to stop storing their object IDs
as allocated hashes. Prepare for this refactoring by constifying some
parts of the interface that will be impacted by this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/record.c | 8 ++++----
reftable/reftable-record.h | 4 ++--
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/reftable/record.c b/reftable/record.c
index fbaa1fbef5..5e258c734b 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -76,7 +76,7 @@ int reftable_is_block_type(uint8_t typ)
return 0;
}
-uint8_t *reftable_ref_record_val1(const struct reftable_ref_record *rec)
+const unsigned char *reftable_ref_record_val1(const struct reftable_ref_record *rec)
{
switch (rec->value_type) {
case REFTABLE_REF_VAL1:
@@ -88,7 +88,7 @@ uint8_t *reftable_ref_record_val1(const struct reftable_ref_record *rec)
}
}
-uint8_t *reftable_ref_record_val2(const struct reftable_ref_record *rec)
+const unsigned char *reftable_ref_record_val2(const struct reftable_ref_record *rec)
{
switch (rec->value_type) {
case REFTABLE_REF_VAL2:
@@ -242,7 +242,7 @@ static char hexdigit(int c)
return 'a' + (c - 10);
}
-static void hex_format(char *dest, uint8_t *src, int hash_size)
+static void hex_format(char *dest, const unsigned char *src, int hash_size)
{
assert(hash_size > 0);
if (src) {
@@ -1164,7 +1164,7 @@ int reftable_record_equal(struct reftable_record *a, struct reftable_record *b,
reftable_record_data(a), reftable_record_data(b), hash_size);
}
-static int hash_equal(uint8_t *a, uint8_t *b, int hash_size)
+static int hash_equal(const unsigned char *a, const unsigned char *b, int hash_size)
{
if (a && b)
return !memcmp(a, b, hash_size);
diff --git a/reftable/reftable-record.h b/reftable/reftable-record.h
index 67104f8fbf..f7eb2d6015 100644
--- a/reftable/reftable-record.h
+++ b/reftable/reftable-record.h
@@ -49,11 +49,11 @@ struct reftable_ref_record {
/* Returns the first hash, or NULL if `rec` is not of type
* REFTABLE_REF_VAL1 or REFTABLE_REF_VAL2. */
-uint8_t *reftable_ref_record_val1(const struct reftable_ref_record *rec);
+const unsigned char *reftable_ref_record_val1(const struct reftable_ref_record *rec);
/* Returns the second hash, or NULL if `rec` is not of type
* REFTABLE_REF_VAL2. */
-uint8_t *reftable_ref_record_val2(const struct reftable_ref_record *rec);
+const unsigned char *reftable_ref_record_val2(const struct reftable_ref_record *rec);
/* returns whether 'ref' represents a deletion */
int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
--
2.43.GIT
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related
* [PATCH v3 3/8] reftable/writer: fix index corruption when writing multiple indices
From: Patrick Steinhardt @ 2024-01-03 6:22 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1704262787.git.ps@pks.im>
[-- Attachment #1: Type: text/plain, Size: 5241 bytes --]
Each reftable may contain multiple types of blocks for refs, objects and
reflog records, where each of these may have an index that makes it more
efficient to find the records. It was observed that the index for log
records can become corrupted under certain circumstances, where the
first entry of the index points into the object index instead of to the
log records.
As it turns out, this corruption can occur whenever we write a log index
as well as at least one additional index. Writing records and their index
is basically a two-step process:
1. We write all blocks for the corresponding record. Each block that
gets written is added to a list of blocks to index.
2. Once all blocks were written we finish the section. If at least two
blocks have been added to the list of blocks to index then we will
now write the index for those blocks and flush it, as well.
When we have a very large number of blocks then we may decide to write a
multi-level index, which is why we also keep track of the list of the
index blocks in the same way as we previously kept track of the blocks
to index.
Now when we have finished writing all index blocks we clear the index
and flush the last block to disk. This is done in the wrong order though
because flushing the block to disk will re-add it to the list of blocks
to be indexed. The result is that the next section we are about to write
will have an entry in the list of blocks to index that points to the
last block of the preceding section's index, which will corrupt the log
index.
Fix this corruption by clearing the index after having written the last
block.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/readwrite_test.c | 80 +++++++++++++++++++++++++++++++++++++++
reftable/writer.c | 4 +-
2 files changed, 82 insertions(+), 2 deletions(-)
diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index 278663f22d..9c16e0504e 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -798,6 +798,85 @@ static void test_write_key_order(void)
strbuf_release(&buf);
}
+static void test_write_multiple_indices(void)
+{
+ struct reftable_write_options opts = {
+ .block_size = 100,
+ };
+ struct strbuf writer_buf = STRBUF_INIT, buf = STRBUF_INIT;
+ struct reftable_block_source source = { 0 };
+ struct reftable_iterator it = { 0 };
+ const struct reftable_stats *stats;
+ struct reftable_writer *writer;
+ struct reftable_reader *reader;
+ int err, i;
+
+ writer = reftable_new_writer(&strbuf_add_void, &writer_buf, &opts);
+ reftable_writer_set_limits(writer, 1, 1);
+ for (i = 0; i < 100; i++) {
+ unsigned char hash[GIT_SHA1_RAWSZ] = {i};
+ struct reftable_ref_record ref = {
+ .update_index = 1,
+ .value_type = REFTABLE_REF_VAL1,
+ .value.val1 = hash,
+ };
+
+ strbuf_reset(&buf);
+ strbuf_addf(&buf, "refs/heads/%04d", i);
+ ref.refname = buf.buf,
+
+ err = reftable_writer_add_ref(writer, &ref);
+ EXPECT_ERR(err);
+ }
+
+ for (i = 0; i < 100; i++) {
+ unsigned char hash[GIT_SHA1_RAWSZ] = {i};
+ struct reftable_log_record log = {
+ .update_index = 1,
+ .value_type = REFTABLE_LOG_UPDATE,
+ .value.update = {
+ .old_hash = hash,
+ .new_hash = hash,
+ },
+ };
+
+ strbuf_reset(&buf);
+ strbuf_addf(&buf, "refs/heads/%04d", i);
+ log.refname = buf.buf,
+
+ err = reftable_writer_add_log(writer, &log);
+ EXPECT_ERR(err);
+ }
+
+ reftable_writer_close(writer);
+
+ /*
+ * The written data should be sufficiently large to result in indices
+ * for each of the block types.
+ */
+ stats = reftable_writer_stats(writer);
+ EXPECT(stats->ref_stats.index_offset > 0);
+ EXPECT(stats->obj_stats.index_offset > 0);
+ EXPECT(stats->log_stats.index_offset > 0);
+
+ block_source_from_strbuf(&source, &writer_buf);
+ err = reftable_new_reader(&reader, &source, "filename");
+ EXPECT_ERR(err);
+
+ /*
+ * Seeking the log uses the log index now. In case there is any
+ * confusion regarding indices we would notice here.
+ */
+ err = reftable_reader_seek_log(reader, &it, "");
+ EXPECT_ERR(err);
+
+ reftable_iterator_destroy(&it);
+ reftable_writer_free(writer);
+ reftable_reader_free(reader);
+ strbuf_release(&writer_buf);
+ strbuf_release(&buf);
+}
+
static void test_corrupt_table_empty(void)
{
struct strbuf buf = STRBUF_INIT;
@@ -847,5 +926,6 @@ int readwrite_test_main(int argc, const char *argv[])
RUN_TEST(test_log_overflow);
RUN_TEST(test_write_object_id_length);
RUN_TEST(test_write_object_id_min_length);
+ RUN_TEST(test_write_multiple_indices);
return 0;
}
diff --git a/reftable/writer.c b/reftable/writer.c
index 2e322a5683..ee4590e20f 100644
--- a/reftable/writer.c
+++ b/reftable/writer.c
@@ -432,12 +432,12 @@ static int writer_finish_section(struct reftable_writer *w)
reftable_free(idx);
}
- writer_clear_index(w);
-
err = writer_flush_block(w);
if (err < 0)
return err;
+ writer_clear_index(w);
+
bstats = writer_reftable_block_stats(w, typ);
bstats->index_blocks = w->stats.idx_stats.blocks - before_blocks;
bstats->index_offset = index_start;
--
2.43.GIT
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related
* [PATCH v3 2/8] reftable/stack: do not auto-compact twice in `reftable_stack_add()`
From: Patrick Steinhardt @ 2024-01-03 6:22 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1704262787.git.ps@pks.im>
[-- Attachment #1: Type: text/plain, Size: 1171 bytes --]
In 5c086453ff (reftable/stack: perform auto-compaction with
transactional interface, 2023-12-11), we fixed a bug where the
transactional interface to add changes to a reftable stack did not
perform auto-compaction by calling `reftable_stack_auto_compact()` in
`reftable_stack_addition_commit()`. While correct, this change may now
cause us to perform auto-compaction twice in the non-transactional
interface `reftable_stack_add()`:
- It performs auto-compaction by itself.
- It now transitively performs auto-compaction via the transactional
interface.
Remove the first instance so that we only end up doing auto-compaction
once.
Reported-by: Han-Wen Nienhuys <hanwenn@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index 8729508dc3..7ffeb3ee10 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -425,9 +425,6 @@ int reftable_stack_add(struct reftable_stack *st,
return err;
}
- if (!st->disable_auto_compact)
- return reftable_stack_auto_compact(st);
-
return 0;
}
--
2.43.GIT
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related
* [PATCH v3 1/8] reftable/stack: do not overwrite errors when compacting
From: Patrick Steinhardt @ 2024-01-03 6:22 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1704262787.git.ps@pks.im>
[-- Attachment #1: Type: text/plain, Size: 1854 bytes --]
In order to compact multiple stacks we iterate through the merged ref
and log records. When there is any error either when reading the records
from the old merged table or when writing the records to the new table
then we break out of the respective loops. When breaking out of the loop
for the ref records though the error code will be overwritten, which may
cause us to inadvertently skip over bad ref records. In the worst case,
this can lead to a compacted stack that is missing records.
Fix the code by using `goto done` instead so that any potential error
codes are properly returned to the caller.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
reftable/stack.c | 20 ++++++++------------
1 file changed, 8 insertions(+), 12 deletions(-)
diff --git a/reftable/stack.c b/reftable/stack.c
index 16bab82063..8729508dc3 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -801,18 +801,16 @@ static int stack_write_compact(struct reftable_stack *st,
err = 0;
break;
}
- if (err < 0) {
- break;
- }
+ if (err < 0)
+ goto done;
if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
continue;
}
err = reftable_writer_add_ref(wr, &ref);
- if (err < 0) {
- break;
- }
+ if (err < 0)
+ goto done;
entries++;
}
reftable_iterator_destroy(&it);
@@ -827,9 +825,8 @@ static int stack_write_compact(struct reftable_stack *st,
err = 0;
break;
}
- if (err < 0) {
- break;
- }
+ if (err < 0)
+ goto done;
if (first == 0 && reftable_log_record_is_deletion(&log)) {
continue;
}
@@ -845,9 +842,8 @@ static int stack_write_compact(struct reftable_stack *st,
}
err = reftable_writer_add_log(wr, &log);
- if (err < 0) {
- break;
- }
+ if (err < 0)
+ goto done;
entries++;
}
--
2.43.GIT
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply related
* [PATCH v3 0/8] reftable: fixes and optimizations (pt.2)
From: Patrick Steinhardt @ 2024-01-03 6:22 UTC (permalink / raw)
To: git; +Cc: Han-Wen Nienhuys, Junio C Hamano
In-Reply-To: <cover.1703063544.git.ps@pks.im>
[-- Attachment #1: Type: text/plain, Size: 4803 bytes --]
Hi,
this is the third version of my patch series that contains additional
fixes and optimizations for the reftable library.
The only changes in this iteration are improvements to the commit
messages to hopefully make them easier to understand. Thanks Junio for
your suggestions!
Patrick
Patrick Steinhardt (8):
reftable/stack: do not overwrite errors when compacting
reftable/stack: do not auto-compact twice in `reftable_stack_add()`
reftable/writer: fix index corruption when writing multiple indices
reftable/record: constify some parts of the interface
reftable/record: store "val1" hashes as static arrays
reftable/record: store "val2" hashes as static arrays
reftable/merged: really reuse buffers to compute record keys
reftable/merged: transfer ownership of records when iterating
reftable/block_test.c | 4 +-
reftable/merged.c | 8 +--
reftable/merged_test.c | 16 +++---
reftable/readwrite_test.c | 102 +++++++++++++++++++++++++++++++------
reftable/record.c | 17 ++-----
reftable/record_test.c | 5 --
reftable/reftable-record.h | 11 ++--
reftable/stack.c | 23 +++------
reftable/stack_test.c | 2 -
reftable/writer.c | 4 +-
10 files changed, 117 insertions(+), 75 deletions(-)
Range-diff against v2:
1: 22a57020c6 = 1: 1dc8ddf04a reftable/stack: do not overwrite errors when compacting
2: a08f7efc99 = 2: eccc34a4b6 reftable/stack: do not auto-compact twice in `reftable_stack_add()`
3: c00e08d97f = 3: 15e12b8f29 reftable/writer: fix index corruption when writing multiple indices
4: 3416268087 = 4: 017e8943c7 reftable/record: constify some parts of the interface
5: 46ca3a37f8 ! 5: f1d2190489 reftable/record: store "val1" hashes as static arrays
@@ Metadata
## Commit message ##
reftable/record: store "val1" hashes as static arrays
- When reading ref records of type "val1" we store its object ID in an
+ When reading ref records of type "val1", we store its object ID in an
allocated array. This results in an additional allocation for every
single ref record we read, which is rather inefficient especially when
iterating over refs.
- Refactor the code to instead use a static array of `GIT_MAX_RAWSZ`
+ Refactor the code to instead use an embedded array of `GIT_MAX_RAWSZ`
bytes. While this means that `struct ref_record` is bigger now, we
typically do not store all refs in an array anyway and instead only
handle a limited number of records at the same point in time.
6: c8a36917b1 = 6: 6ec02d6775 reftable/record: store "val2" hashes as static arrays
7: 6313f8affd ! 7: 845dec2390 reftable/merged: really reuse buffers to compute record keys
@@ Commit message
reftable/merged: really reuse buffers to compute record keys
In 829231dc20 (reftable/merged: reuse buffer to compute record keys,
- 2023-12-11), we have refactored the merged iterator to reuse a set of
- buffers that it would otherwise have to reallocate on every single
- iteration. Unfortunately, there was a brown-paper-bag-style bug here as
- we continue to release these buffers after the iteration, and thus we
- have essentially gained nothing.
+ 2023-12-11), we have refactored the merged iterator to reuse a pair of
+ long-living strbufs by relying on the fact that `reftable_record_key()`
+ tries to reuse already allocated strbufs by calling `strbuf_reset()`,
+ which should give us significantly fewer reallocations compared to the
+ old code that used on-stack strbufs that are allocated for each and
+ every iteration. Unfortunately, we called `strbuf_release()` on these
+ long-living strbufs that we meant to reuse on each iteration, defeating
+ the optimization.
Fix this performance issue by not releasing those buffers on iteration
anymore, where we instead rely on `merged_iter_close()` to release the
8: 25a3919e58 ! 8: eea0e161ad reftable/merged: transfer ownership of records when iterating
@@ Metadata
## Commit message ##
reftable/merged: transfer ownership of records when iterating
- When iterating over recods with the merged iterator we put the records
+ When iterating over records with the merged iterator we put the records
into a priority queue before yielding them to the caller. This means
that we need to allocate the contents of these records before we can
pass them over to the caller.
base-commit: a26002b62827b89a19b1084bd75d9371d565d03c
--
2.43.GIT
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: ps/refstorage-extension (was: What's cooking in git.git (Jan 2024, #01; Tue, 2))
From: Patrick Steinhardt @ 2024-01-03 5:53 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqq5y0bcjpw.fsf@gitster.g>
[-- Attachment #1: Type: text/plain, Size: 1200 bytes --]
On Tue, Jan 02, 2024 at 05:02:35PM -0800, Junio C Hamano wrote:
[snip]
> * ps/refstorage-extension (2024-01-02) 13 commits
> - t9500: write "extensions.refstorage" into config
> - builtin/clone: introduce `--ref-format=` value flag
> - builtin/init: introduce `--ref-format=` value flag
> - builtin/rev-parse: introduce `--show-ref-format` flag
> - t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar
> - setup: introduce GIT_DEFAULT_REF_FORMAT envvar
> - setup: introduce "extensions.refStorage" extension
> - setup: set repository's formats on init
> - setup: start tracking ref storage format
> - refs: refactor logic to look up storage backends
> - worktree: skip reading HEAD when repairing worktrees
> - t: introduce DEFAULT_REPO_FORMAT prereq
> - Merge branch 'ps/clone-into-reftable-repository' into ps/refstorage-extension
>
> Introduce a new extension "refstorage" so that we can mark a
> repository that uses a non-default ref backend, like reftable.
>
> Will merge to 'next'?
> source: <cover.1703833818.git.ps@pks.im>
It's ready from my point of view, but I'm happy to wait a few days for
people to come back from holidays.
Thanks!
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* [PATCH v2 0/2] doc: bisect: change plural paths to singular pathspec
From: Britton Leo Kerin @ 2024-01-03 4:02 UTC (permalink / raw)
To: git; +Cc: Britton Leo Kerin
In-Reply-To: <a5a8c257-8550-492e-a6fa-e88ee59d4d66@smtp-relay.sendinblue.com>
Britton Leo Kerin (2):
doc: use singular form of repeatable path arg
doc: refer to pathspec instead of path
Documentation/git-bisect.txt | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Range-diff against v1:
1: 90c081dcab ! 1: da40e4736b doc: use singular form of repeatable path arg
@@ Commit message
later document text mentions 'path' arguments, while it doesn't mention
'paths'.
- Signed-off-by: Britton L Kerin <britton.kergin@gmail.com>
+ Signed-off-by: Britton Leo Kerin <britton.kergin@gmail.com>
## Documentation/git-bisect.txt ##
@@ Documentation/git-bisect.txt: The command takes various subcommands, and different options depending
-: ---------- > 2: d932b6d501 doc: refer to pathspec instead of path
--
2.43.0
^ permalink raw reply
* [PATCH v2 1/2] doc: use singular form of repeatable path arg
From: Britton Leo Kerin @ 2024-01-03 4:02 UTC (permalink / raw)
To: git; +Cc: Britton Leo Kerin, Britton Leo Kerin
In-Reply-To: <20240103040207.661413-1-britton.kerin@gmail.com>
This is more correct because the <path>... doc syntax already indicates
that the arg is "array-type". It's how other tools do it. Finally, the
later document text mentions 'path' arguments, while it doesn't mention
'paths'.
Signed-off-by: Britton Leo Kerin <britton.kergin@gmail.com>
---
Documentation/git-bisect.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/git-bisect.txt b/Documentation/git-bisect.txt
index aa02e46224..b798282788 100644
--- a/Documentation/git-bisect.txt
+++ b/Documentation/git-bisect.txt
@@ -17,7 +17,7 @@ The command takes various subcommands, and different options depending
on the subcommand:
git bisect start [--term-(new|bad)=<term-new> --term-(old|good)=<term-old>]
- [--no-checkout] [--first-parent] [<bad> [<good>...]] [--] [<paths>...]
+ [--no-checkout] [--first-parent] [<bad> [<good>...]] [--] [<path>...]
git bisect (bad|new|<term-new>) [<rev>]
git bisect (good|old|<term-old>) [<rev>...]
git bisect terms [--term-good | --term-bad]
--
2.43.0
^ permalink raw reply related
* [PATCH v2 2/2] doc: refer to pathspec instead of path
From: Britton Leo Kerin @ 2024-01-03 4:02 UTC (permalink / raw)
To: git; +Cc: Britton Leo Kerin
In-Reply-To: <20240103040207.661413-1-britton.kerin@gmail.com>
Signed-off-by: Britton Leo Kerin <britton.kerin@gmail.com>
---
Documentation/git-bisect.txt | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/git-bisect.txt b/Documentation/git-bisect.txt
index b798282788..8e01f1d618 100644
--- a/Documentation/git-bisect.txt
+++ b/Documentation/git-bisect.txt
@@ -17,7 +17,7 @@ The command takes various subcommands, and different options depending
on the subcommand:
git bisect start [--term-(new|bad)=<term-new> --term-(old|good)=<term-old>]
- [--no-checkout] [--first-parent] [<bad> [<good>...]] [--] [<path>...]
+ [--no-checkout] [--first-parent] [<bad> [<good>...]] [--] [<pathspec>...]
git bisect (bad|new|<term-new>) [<rev>]
git bisect (good|old|<term-old>) [<rev>...]
git bisect terms [--term-good | --term-bad]
@@ -299,7 +299,7 @@ Cutting down bisection by giving more parameters to bisect start
You can further cut down the number of trials, if you know what part of
the tree is involved in the problem you are tracking down, by specifying
-path parameters when issuing the `bisect start` command:
+pathspec parameters when issuing the `bisect start` command:
------------
$ git bisect start -- arch/i386 include/asm-i386
--
2.43.0
^ permalink raw reply related
* What's cooking in git.git (Jan 2024, #01; Tue, 2)
From: Junio C Hamano @ 2024-01-03 1:02 UTC (permalink / raw)
To: git
Here are the topics that have been cooking in my tree. Commits
prefixed with '+' are in 'next' (being in 'next' is a sign that a
topic is stable enough to be used and are candidate to be in a
future release). Commits prefixed with '-' are only in 'seen', and
aren't considered "accepted" at all and may be annotated with an URL
to a message that raises issues but they are no means exhaustive. A
topic without enough support may be discarded after a long period of
no activity (of course they can be resubmit when new interests
arise).
As the past couple of weeks have been slow, topics that have been
merged to 'next' (and to a lessor extent, to 'master') this season
may have seen less scrutinizing eyes than they deserve, which might
lead bugs and regressions caused by them discovered later, but that
is a regular part of life. Let's see what happens.
Copies of the source code to Git live in many repositories, and the
following is a list of the ones I push into or their mirrors. Some
repositories have only a subset of branches.
With maint, master, next, seen, todo:
git://git.kernel.org/pub/scm/git/git.git/
git://repo.or.cz/alt-git.git/
https://kernel.googlesource.com/pub/scm/git/git/
https://github.com/git/git/
https://gitlab.com/git-vcs/git/
With all the integration branches and topics broken out:
https://github.com/gitster/git/
Even though the preformatted documentation in HTML and man format
are not sources, they are published in these repositories for
convenience (replace "htmldocs" with "manpages" for the manual
pages):
git://git.kernel.org/pub/scm/git/git-htmldocs.git/
https://github.com/gitster/git-htmldocs.git/
Release tarballs are available at:
https://www.kernel.org/pub/software/scm/git/
--------------------------------------------------
[Graduated to 'master']
* jc/orphan-unborn (2023-11-24) 2 commits
(merged to 'next' on 2023-12-21 at 030300487a)
+ orphan/unborn: fix use of 'orphan' in end-user facing messages
+ orphan/unborn: add to the glossary and use them consistently
Doc updates to clarify what an "unborn branch" means.
source: <xmqq4jhb977x.fsf@gitster.g>
* jc/retire-cas-opt-name-constant (2023-12-19) 1 commit
(merged to 'next' on 2023-12-21 at 39ef057c8b)
+ remote.h: retire CAS_OPT_NAME
Code clean-up.
source: <xmqq5y0uc7tq.fsf@gitster.g>
* la/trailer-cleanups (2023-12-20) 3 commits
(merged to 'next' on 2023-12-21 at e26ede5f55)
+ trailer: use offsets for trailer_start/trailer_end
+ trailer: find the end of the log message
+ commit: ignore_non_trailer computes number of bytes to ignore
Code clean-up.
source: <pull.1563.v5.git.1697828495.gitgitgadget@gmail.com>
* ps/pseudo-refs (2023-12-14) 4 commits
(merged to 'next' on 2023-12-21 at 3460e3d667)
+ bisect: consistently write BISECT_EXPECTED_REV via the refdb
+ refs: complete list of special refs
+ refs: propagate errno when reading special refs fails
+ wt-status: read HEAD and ORIG_HEAD via the refdb
Assorted changes around pseudoref handling.
source: <cover.1702560829.git.ps@pks.im>
* rj/status-bisect-while-rebase (2023-10-16) 1 commit
(merged to 'next' on 2023-12-21 at 929a027fb7)
+ status: fix branch shown when not only bisecting
"git status" is taught to show both the branch being bisected and
being rebased when both are in effect at the same time.
cf. <xmqqil76kyov.fsf@gitster.g>
source: <2e24ca9b-9c5f-f4df-b9f8-6574a714dfb2@gmail.com>
* rs/rebase-use-strvec-pushf (2023-12-20) 1 commit
(merged to 'next' on 2023-12-20 at ecb190973c)
+ rebase: use strvec_pushf() for format-patch revisions
Code clean-up.
source: <4ab7431c-6c1b-448c-b4d2-e8b9be0e4eef@web.de>
* sh/completion-with-reftable (2023-12-19) 2 commits
(merged to 'next' on 2023-12-20 at 7957d4aa5b)
+ completion: support pseudoref existence checks for reftables
+ completion: refactor existence checks for pseudorefs
Command line completion script (in contrib/) learned to work better
with the reftable backend.
source: <cover.1703022850.git.stanhu@gmail.com>
--------------------------------------------------
[New Topics]
* cp/sideband-array-index-comment-fix (2023-12-28) 1 commit
- sideband.c: remove redundant 'NEEDSWORK' tag
In-code comment fix.
source: <pull.1625.v4.git.1703750460527.gitgitgadget@gmail.com>
* ps/worktree-refdb-initialization (2023-12-28) 6 commits
- builtin/worktree: create refdb via ref backend
- worktree: expose interface to look up worktree by name
- builtin/worktree: move setup of commondir file earlier
- refs/files: skip creation of "refs/{heads,tags}" for worktrees
- setup: move creation of "refs/" into the files backend
- refs: prepare `refs_init_db()` for initializing worktree refs
Instead of manually creating refs/ hierarchy on disk upon a
creation of a secondary worktree, which is only usable via the
files backend, use the refs API to populate it.
May want to wait until other topics solidify a bit more.
cf. <xmqqedf6gpt8.fsf@gitster.g>
source: <cover.1703754513.git.ps@pks.im>
--------------------------------------------------
[Cooking]
* ml/doc-merge-updates (2023-12-20) 2 commits
(merged to 'next' on 2023-12-28 at 7a6329a219)
+ Documentation/git-merge.txt: use backticks for command wrapping
+ Documentation/git-merge.txt: fix reference to synopsis
Doc update.
Will merge to 'master'.
source: <20231220195342.17590-1-mi.al.lohmann@gmail.com>
* cp/apply-core-filemode (2023-12-26) 3 commits
- apply: code simplification
- apply: correctly reverse patch's pre- and post-image mode bits
- apply: ignore working tree filemode when !core.filemode
"git apply" on a filesystem without filemode support have learned
to take a hint from what is in the index for the path, even when
not working with the "--index" or "--cached" option, when checking
the executable bit match what is required by the preimage in the
patch.
Needs review.
source: <20231226233218.472054-1-gitster@pobox.com>
* jc/archive-list-with-extra-args (2023-12-21) 1 commit
(merged to 'next' on 2023-12-28 at 2d5c20e67f)
+ archive: "--list" does not take further options
"git archive --list extra garbage" silently ignored excess command
line parameters, which has been corrected.
Will merge to 'master'.
source: <xmqqmsu3mnix.fsf@gitster.g>
* jk/t1006-cat-file-objectsize-disk (2023-12-21) 1 commit
(merged to 'next' on 2023-12-28 at d82812e636)
+ t1006: add tests for %(objectsize:disk)
Test update.
Will merge to 'master'.
source: <20231221094722.GA570888@coredump.intra.peff.net>
* js/contributor-docs-updates (2023-12-27) 9 commits
(merged to 'next' on 2024-01-02 at 0e072117cd)
+ SubmittingPatches: hyphenate non-ASCII
+ SubmittingPatches: clarify GitHub artifact format
+ SubmittingPatches: clarify GitHub visual
+ SubmittingPatches: provide tag naming advice
+ SubmittingPatches: update extra tags list
+ SubmittingPatches: discourage new trailers
+ SubmittingPatches: drop ref to "What's in git.git"
+ CodingGuidelines: write punctuation marks
+ CodingGuidelines: move period inside parentheses
Doc update.
Will merge to 'master'.
source: <pull.1623.v3.git.1703739324.gitgitgadget@gmail.com>
* al/unit-test-ctype (2024-01-02) 1 commit
- unit-tests: rewrite t/helper/test-ctype.c as a unit test
Move test-ctype helper to the unit-test framework.
source: <20240101104017.9452-2-ach.lumap@gmail.com>
* bk/bisect-doc-fix (2023-12-27) 1 commit
- doc: use singular form of repeatable path arg
Synopsis fix.
Expecting a reroll.
source: <3d46bca1-96d4-43ba-a912-1f7c76942287@smtp-relay.sendinblue.com>
* en/sparse-checkout-eoo (2023-12-26) 2 commits
(merged to 'next' on 2023-12-28 at 3ddf2ebab6)
+ sparse-checkout: be consistent with end of options markers
+ Merge branch 'jk/end-of-options' into jc/sparse-checkout-set-add-end-of-options
"git sparse-checkout (add|set) --[no-]cone --end-of-options" did
not handle "--end-of-options" correctly after a recent update.
Will merge to 'master'.
source: <pull.1625.v2.git.git.1703619562639.gitgitgadget@gmail.com>
* ja/doc-placeholders-fix (2023-12-26) 2 commits
- doc: enforce placeholders in documentation
- doc: enforce dashes in placeholders
Docfix.
Needs review.
source: <pull.1626.git.1703539287.gitgitgadget@gmail.com>
* jc/sparse-checkout-set-default-fix (2023-12-26) 1 commit
(merged to 'next' on 2023-12-28 at a558eccf8e)
+ sparse-checkout: use default patterns for 'set' only !stdin
"git sparse-checkout set" added default patterns even when the
patterns are being fed from the standard input, which has been
corrected.
Will merge to 'master'.
source: <20231221065925.3234048-3-gitster@pobox.com>
* rs/fast-import-simplify-mempool-allocation (2023-12-26) 1 commit
(merged to 'next' on 2023-12-28 at 16e6dd2a69)
+ fast-import: use mem_pool_calloc()
Code simplification.
Will merge to 'master'.
source: <50c1f410-ca37-4c1c-a28b-3e9fad49f2b4@web.de>
* rs/mem-pool-improvements (2023-12-28) 2 commits
(merged to 'next' on 2023-12-28 at aa03d9441c)
+ mem-pool: simplify alignment calculation
+ mem-pool: fix big allocations
MemPool allocator fixes.
Will merge to 'master'.
source: <fa89d269-1a23-4ed6-bebc-30c0b629f444@web.de>
* ps/refstorage-extension (2024-01-02) 13 commits
- t9500: write "extensions.refstorage" into config
- builtin/clone: introduce `--ref-format=` value flag
- builtin/init: introduce `--ref-format=` value flag
- builtin/rev-parse: introduce `--show-ref-format` flag
- t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar
- setup: introduce GIT_DEFAULT_REF_FORMAT envvar
- setup: introduce "extensions.refStorage" extension
- setup: set repository's formats on init
- setup: start tracking ref storage format
- refs: refactor logic to look up storage backends
- worktree: skip reading HEAD when repairing worktrees
- t: introduce DEFAULT_REPO_FORMAT prereq
- Merge branch 'ps/clone-into-reftable-repository' into ps/refstorage-extension
Introduce a new extension "refstorage" so that we can mark a
repository that uses a non-default ref backend, like reftable.
Will merge to 'next'?
source: <cover.1703833818.git.ps@pks.im>
* ps/reftable-fixes-and-optims (2023-12-28) 9 commits
- reftable/merged: transfer ownership of records when iterating
- reftable/merged: really reuse buffers to compute record keys
- reftable/record: store "val2" hashes as static arrays
- reftable/record: store "val1" hashes as static arrays
- reftable/record: constify some parts of the interface
- reftable/writer: fix index corruption when writing multiple indices
- reftable/stack: do not auto-compact twice in `reftable_stack_add()`
- reftable/stack: do not overwrite errors when compacting
- Merge branch 'ps/reftable-fixes' into ps/reftable-fixes-and-optims
More fixes and optimizations to the reftable backend.
Expecting a (hopefully minor and final) reroll.
cf. <xmqqtto2gswa.fsf@gitster.g>
source: <cover.1703743174.git.ps@pks.im>
* tb/multi-pack-verbatim-reuse (2023-12-14) 26 commits
- t/perf: add performance tests for multi-pack reuse
- pack-bitmap: enable reuse from all bitmapped packs
- pack-objects: allow setting `pack.allowPackReuse` to "single"
- t/test-lib-functions.sh: implement `test_trace2_data` helper
- pack-objects: add tracing for various packfile metrics
- pack-bitmap: prepare to mark objects from multiple packs for reuse
- pack-revindex: implement `midx_pair_to_pack_pos()`
- pack-revindex: factor out `midx_key_to_pack_pos()` helper
- midx: implement `midx_preferred_pack()`
- git-compat-util.h: implement checked size_t to uint32_t conversion
- pack-objects: include number of packs reused in output
- pack-objects: prepare `write_reused_pack_verbatim()` for multi-pack reuse
- pack-objects: prepare `write_reused_pack()` for multi-pack reuse
- pack-objects: pass `bitmapped_pack`'s to pack-reuse functions
- pack-objects: keep track of `pack_start` for each reuse pack
- pack-objects: parameterize pack-reuse routines over a single pack
- pack-bitmap: return multiple packs via `reuse_partial_packfile_from_bitmap()`
- pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature
- ewah: implement `bitmap_is_empty()`
- pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions
- midx: implement `midx_locate_pack()`
- midx: implement `BTMP` chunk
- midx: factor out `fill_pack_info()`
- pack-bitmap: plug leak in find_objects()
- pack-bitmap-write: deep-clear the `bb_commit` slab
- pack-objects: free packing_data in more places
Streaming spans of packfile data used to be done only from a
single, primary, pack in a repository with multiple packfiles. It
has been extended to allow reuse from other packfiles, too.
Will merge to 'next'?
cf. <ZXurD1NTZ4TAs7WZ@nand.local>
source: <cover.1702592603.git.me@ttaylorr.com>
* jc/bisect-doc (2023-12-09) 1 commit
- bisect: document "terms" subcommand more fully
Doc update.
Needs review.
source: <xmqqzfyjmk02.fsf@gitster.g>
* en/header-cleanup (2023-12-26) 12 commits
(merged to 'next' on 2023-12-28 at 1ccddc2a10)
+ treewide: remove unnecessary includes in source files
+ treewide: add direct includes currently only pulled in transitively
+ trace2/tr2_tls.h: remove unnecessary include
+ submodule-config.h: remove unnecessary include
+ pkt-line.h: remove unnecessary include
+ line-log.h: remove unnecessary include
+ http.h: remove unnecessary include
+ fsmonitor--daemon.h: remove unnecessary includes
+ blame.h: remove unnecessary includes
+ archive.h: remove unnecessary include
+ treewide: remove unnecessary includes in source files
+ treewide: remove unnecessary includes from header files
Remove unused header "#include".
Will merge to 'master'.
source: <pull.1617.v2.git.1703351700.gitgitgadget@gmail.com>
* jw/builtin-objectmode-attr (2023-12-28) 1 commit
(merged to 'next' on 2024-01-02 at 4c3784b3a1)
+ attr: add builtin objectmode values support
The builtin_objectmode attribute is populated for each path
without adding anything in .gitattributes files, which would be
useful in magic pathspec, e.g., ":(attr:builtin_objectmode=100755)"
to limit to executables.
Will merge to 'master'.
cf. <xmqq5y0ssknj.fsf@gitster.g>
source: <20231116054437.2343549-1-jojwang@google.com>
* tb/merge-tree-write-pack (2023-10-23) 5 commits
- builtin/merge-tree.c: implement support for `--write-pack`
- bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
- bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
- bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types
- bulk-checkin: extract abstract `bulk_checkin_source`
"git merge-tree" learned "--write-pack" to record its result
without creating loose objects.
Broken when an object created during a merge is needed to continue merge
cf. <CABPp-BEfy9VOvimP9==ry_rZXu=metOQ8s=_-XiG_Pdx9c06Ww@mail.gmail.com>
source: <cover.1698101088.git.me@ttaylorr.com>
* tb/pair-chunk-expect (2023-11-10) 8 commits
- midx: read `OOFF` chunk with `pair_chunk_expect()`
- midx: read `OIDL` chunk with `pair_chunk_expect()`
- commit-graph: read `BIDX` chunk with `pair_chunk_expect()`
- commit-graph: read `GDAT` chunk with `pair_chunk_expect()`
- commit-graph: read `CDAT` chunk with `pair_chunk_expect()`
- commit-graph: read `OIDL` chunk with `pair_chunk_expect()`
- chunk-format: introduce `pair_chunk_expect()` helper
- Merge branch 'jk/chunk-bounds-more' into HEAD
Further code clean-up.
Needs review.
source: <cover.1699569246.git.me@ttaylorr.com>
* tb/path-filter-fix (2023-10-18) 17 commits
- bloom: introduce `deinit_bloom_filters()`
- commit-graph: reuse existing Bloom filters where possible
- object.h: fix mis-aligned flag bits table
- commit-graph: drop unnecessary `graph_read_bloom_data_context`
- commit-graph.c: unconditionally load Bloom filters
- bloom: prepare to discard incompatible Bloom filters
- bloom: annotate filters with hash version
- commit-graph: new filter ver. that fixes murmur3
- repo-settings: introduce commitgraph.changedPathsVersion
- t4216: test changed path filters with high bit paths
- t/helper/test-read-graph: implement `bloom-filters` mode
- bloom.h: make `load_bloom_filter_from_graph()` public
- t/helper/test-read-graph.c: extract `dump_graph_info()`
- gitformat-commit-graph: describe version 2 of BDAT
- commit-graph: ensure Bloom filters are read with consistent settings
- revision.c: consult Bloom filters for root commits
- t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
The Bloom filter used for path limited history traversal was broken
on systems whose "char" is unsigned; update the implementation and
bump the format version to 2.
Expecting a reroll.
cf. <20231023202212.GA5470@szeder.dev>
source: <cover.1697653929.git.me@ttaylorr.com>
* ak/color-decorate-symbols (2023-10-23) 7 commits
- log: add color.decorate.pseudoref config variable
- refs: exempt pseudorefs from pattern prefixing
- refs: add pseudorefs array and iteration functions
- log: add color.decorate.ref config variable
- log: add color.decorate.symbol config variable
- log: use designated inits for decoration_colors
- config: restructure color.decorate documentation
A new config for coloring.
Needs review.
source: <20231023221143.72489-1-andy.koppe@gmail.com>
* eb/hash-transition (2023-10-02) 30 commits
- t1016-compatObjectFormat: add tests to verify the conversion between objects
- t1006: test oid compatibility with cat-file
- t1006: rename sha1 to oid
- test-lib: compute the compatibility hash so tests may use it
- builtin/ls-tree: let the oid determine the output algorithm
- object-file: handle compat objects in check_object_signature
- tree-walk: init_tree_desc take an oid to get the hash algorithm
- builtin/cat-file: let the oid determine the output algorithm
- rev-parse: add an --output-object-format parameter
- repository: implement extensions.compatObjectFormat
- object-file: update object_info_extended to reencode objects
- object-file-convert: convert commits that embed signed tags
- object-file-convert: convert commit objects when writing
- object-file-convert: don't leak when converting tag objects
- object-file-convert: convert tag objects when writing
- object-file-convert: add a function to convert trees between algorithms
- object: factor out parse_mode out of fast-import and tree-walk into in object.h
- cache: add a function to read an OID of a specific algorithm
- tag: sign both hashes
- commit: export add_header_signature to support handling signatures on tags
- commit: convert mergetag before computing the signature of a commit
- commit: write commits for both hashes
- object-file: add a compat_oid_in parameter to write_object_file_flags
- object-file: update the loose object map when writing loose objects
- loose: compatibilty short name support
- loose: add a mapping between SHA-1 and SHA-256 for loose objects
- repository: add a compatibility hash algorithm
- object-names: support input of oids in any supported hash
- oid-array: teach oid-array to handle multiple kinds of oids
- object-file-convert: stubs for converting from one object format to another
Teach a repository to work with both SHA-1 and SHA-256 hash algorithms.
Needs review.
source: <878r8l929e.fsf@gmail.froward.int.ebiederm.org>
* jx/remote-archive-over-smart-http (2023-12-14) 4 commits
- archive: support remote archive from stateless transport
- transport-helper: call do_take_over() in connect_helper
- transport-helper: call do_take_over() in process_connect
- transport-helper: no connection restriction in connect_helper
"git archive --remote=<remote>" learned to talk over the smart
http (aka stateless) transport.
Needs review.
source: <cover.1702562879.git.zhiyou.jx@alibaba-inc.com>
* jx/sideband-chomp-newline-fix (2023-12-18) 3 commits
- pkt-line: do not chomp newlines for sideband messages
- pkt-line: memorize sideband fragment in reader
- test-pkt-line: add option parser for unpack-sideband
Sideband demultiplexer fixes.
Will merge to 'next'?
source: <cover.1702823801.git.zhiyou.jx@alibaba-inc.com>
* jc/rerere-cleanup (2023-08-25) 4 commits
- rerere: modernize use of empty strbuf
- rerere: try_merge() should use LL_MERGE_ERROR when it means an error
- rerere: fix comment on handle_file() helper
- rerere: simplify check_one_conflict() helper function
Code clean-up.
Not ready to be reviewed yet.
source: <20230824205456.1231371-1-gitster@pobox.com>
--------------------------------------------------
[Discarded]
* jc/sparse-checkout-eoo (2023-12-21) 5 commits
. sparse-checkout: tighten add_patterns_from_input()
. sparse-checkout: use default patterns for 'set' only !stdin
. SQUASH??? end-of-options test
. sparse-checkout: take care of "--end-of-options" in set/add/check-rules
+ Merge branch 'jk/end-of-options' into jc/sparse-checkout-set-add-end-of-options
"git sparse-checkout (add|set) --[no-]cone --end-of-options" did
not handle "--end-of-options" correctly after a recent update.
Superseded by the en/sparse-checkout-eoo topic.
source: <20231221065925.3234048-1-gitster@pobox.com>
^ permalink raw reply
* Re: [PATCH V4 1/1] Replace SID with domain/username
From: Junio C Hamano @ 2024-01-03 0:43 UTC (permalink / raw)
To: Sören Krecker, Johannes Schindelin; +Cc: git, sunshine
In-Reply-To: <20240102191514.2583-2-soekkle@freenet.de>
Sören Krecker <soekkle@freenet.de> writes:
> Replace SID with domain/username in error message, if owner of repository
> and user are not equal on windows systems. Each user should have a unique
> SID (https://learn.microsoft.com/en-us/windows-server/identity/ad-ds/manage/understand-security-identifiers#what-are-security-identifiers).
That paragraph your URL refers to does say that a SID that is used
for an account will never be reused to identify a different account.
But I am not sure if it means a user will never be assigned more
than one SID (in other words, the reverse is not necessarily true).
The paragraph also mentions that a SID can identify a non-user
entity like a computer account (as opposed to "a user account")---I
do not know what its implications are in the context of this patch,
though.
> This means that domain/username is not a loss of information.
This statement does not (grammatically) make sense, but more
importantly, loss of information may not be a bad thing in this
case. If more than one SIDs are given to a user account and
processes working for that account, these different SIDs may be
translated, by using LookupAccountSidA(), to the same string for a
single user@domain, and it would be an operation that loses
information in that sense.
But if what we *care* about is user@domain between the current
process and the owner of the directory in question being the same
(or not), then such a loss of information is a *good* thing.
So I dunno. Arguing what we care about (is that exact SID equality
between the "owner of the directory" and the "user, which the
current process is working on behalf of", or do we care about the
equality of the "accounts"?) may be a better way to justify this
change, if you ask me.
> +static BOOL user_sid_to_user_name(PSID sid, LPSTR *str)
> +{
> + SID_NAME_USE pe_use;
> + DWORD len_user = 0, len_domain = 0;
> + BOOL translate_sid_to_user;
> +
> + /* returns only FALSE, because the string pointers are NULL*/
> + LookupAccountSidA(NULL, sid, NULL, &len_user, NULL, &len_domain,
> + &pe_use);
> + /*Alloc needed space of the strings*/
> + ALLOC_ARRAY((*str), (size_t)len_domain + (size_t)len_user);
> + translate_sid_to_user = LookupAccountSidA(NULL, sid, (*str) + len_domain, &len_user,
> + *str, &len_domain, &pe_use);
> + if (translate_sid_to_user == FALSE) {
> + FREE_AND_NULL(*str);
> + }
Style: do not enclose a single-statement block inside {}.
> + else
> + (*str)[len_domain] = '/';
> + return translate_sid_to_user;
> +}
> @@ -2767,7 +2788,9 @@ int is_path_owned_by_current_sid(const char *path, struct strbuf *report)
> } else if (report) {
> LPSTR str1, str2, to_free1 = NULL, to_free2 = NULL;
>
> - if (ConvertSidToStringSidA(sid, &str1))
> + if (user_sid_to_user_name(sid, &str1))
> + to_free1 = str1;
> + else if (ConvertSidToStringSidA(sid, &str1))
> to_free1 = str1;
Do these two helper functions return pointers pointing into the same
kind of memory that you can free with the same function? That is ...
> ...
> "'%s' is owned by:\n"
> "\t'%s'\nbut the current user is:\n"
> "\t'%s'\n", path, str1, str2);
> - LocalFree(to_free1);
> - LocalFree(to_free2);
> + free(to_free1);
> + free(to_free2);
... the original code seems to say that the piece of memory we
obtain from ConvertSidToStringSidA() must not be freed by calling
free() but use something special called LocalFree(). I am assuing
that your user_sid_to_user_name() returns a regular piece of memory
that can be freed by calling regular free()? Do we need to keep
track of where we got the memory from and use different function to
free each variable, or something (again I do not do Windows so I'll
defer all of these to Dscho, who is CC'ed this time).
Thanks and a happy new year.
> }
> }
^ permalink raw reply
* Re: [PATCH] sideband.c: replace int with size_t for clarity
From: Taylor Blau @ 2024-01-02 22:31 UTC (permalink / raw)
To: Junio C Hamano
Cc: Torsten Bögershausen, Chandra Pratap via GitGitGadget, git,
Chandra Pratap, Chandra Pratap
In-Reply-To: <xmqqedfejc32.fsf@gitster.g>
On Fri, Dec 22, 2023 at 11:01:37AM -0800, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
>
> Just this part.
>
> > Further down, we read
> > for (i = 0; i < ARRAY_SIZE(keywords); i++) {
> >
> > However, a size of an array can never be negative, so that
> > an unsigned data type is a better choice than a signed.
> > And, arrays can have more elements than an int can address,
> > at least in theory.
> > For a reader it makes more sense, to replace
> > int i;
> > with
> > size_t i;
>
> It is a very good discipline to use size_t to index into an array
> whose size is externally controled (e.g., we slurp what the end user
> or the server on the other end of the connection gave us into a
> piece of memory we allocate) to avoid integer overflows as "int" is
> often narrower than "size_t". But this particular one is a Meh; the
> keywords[] is a small hardcoded array whose size and contents are
> totally under our control.
I certainly agree in theory, though I've always erred on the side of
always using size_t for indexing into arrays, even if they're small. It
removes a potential pitfall if you are working with an
externally-controlled array and happen to forget to use size_t.
But if there is an existing index variable with type "int", and we can
easily validate that it's small, I probably wouldn't bother changing it
if I was editing nearby code.
Thanks,
Taylor
^ permalink raw reply
* Re: [PATCH v2] mem-pool: fix big allocations
From: Taylor Blau @ 2024-01-02 22:29 UTC (permalink / raw)
To: René Scharfe
Cc: Git List, Jameson Miller, Phillip Wood, Elijah Newren,
Junio C Hamano
In-Reply-To: <1c39c0e7-05b2-4726-a90c-f78df4356a41@web.de>
Hi René,
On Thu, Dec 28, 2023 at 08:19:06PM +0100, René Scharfe wrote:
> Changes since v1:
> - simply use check() instead of a custom check_ptr() macro
> - drop unnecessary comparison of next_free and end pointers
Great find and fix. It's nice to see some examples of the testing
framework being used. I'll have to play around with it sometime :-).
Thanks,
Taylor
^ permalink raw reply
* Re: [PATCH] git-clone.txt: document -4 and -6
From: Taylor Blau @ 2024-01-02 22:25 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jakub Wilk, git, Eric Wong
In-Reply-To: <xmqqr0izcrfm.fsf@gitster.g>
On Tue, Jan 02, 2024 at 02:15:57PM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Note that the 'clone' and 'fetch' versions for many of these options
> > have different wording. For example, in Documentation/git-clone.txt we
> > have:
> >
> > -j::
> > --jobs=<n>::
> > Number of parallel children to be used for all forms of fetching.
> >
> > Whereas the description in the original fetch-options.txt is more
> > verbose.
>
> Yes, so it will be impossible to unify without changing the
> resulting text. But unless one description is clearly better for
> one subcommand while the other description is also clearly better
> for the other subcommand, we should be able to pick a better one
> that would serve both subcommands, and that way we would improve
> description for one subcommand while keeping the other one the same,
> right?
Right. I meant to illustrate merely that this would probably involve
some word-smithing instead of just pure cut-and-paste, so that it may
not be worth the effort.
Thanks,
Taylor
^ permalink raw reply
* Re: [BUG] Asks for "To" even if "To" already specified in letter
From: Taylor Blau @ 2024-01-02 22:23 UTC (permalink / raw)
To: Askar Safin; +Cc: git
In-Reply-To: <CAPnZJGBbA1VLAdP5mZnbF77-x-87JjU3Ku2MhrQu0SFoJL7ggw@mail.gmail.com>
On Sat, Dec 30, 2023 at 06:20:43AM +0300, Askar Safin wrote:
> Hi. I found a bug. Steps to reproduce:
> - Create file /tmp/m with following text:
> ===
> Subject: subj
> To: example@example.com
>
> text
> ===
> - Send it using command "git send-email /tmp/m"
>
> You will see that git asks for "To". git says: "To whom should the
> emails be sent (if anyone)?"
> I don't like this. git should just use "To" from /tmp/m without asking.
>
> Seen with git 2.43.0.
>
> If I execute "git send-email --to-cmd='#' /tmp/m" or
> "git send-email --to-cmd=':' /tmp/m" or
> "git send-email --to-cmd='true' /tmp/m", then "To" is not asked.
I was going to suggest that you use the `--to-cmd=true` trick. I'm
definitely not an expert in the send-email code, but from a cursory
look, I think that this is do-able.
One thing you could do is read all of the messages ahead of time, parse
their headers, and then extract any "To:" headers that you find. This is
pretty similar to what the pre_process_file() function is already doing.
But this might not be the right approach, since FIFOs can only be read
once, and we already have some logic to handle FIFOs specially (see
3c8d3adeae (send-email: export patch counters in validate environment,
2023-04-14)).
But I think that something much simpler would work, which to avoid
asking for a "To:" value altogether, even if one isn't provided. If you
did something like:
--- 8< ---
diff --git a/git-send-email.perl b/git-send-email.perl
index 821b2b3a13..2941278315 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -1018,13 +1018,6 @@ sub file_declares_8bit_cte {
my $to_whom = __("To whom should the emails be sent (if anyone)?");
my $prompting = 0;
-if (!@initial_to && !defined $to_cmd) {
- my $to = ask("$to_whom ",
- default => "",
- valid_re => qr/\@.*\./, confirm_only => 1);
- push @initial_to, parse_address_line($to) if defined $to; # sanitized/validated later
- $prompting++;
-}
sub expand_aliases {
return map { expand_one_alias($_) } @_;
--- >8 ---
I think that would more or less do the trick. send-email will happily
continue on even without an initial $to value, and validate it later
when it's actually needed.
I'm not familiar enough with the code to know if this is the right
approach, so hopefully some other send-email experts can chime in and
let me know if I'm on the right track ;-).
Thanks,
Taylor
^ permalink raw reply related
* Re: [PATCH] git-clone.txt: document -4 and -6
From: Junio C Hamano @ 2024-01-02 22:15 UTC (permalink / raw)
To: Taylor Blau; +Cc: Jakub Wilk, git, Eric Wong
In-Reply-To: <ZZRzxZNb2Aq+2feW@nand.local>
Taylor Blau <me@ttaylorr.com> writes:
> Note that the 'clone' and 'fetch' versions for many of these options
> have different wording. For example, in Documentation/git-clone.txt we
> have:
>
> -j::
> --jobs=<n>::
> Number of parallel children to be used for all forms of fetching.
>
> Whereas the description in the original fetch-options.txt is more
> verbose.
Yes, so it will be impossible to unify without changing the
resulting text. But unless one description is clearly better for
one subcommand while the other description is also clearly better
for the other subcommand, we should be able to pick a better one
that would serve both subcommands, and that way we would improve
description for one subcommand while keeping the other one the same,
right?
> In fact, the story is even more complicated than that, since even though
> the 'push' builtin would benefit from having a shared source of
> documentation for the --ipv4 and --ipv6 options, 'push' does not have a
> --jobs option.
Sure, it won't be just "write what is shared across all the transfer
commands in a single file and include it from all". The direction
of transfer is a reason why the options may differ, of course, so we
may need to have two (or three) include files if we want to go that
route.
> --progress could be shared, as could --server-option, and the two
> --ipv4/6 options. But the number of nested ifdefs necessary to share the
> other options probably dose not justify the effort to do so.
^ permalink raw reply
* [GSOC][RFC] Heed core.bare from template config file when no command line override given, as a microproject.
From: Ghanshyam Thakkar @ 2024-01-02 22:07 UTC (permalink / raw)
To: git; +Cc: newren, gitster, johannes.schindelin
Hello,
I'm currently an undergrad beginning my journey of contributing to the
Git project. I am seeking feedback on doing "Heed core.bare from
template config file when no command line override given" described
here
https://lore.kernel.org/git/5b39c530f2a0edf3b1492fa13a1132d622a0678e.1684218850.git.gitgitgadget@gmail.com/
by Elijah Newren, as a microproject. I would like to know from the
community, if the complexity and scope of the project is appropriate
for a microproject.
Approach:
As described by Elijah in commit message that fixing this cannot be
done in the create_default_files() function as it occurs too late.
This is because both clone and init have different checks and steps
for working with bare flag, like clone creates a new directory
[name].git and sets the GIT_DIR_ENVIRONMENT to it (when not provided
with an explicit dir name arg), while init sets the
GIT_DIR_ENVIRONMENT to the current working directory (when not
provided with an dir name arg). Also there are other steps like
setting no_checkout in a bare repository in builtin/clone.c. These are
all command specific steps which occur in builtin/clone.c and
builtin/init-db.c ,before we ever hit the TODO comment via
[builtin/clone.c]cmd_clone()->[setup.c]init_db()->
[setup.c]create_default_files().
Therefore, rather than centralizing the code in setup.c and adding a
bunch of if-else statements to handle different command specific
scenarios related to bare option, I propose to add a check for
template file config just after parsing of the flags and args in
builtin/init-db.c and builtin/clone.c.
e.g. in builtin/init-db.c :
static int template_bare_config(const char *var, const char *value,
const struct config_context *ctx, void *cb)
{
if(!strcmp(var,"core.bare")) {
is_bare_repository_cfg = git_config_bool(var, value);
}
return 0;
}
int cmd_init_db(int argc, const char **argv, const char *prefix)
{
...
...
if(is_bare_repository_cfg==-1) {
if(!template_dir)
git_config_get_pathname("init.templateDir",
&template_dir);
if(template_dir) {
const char* template_config_path
= xstrfmt("%s/config",
struct stat st;
if(!stat(template_config_path, &st) &&
!S_ISDIR(st.st_mode)) {
git_config_from_file(template_bare_cfg,
template_config_path, NULL);
}
}
...
...
return init_db(git_dir, real_git_dir, template_dir, hash_algo,
initial_branch, init_shared_repository, flags);
}
I also wanted to know if the global config files should have an effect
in deciding if the repo is bare or not.
Curious to know your thoughts on, if this is the right approach or
does it require doing refactoring to bring all the logic in setup.c.
Based on your feedback, I can quickly send a patch.
p.s.: Apologies for the weird indenting, due to 70 character limit.
consider it just a prototype.
Thanks!
^ permalink raw reply
* Re: [PATCH] git-clone.txt: document -4 and -6
From: Taylor Blau @ 2024-01-02 20:36 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jakub Wilk, git, Eric Wong
In-Reply-To: <ZZRqgmDycyAXCrGZ@nand.local>
On Tue, Jan 02, 2024 at 02:56:50PM -0500, Taylor Blau wrote:
> It turned up the following results:
>
> -a
> --depth
> --shallow-since
> --shallow-exclude
> --no-tags
> --recurse-submodules
> -j, --jobs
> -u, --upload-pack
> -q, --quiet
> -v, --verbose
> --progress
> -o, --server-option
Hmm. I think the story is a little more complicated. Just looking at the
ones that we agree on:
* -j, --jobs
* -u, --upload-pack
* --progress
* -o, --server-option
* --[ipv]4
* --[ipv]6
Note that the 'clone' and 'fetch' versions for many of these options
have different wording. For example, in Documentation/git-clone.txt we
have:
-j::
--jobs=<n>::
Number of parallel children to be used for all forms of fetching.
Whereas the description in the original fetch-options.txt is more
verbose.
In fact, the story is even more complicated than that, since even though
the 'push' builtin would benefit from having a shared source of
documentation for the --ipv4 and --ipv6 options, 'push' does not have a
--jobs option.
'clone', 'fetch', and 'pull' all share an '--upload-pack' option as you
note, though 'push' naturally doesn't (so we would need another ifdef
there, too). But the instances in 'clone', 'fetch', and 'pull' all
differ in their wording (e.g., the 'clone' documentation says "the
repository to clone from ..." but the others say "the repository to
fetch from ...").
--progress could be shared, as could --server-option, and the two
--ipv4/6 options. But the number of nested ifdefs necessary to share the
other options probably dose not justify the effort to do so.
Thanks,
Taylor
^ permalink raw reply
* [RFC PATCH 5/6] completion: recognize but do not complete 'view'
From: Britton Leo Kerin @ 2024-01-02 19:57 UTC (permalink / raw)
To: git; +Cc: Britton Leo Kerin
In-Reply-To: <20240102195744.478503-1-britton.kerin@gmail.com>
Completing it might annoy some existing users by creating completion
ambiguity on 'v' and 'vi' without adding anything useful in terms of
interface discovery/recall (because 'view' is just an alias anyway).
Signed-off-by: Britton Leo Kerin <britton.kerin@gmail.com>
---
contrib/completion/git-completion.bash | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index a09598c5c1..3bb790220a 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1589,13 +1589,16 @@ _git_bisect ()
term_good=`__git bisect terms --term-good`
fi
- local subcommands="start bad new $term_bad good old $term_good terms skip reset visualize replay log run help"
+ # We want to recognize 'view' but not complete it, because it overlaps
+ # with 'visualize' too much and is just an alias for it.
+ local completable_subcommands="start bad new $term_bad good old $term_good terms skip reset visualize replay log run help"
+ local all_subcommands="$completable_subcommands view"
- local subcommand="$(__git_find_on_cmdline "$subcommands")"
+ local subcommand="$(__git_find_on_cmdline "$all_subcommands")"
if [ -z "$subcommand" ]; then
if [ -f "$__git_repo_path"/BISECT_START ]; then
- __gitcomp "$subcommands"
+ __gitcomp "$completable_subcommands"
else
__gitcomp "replay start"
fi
@@ -1613,7 +1616,7 @@ _git_bisect ()
;;
esac
;;
- visualize)
+ visualize|view)
case "$cur" in
-*)
__git_complete_log_opts
--
2.43.0
^ permalink raw reply related
* [RFC PATCH 6/6] completion: add comment
From: Britton Leo Kerin @ 2024-01-02 19:57 UTC (permalink / raw)
To: git; +Cc: Britton Leo Kerin
In-Reply-To: <20240102195744.478503-1-britton.kerin@gmail.com>
Signed-off-by: Britton Leo Kerin <britton.kerin@gmail.com>
---
contrib/completion/git-completion.bash | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 3bb790220a..7f9a626e1b 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1589,8 +1589,14 @@ _git_bisect ()
term_good=`__git bisect terms --term-good`
fi
+ # We will complete any custom terms, but still always complete the
+ # more usual bad/new/good/old because git bisect gives a good error
+ # message if these are given when not in use and that's better than
+ # silent refusal to complete if the user is confused.
+ #
# We want to recognize 'view' but not complete it, because it overlaps
# with 'visualize' too much and is just an alias for it.
+ #
local completable_subcommands="start bad new $term_bad good old $term_good terms skip reset visualize replay log run help"
local all_subcommands="$completable_subcommands view"
--
2.43.0
^ permalink raw reply related
* [RFC PATCH 3/6] completion: move to maintain define-before-use
From: Britton Leo Kerin @ 2024-01-02 19:57 UTC (permalink / raw)
To: git; +Cc: Britton Leo Kerin
In-Reply-To: <20240102195744.478503-1-britton.kerin@gmail.com>
Signed-off-by: Britton Leo Kerin <britton.kerin@gmail.com>
---
contrib/completion/git-completion.bash | 265 ++++++++++++-------------
1 file changed, 132 insertions(+), 133 deletions(-)
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 3472fab514..4940ad3e24 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1445,6 +1445,138 @@ _git_archive ()
__git_complete_file
}
+# Options that go well for log, shortlog and gitk
+__git_log_common_options="
+ --not --all
+ --branches --tags --remotes
+ --first-parent --merges --no-merges
+ --max-count=
+ --max-age= --since= --after=
+ --min-age= --until= --before=
+ --min-parents= --max-parents=
+ --no-min-parents --no-max-parents
+"
+# Options that go well for log and gitk (not shortlog)
+__git_log_gitk_options="
+ --dense --sparse --full-history
+ --simplify-merges --simplify-by-decoration
+ --left-right --notes --no-notes
+"
+# Options that go well for log and shortlog (not gitk)
+__git_log_shortlog_options="
+ --author= --committer= --grep=
+ --all-match --invert-grep
+"
+# Options accepted by log and show
+__git_log_show_options="
+ --diff-merges --diff-merges= --no-diff-merges --dd --remerge-diff
+"
+
+__git_diff_merges_opts="off none on first-parent 1 separate m combined c dense-combined cc remerge r"
+
+__git_log_pretty_formats="oneline short medium full fuller reference email raw format: tformat: mboxrd"
+__git_log_date_formats="relative iso8601 iso8601-strict rfc2822 short local default human raw unix auto: format:"
+
+# Find only porcelain (i.e. not git-rev-list) option (not argument) and
+# selected option argument completions for git-log options and put them in
+# COMPREPLY.
+__git_complete_log_opts ()
+{
+ local merge=""
+ if [ -f "$__git_repo_path/MERGE_HEAD" ]; then
+ merge="--merge"
+ fi
+ case "$prev,$cur" in
+ -L,:*:*)
+ return # fall back to Bash filename completion
+ ;;
+ -L,:*)
+ __git_complete_symbol --cur="${cur#:}" --sfx=":"
+ return
+ ;;
+ -G,*|-S,*)
+ __git_complete_symbol
+ return
+ ;;
+ esac
+ case "$cur" in
+ --pretty=*|--format=*)
+ __gitcomp "$__git_log_pretty_formats $(__git_pretty_aliases)
+ " "" "${cur#*=}"
+ return
+ ;;
+ --date=*)
+ __gitcomp "$__git_log_date_formats" "" "${cur##--date=}"
+ return
+ ;;
+ --decorate=*)
+ __gitcomp "full short no" "" "${cur##--decorate=}"
+ return
+ ;;
+ --diff-algorithm=*)
+ __gitcomp "$__git_diff_algorithms" "" "${cur##--diff-algorithm=}"
+ return
+ ;;
+ --submodule=*)
+ __gitcomp "$__git_diff_submodule_formats" "" "${cur##--submodule=}"
+ return
+ ;;
+ --ws-error-highlight=*)
+ __gitcomp "$__git_ws_error_highlight_opts" "" "${cur##--ws-error-highlight=}"
+ return
+ ;;
+ --no-walk=*)
+ __gitcomp "sorted unsorted" "" "${cur##--no-walk=}"
+ return
+ ;;
+ --diff-merges=*)
+ __gitcomp "$__git_diff_merges_opts" "" "${cur##--diff-merges=}"
+ return
+ ;;
+ --*)
+ __gitcomp "
+ $__git_log_common_options
+ $__git_log_shortlog_options
+ $__git_log_gitk_options
+ $__git_log_show_options
+ --root --topo-order --date-order --reverse
+ --follow --full-diff
+ --abbrev-commit --no-abbrev-commit --abbrev=
+ --relative-date --date=
+ --pretty= --format= --oneline
+ --show-signature
+ --cherry-mark
+ --cherry-pick
+ --graph
+ --decorate --decorate= --no-decorate
+ --walk-reflogs
+ --no-walk --no-walk= --do-walk
+ --parents --children
+ --expand-tabs --expand-tabs= --no-expand-tabs
+ $merge
+ $__git_diff_common_options
+ "
+ return
+ ;;
+ -L:*:*)
+ return # fall back to Bash filename completion
+ ;;
+ -L:*)
+ __git_complete_symbol --cur="${cur#-L:}" --sfx=":"
+ return
+ ;;
+ -G*)
+ __git_complete_symbol --pfx="-G" --cur="${cur#-G}"
+ return
+ ;;
+ -S*)
+ __git_complete_symbol --pfx="-S" --cur="${cur#-S}"
+ return
+ ;;
+ esac
+
+}
+
_git_bisect ()
{
__git_has_doubledash && return
@@ -2052,139 +2184,6 @@ _git_ls_tree ()
__git_complete_file
}
-# Options that go well for log, shortlog and gitk
-__git_log_common_options="
- --not --all
- --branches --tags --remotes
- --first-parent --merges --no-merges
- --max-count=
- --max-age= --since= --after=
- --min-age= --until= --before=
- --min-parents= --max-parents=
- --no-min-parents --no-max-parents
-"
-# Options that go well for log and gitk (not shortlog)
-__git_log_gitk_options="
- --dense --sparse --full-history
- --simplify-merges --simplify-by-decoration
- --left-right --notes --no-notes
-"
-# Options that go well for log and shortlog (not gitk)
-__git_log_shortlog_options="
- --author= --committer= --grep=
- --all-match --invert-grep
-"
-# Options accepted by log and show
-__git_log_show_options="
- --diff-merges --diff-merges= --no-diff-merges --dd --remerge-diff
-"
-
-__git_diff_merges_opts="off none on first-parent 1 separate m combined c dense-combined cc remerge r"
-
-__git_log_pretty_formats="oneline short medium full fuller reference email raw format: tformat: mboxrd"
-__git_log_date_formats="relative iso8601 iso8601-strict rfc2822 short local default human raw unix auto: format:"
-
-
-# Find only porcelain (i.e. not git-rev-list) option (not argument) and
-# selected option argument completions for git-log options and put them in
-# COMPREPLY.
-__git_complete_log_opts ()
-{
- local merge=""
- if [ -f "$__git_repo_path/MERGE_HEAD" ]; then
- merge="--merge"
- fi
- case "$prev,$cur" in
- -L,:*:*)
- return # fall back to Bash filename completion
- ;;
- -L,:*)
- __git_complete_symbol --cur="${cur#:}" --sfx=":"
- return
- ;;
- -G,*|-S,*)
- __git_complete_symbol
- return
- ;;
- esac
- case "$cur" in
- --pretty=*|--format=*)
- __gitcomp "$__git_log_pretty_formats $(__git_pretty_aliases)
- " "" "${cur#*=}"
- return
- ;;
- --date=*)
- __gitcomp "$__git_log_date_formats" "" "${cur##--date=}"
- return
- ;;
- --decorate=*)
- __gitcomp "full short no" "" "${cur##--decorate=}"
- return
- ;;
- --diff-algorithm=*)
- __gitcomp "$__git_diff_algorithms" "" "${cur##--diff-algorithm=}"
- return
- ;;
- --submodule=*)
- __gitcomp "$__git_diff_submodule_formats" "" "${cur##--submodule=}"
- return
- ;;
- --ws-error-highlight=*)
- __gitcomp "$__git_ws_error_highlight_opts" "" "${cur##--ws-error-highlight=}"
- return
- ;;
- --no-walk=*)
- __gitcomp "sorted unsorted" "" "${cur##--no-walk=}"
- return
- ;;
- --diff-merges=*)
- __gitcomp "$__git_diff_merges_opts" "" "${cur##--diff-merges=}"
- return
- ;;
- --*)
- __gitcomp "
- $__git_log_common_options
- $__git_log_shortlog_options
- $__git_log_gitk_options
- $__git_log_show_options
- --root --topo-order --date-order --reverse
- --follow --full-diff
- --abbrev-commit --no-abbrev-commit --abbrev=
- --relative-date --date=
- --pretty= --format= --oneline
- --show-signature
- --cherry-mark
- --cherry-pick
- --graph
- --decorate --decorate= --no-decorate
- --walk-reflogs
- --no-walk --no-walk= --do-walk
- --parents --children
- --expand-tabs --expand-tabs= --no-expand-tabs
- $merge
- $__git_diff_common_options
- "
- return
- ;;
- -L:*:*)
- return # fall back to Bash filename completion
- ;;
- -L:*)
- __git_complete_symbol --cur="${cur#-L:}" --sfx=":"
- return
- ;;
- -G*)
- __git_complete_symbol --pfx="-G" --cur="${cur#-G}"
- return
- ;;
- -S*)
- __git_complete_symbol --pfx="-S" --cur="${cur#-S}"
- return
- ;;
- esac
-
-}
-
_git_log ()
{
__git_has_doubledash && return
--
2.43.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox