Git development

Git development
 help / color / mirror / Atom feed

* [PATCH v2 0/7] reftable: improve ref iteration performance
From: Patrick Steinhardt @ 2024-02-12  8:32 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, John Cai
In-Reply-To: <cover.1706782841.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 4243 bytes --]

Hi,

this is the second version of my patch series that improves ref
iteration performance. There are no code changes compared to v1, but a
few improvements to the commit messages.

Thanks!

Patrick

Patrick Steinhardt (7):
  reftable/record: introduce function to compare records by key
  reftable/merged: allocation-less dropping of shadowed records
  reftable/merged: skip comparison for records of the same subiter
  reftable/pq: allocation-less comparison of entry keys
  reftable/block: swap buffers instead of copying
  reftable/record: don't try to reallocate ref record name
  reftable/reader: add comments to `table_iter_next()`

 reftable/block.c  |  3 +--
 reftable/merged.c | 19 +++++++-------
 reftable/merged.h |  2 --
 reftable/pq.c     | 13 +--------
 reftable/reader.c | 26 +++++++++++-------
 reftable/record.c | 67 ++++++++++++++++++++++++++++++++++++++++++++---
 reftable/record.h |  7 +++++
 7 files changed, 100 insertions(+), 37 deletions(-)

Range-diff against v1:
1:  fadabec696 ! 1:  bcdb5a2bf0 reftable/record: introduce function to compare records by key
    @@ Commit message
         In some places we need to sort reftable records by their keys to
         determine their ordering. This is done by first formatting the keys into
         a `struct strbuf` and then using `strbuf_cmp()` to compare them. This
    -    logic is needlessly roundabout and can end up costing quite a bit fo CPU
    +    logic is needlessly roundabout and can end up costing quite a bit of CPU
         cycles, both due to the allocation and formatting logic.
     
    -    Introduce a new `reftable_record_cmp()` function that knows to compare
    -    two records with each other without requiring allocations.
    +    Introduce a new `reftable_record_cmp()` function that knows how to
    +    compare two records with each other without requiring allocations.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
2:  576a96f2e5 = 2:  2364a0fa33 reftable/merged: allocation-less dropping of shadowed records
3:  0ca86eba71 ! 3:  205e6b488b reftable/merged: skip comparison for records of the same subiter
    @@ Commit message
         to return.
     
         There is an edge case here where we can skip that comparison: when the
    -    record in the priority queue comes from the same subiterator than the
    +    record in the priority queue comes from the same subiterator as the
         record we are about to return then we know that its key must be larger
         than the key of the record we are about to return. This property is
         guaranteed by the sub-iterators, and if it didn't hold then the whole
4:  1c9c19a3b3 = 4:  fd09ba70fe reftable/pq: allocation-less comparison of entry keys
5:  ac3d43c001 = 5:  2317aa43b9 reftable/block: swap buffers instead of copying
6:  41dff8731c = 6:  8c67491504 reftable/record: don't try to reallocate ref record name
7:  2f1f1dd95e ! 7:  167f67fad8 reftable/reader: add comments to `table_iter_next()`
    @@ Commit message
         more obvious. While at it, touch up the code to conform to our code
         style better.
     
    +    Note that one of the refactorings merges two conditional blocks into
    +    one. Before, we had the following code:
    +
    +    ```
    +    err = table_iter_next_block(&next, ti
    +    if (err != 0) {
    +            ti->is_finished = 1;
    +    }
    +    table_iter_block_done(ti);
    +    if (err != 0) {
    +            return err;
    +    }
    +    ```
    +
    +    As `table_iter_block_done()` does not care about `is_finished`, the
    +    conditional blocks can be merged into one block:
    +
    +    ```
    +    err = table_iter_next_block(&next, ti
    +    table_iter_block_done(ti);
    +    if (err != 0) {
    +            ti->is_finished = 1;
    +            return err;
    +    }
    +    ```
    +
    +    This is both easier to reason about and more performant because we have
    +    one branch less.
    +
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## reftable/reader.c ##

base-commit: c875e0b8e036c12cfbf6531962108a063c7a821c
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH v2 1/7] reftable/record: introduce function to compare records by key
From: Patrick Steinhardt @ 2024-02-12  8:32 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, John Cai
In-Reply-To: <cover.1707726654.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 6455 bytes --]

In some places we need to sort reftable records by their keys to
determine their ordering. This is done by first formatting the keys into
a `struct strbuf` and then using `strbuf_cmp()` to compare them. This
logic is needlessly roundabout and can end up costing quite a bit of CPU
cycles, both due to the allocation and formatting logic.

Introduce a new `reftable_record_cmp()` function that knows how to
compare two records with each other without requiring allocations.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/record.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++-
 reftable/record.h |  7 ++++++
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/reftable/record.c b/reftable/record.c
index 5c3fbb7b2a..f1b6a5eac9 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -430,7 +430,6 @@ static int reftable_ref_record_is_deletion_void(const void *p)
 		(const struct reftable_ref_record *)p);
 }
 
-
 static int reftable_ref_record_equal_void(const void *a,
 					  const void *b, int hash_size)
 {
@@ -439,6 +438,13 @@ static int reftable_ref_record_equal_void(const void *a,
 	return reftable_ref_record_equal(ra, rb, hash_size);
 }
 
+static int reftable_ref_record_cmp_void(const void *_a, const void *_b)
+{
+	const struct reftable_ref_record *a = _a;
+	const struct reftable_ref_record *b = _b;
+	return strcmp(a->refname, b->refname);
+}
+
 static void reftable_ref_record_print_void(const void *rec,
 					   int hash_size)
 {
@@ -455,6 +461,7 @@ static struct reftable_record_vtable reftable_ref_record_vtable = {
 	.release = &reftable_ref_record_release_void,
 	.is_deletion = &reftable_ref_record_is_deletion_void,
 	.equal = &reftable_ref_record_equal_void,
+	.cmp = &reftable_ref_record_cmp_void,
 	.print = &reftable_ref_record_print_void,
 };
 
@@ -625,6 +632,25 @@ static int reftable_obj_record_equal_void(const void *a, const void *b, int hash
 	return 1;
 }
 
+static int reftable_obj_record_cmp_void(const void *_a, const void *_b)
+{
+	const struct reftable_obj_record *a = _a;
+	const struct reftable_obj_record *b = _b;
+	int cmp;
+
+	cmp = memcmp(a->hash_prefix, b->hash_prefix,
+		     a->hash_prefix_len > b->hash_prefix_len ?
+		     a->hash_prefix_len : b->hash_prefix_len);
+	if (cmp)
+		return cmp;
+
+	/*
+	 * When the prefix is the same then the object record that is longer is
+	 * considered to be bigger.
+	 */
+	return a->hash_prefix_len - b->hash_prefix_len;
+}
+
 static struct reftable_record_vtable reftable_obj_record_vtable = {
 	.key = &reftable_obj_record_key,
 	.type = BLOCK_TYPE_OBJ,
@@ -635,6 +661,7 @@ static struct reftable_record_vtable reftable_obj_record_vtable = {
 	.release = &reftable_obj_record_release,
 	.is_deletion = &not_a_deletion,
 	.equal = &reftable_obj_record_equal_void,
+	.cmp = &reftable_obj_record_cmp_void,
 	.print = &reftable_obj_record_print,
 };
 
@@ -953,6 +980,22 @@ static int reftable_log_record_equal_void(const void *a,
 					 hash_size);
 }
 
+static int reftable_log_record_cmp_void(const void *_a, const void *_b)
+{
+	const struct reftable_log_record *a = _a;
+	const struct reftable_log_record *b = _b;
+	int cmp = strcmp(a->refname, b->refname);
+	if (cmp)
+		return cmp;
+
+	/*
+	 * Note that the comparison here is reversed. This is because the
+	 * update index is reversed when comparing keys. For reference, see how
+	 * we handle this in reftable_log_record_key()`.
+	 */
+	return b->update_index - a->update_index;
+}
+
 int reftable_log_record_equal(const struct reftable_log_record *a,
 			      const struct reftable_log_record *b, int hash_size)
 {
@@ -1002,6 +1045,7 @@ static struct reftable_record_vtable reftable_log_record_vtable = {
 	.release = &reftable_log_record_release_void,
 	.is_deletion = &reftable_log_record_is_deletion_void,
 	.equal = &reftable_log_record_equal_void,
+	.cmp = &reftable_log_record_cmp_void,
 	.print = &reftable_log_record_print_void,
 };
 
@@ -1077,6 +1121,13 @@ static int reftable_index_record_equal(const void *a, const void *b, int hash_si
 	return ia->offset == ib->offset && !strbuf_cmp(&ia->last_key, &ib->last_key);
 }
 
+static int reftable_index_record_cmp(const void *_a, const void *_b)
+{
+	const struct reftable_index_record *a = _a;
+	const struct reftable_index_record *b = _b;
+	return strbuf_cmp(&a->last_key, &b->last_key);
+}
+
 static void reftable_index_record_print(const void *rec, int hash_size)
 {
 	const struct reftable_index_record *idx = rec;
@@ -1094,6 +1145,7 @@ static struct reftable_record_vtable reftable_index_record_vtable = {
 	.release = &reftable_index_record_release,
 	.is_deletion = &not_a_deletion,
 	.equal = &reftable_index_record_equal,
+	.cmp = &reftable_index_record_cmp,
 	.print = &reftable_index_record_print,
 };
 
@@ -1147,6 +1199,14 @@ int reftable_record_is_deletion(struct reftable_record *rec)
 		reftable_record_data(rec));
 }
 
+int reftable_record_cmp(struct reftable_record *a, struct reftable_record *b)
+{
+	if (a->type != b->type)
+		BUG("cannot compare reftable records of different type");
+	return reftable_record_vtable(a)->cmp(
+		reftable_record_data(a), reftable_record_data(b));
+}
+
 int reftable_record_equal(struct reftable_record *a, struct reftable_record *b, int hash_size)
 {
 	if (a->type != b->type)
diff --git a/reftable/record.h b/reftable/record.h
index fd80cd451d..0d96fbfd1b 100644
--- a/reftable/record.h
+++ b/reftable/record.h
@@ -62,6 +62,12 @@ struct reftable_record_vtable {
 	/* Are two records equal? This assumes they have the same type. Returns 0 for non-equal. */
 	int (*equal)(const void *a, const void *b, int hash_size);
 
+	/*
+	 * Compare keys of two records with each other. The records must have
+	 * the same type.
+	 */
+	int (*cmp)(const void *a, const void *b);
+
 	/* Print on stdout, for debugging. */
 	void (*print)(const void *rec, int hash_size);
 };
@@ -114,6 +120,7 @@ struct reftable_record {
 };
 
 /* see struct record_vtable */
+int reftable_record_cmp(struct reftable_record *a, struct reftable_record *b);
 int reftable_record_equal(struct reftable_record *a, struct reftable_record *b, int hash_size);
 void reftable_record_print(struct reftable_record *rec, int hash_size);
 void reftable_record_key(struct reftable_record *rec, struct strbuf *dest);
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 2/7] reftable/merged: allocation-less dropping of shadowed records
From: Patrick Steinhardt @ 2024-02-12  8:32 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, John Cai
In-Reply-To: <cover.1707726654.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 4576 bytes --]

The purpose of the merged reftable iterator is to iterate through all
entries of a set of tables in the correct order. This is implemented by
using a sub-iterator for each table, where the next entry of each of
these iterators gets put into a priority queue. For each iteration, we
do roughly the following steps:

  1. Retrieve the top record of the priority queue. This is the entry we
     want to return to the caller.

  2. Retrieve the next record of the sub-iterator that this record came
     from. If any, add it to the priority queue at the correct position.
     The position is determined by comparing the record keys, which e.g.
     corresponds to the refname for ref records.

  3. Keep removing the top record of the priority queue until we hit the
     first entry whose key is larger than the returned record's key.
     This is required to drop "shadowed" records.

The last step will lead to at least one comparison to the next entry,
but may lead to many comparisons in case the reftable stack consists of
many tables with shadowed records. It is thus part of the hot code path
when iterating through records.

The code to compare the entries with each other is quite inefficient
though. Instead of comparing record keys with each other directly, we
first format them into `struct strbuf`s and only then compare them with
each other. While we already optimized this code path to reuse buffers
in 829231dc20 (reftable/merged: reuse buffer to compute record keys,
2023-12-11), the cost to format the keys into the buffers still adds up
quite significantly.

Refactor the code to use `reftable_record_cmp()` instead, which has been
introduced in the preceding commit. This function compares records with
each other directly without requiring any memory allocations or copying
and is thus way more efficient.

The following benchmark uses git-show-ref(1) to print a single ref
matching a pattern out of 1 million refs. This is the most direct way to
exercise ref iteration speed as we remove all overhead of having to show
the refs, too.

    Benchmark 1: show-ref: single matching ref (revision = HEAD~)
      Time (mean ± σ):     180.7 ms ±   4.7 ms    [User: 177.1 ms, System: 3.4 ms]
      Range (min … max):   174.9 ms … 211.7 ms    1000 runs

    Benchmark 2: show-ref: single matching ref (revision = HEAD)
      Time (mean ± σ):     162.1 ms ±   4.4 ms    [User: 158.5 ms, System: 3.4 ms]
      Range (min … max):   155.4 ms … 189.3 ms    1000 runs

    Summary
      show-ref: single matching ref (revision = HEAD) ran
        1.11 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 11 ++---------
 reftable/merged.h |  2 --
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index c258ce953e..fb9978d798 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -51,8 +51,6 @@ static void merged_iter_close(void *p)
 		reftable_iterator_destroy(&mi->stack[i]);
 	}
 	reftable_free(mi->stack);
-	strbuf_release(&mi->key);
-	strbuf_release(&mi->entry_key);
 }

 static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi,
@@ -105,14 +103,11 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 	  such a deployment, the loop below must be changed to collect all
 	  entries for the same key, and return new the newest one.
 	*/
-	reftable_record_key(&entry.rec, &mi->entry_key);
 	while (!merged_iter_pqueue_is_empty(mi->pq)) {
 		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
-		int cmp = 0;
+		int cmp;

-		reftable_record_key(&top.rec, &mi->key);
-
-		cmp = strbuf_cmp(&mi->key, &mi->entry_key);
+		cmp = reftable_record_cmp(&top.rec, &entry.rec);
 		if (cmp > 0)
 			break;

@@ -246,8 +241,6 @@ static int merged_table_seek_record(struct reftable_merged_table *mt,
 		.typ = reftable_record_type(rec),
 		.hash_id = mt->hash_id,
 		.suppress_deletions = mt->suppress_deletions,
-		.key = STRBUF_INIT,
-		.entry_key = STRBUF_INIT,
 	};
 	int n = 0;
 	int err = 0;
diff --git a/reftable/merged.h b/reftable/merged.h
index d5b39dfe7f..7d9f95d27e 100644
--- a/reftable/merged.h
+++ b/reftable/merged.h
@@ -31,8 +31,6 @@ struct merged_iter {
 	uint8_t typ;
 	int suppress_deletions;
 	struct merged_iter_pqueue pq;
-	struct strbuf key;
-	struct strbuf entry_key;
 };

 void merged_table_release(struct reftable_merged_table *mt);
-- 
2.43.GIT

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 3/7] reftable/merged: skip comparison for records of the same subiter
From: Patrick Steinhardt @ 2024-02-12  8:32 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, John Cai
In-Reply-To: <cover.1707726654.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2550 bytes --]

When retrieving the next entry of a merged iterator we need to drop all
records of other sub-iterators that would be shadowed by the record that
we are about to return. We do this by comparing record keys, dropping
all keys that are smaller or equal to the key of the record we are about
to return.

There is an edge case here where we can skip that comparison: when the
record in the priority queue comes from the same subiterator as the
record we are about to return then we know that its key must be larger
than the key of the record we are about to return. This property is
guaranteed by the sub-iterators, and if it didn't hold then the whole
merged iterator would return records in the wrong order, too.

While this may seem like a very specific edge case it's in fact quite
likely to happen. For most repositories out there you can assume that we
will end up with one large table and several smaller ones on top of it.
Thus, it is very likely that the next entry will sort towards the top of
the priority queue.

Special case this and break out of the loop in that case. The following
benchmark uses git-show-ref(1) to print a single ref matching a pattern
out of 1 million refs:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     162.6 ms ±   4.5 ms    [User: 159.0 ms, System: 3.5 ms]
    Range (min … max):   156.6 ms … 188.5 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     156.8 ms ±   4.7 ms    [User: 153.0 ms, System: 3.6 ms]
    Range (min … max):   151.4 ms … 188.4 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.04 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/reftable/merged.c b/reftable/merged.c
index fb9978d798..0f74a14a77 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -107,6 +107,14 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
 		int cmp;

+		/*
+		 * When the next entry comes from the same queue as the current
+		 * entry then it must by definition be larger. This avoids a
+		 * comparison in the most common case.
+		 */
+		if (top.index == entry.index)
+			break;
+
 		cmp = reftable_record_cmp(&top.rec, &entry.rec);
 		if (cmp > 0)
 			break;
-- 
2.43.GIT

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 4/7] reftable/pq: allocation-less comparison of entry keys
From: Patrick Steinhardt @ 2024-02-12  8:32 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, John Cai
In-Reply-To: <cover.1707726654.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2853 bytes --]

The priority queue is used by the merged iterator to iterate over
reftable records from multiple tables in the correct order. The queue
ends up having one record for each table that is being iterated over,
with the record that is supposed to be shown next at the top. For
example, the key of a ref record is equal to its name so that we end up
sorting the priority queue lexicographically by ref name.

To figure out the order we need to compare the reftable record keys with
each other. This comparison is done by formatting them into a `struct
strbuf` and then doing `strbuf_strcmp()` on the result. We then discard
the buffers immediately after the comparison.

This ends up being very expensive. Because the priority queue usually
contains as many records as we have tables, we call the comparison
function `O(log($tablecount))` many times for every record we insert.
Furthermore, when iterating over many refs, we will insert at least one
record for every ref we are iterating over. So ultimately, this ends up
being called `O($refcount * log($tablecount))` many times.

Refactor the code to use the new `refatble_record_cmp()` function that
has been implemented in a preceding commit. This function does not need
to allocate memory and is thus significantly more efficient.

The following benchmark prints a single ref matching a specific pattern
out of 1 million refs via git-show-ref(1), where the reftable stack
consists of three tables:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     224.4 ms ±   6.5 ms    [User: 220.6 ms, System: 3.6 ms]
    Range (min … max):   216.5 ms … 261.1 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     172.9 ms ±   4.4 ms    [User: 169.2 ms, System: 3.6 ms]
    Range (min … max):   166.5 ms … 204.6 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.30 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/pq.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/reftable/pq.c b/reftable/pq.c
index dcefeb793a..7220efc39a 100644
--- a/reftable/pq.c
+++ b/reftable/pq.c
@@ -14,20 +14,9 @@ license that can be found in the LICENSE file or at

 int pq_less(struct pq_entry *a, struct pq_entry *b)
 {
-	struct strbuf ak = STRBUF_INIT;
-	struct strbuf bk = STRBUF_INIT;
-	int cmp = 0;
-	reftable_record_key(&a->rec, &ak);
-	reftable_record_key(&b->rec, &bk);
-
-	cmp = strbuf_cmp(&ak, &bk);
-
-	strbuf_release(&ak);
-	strbuf_release(&bk);
-
+	int cmp = reftable_record_cmp(&a->rec, &b->rec);
 	if (cmp == 0)
 		return a->index > b->index;
-
 	return cmp < 0;
 }

-- 
2.43.GIT

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 5/7] reftable/block: swap buffers instead of copying
From: Patrick Steinhardt @ 2024-02-12  8:32 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, John Cai
In-Reply-To: <cover.1707726654.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2068 bytes --]

When iterating towards the next record in a reftable block we need to
keep track of the key that the last record had. This is required because
reftable records use prefix compression, where subsequent records may
reuse parts of their preceding record's key.

This key is stored in the `block_iter::last_key`, which we update after
every call to `block_iter_next()`: we simply reset the buffer and then
add the current key to it.

This is a bit inefficient though because it requires us to copy over the
key on every iteration, which adds up when iterating over many records.
Instead, we can make use of the fact that the `block_iter::key` buffer
is basically only a scratch buffer. So instead of copying over contents,
we can just swap both buffers.

The following benchmark prints a single ref matching a specific pattern
out of 1 million refs via git-show-ref(1):

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     155.7 ms ±   5.0 ms    [User: 152.1 ms, System: 3.4 ms]
    Range (min … max):   150.8 ms … 185.7 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     150.8 ms ±   4.2 ms    [User: 147.1 ms, System: 3.5 ms]
    Range (min … max):   145.1 ms … 180.7 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.03 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 1df3d8a0f0..44381ea6a3 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -342,8 +342,7 @@ int block_iter_next(struct block_iter *it, struct reftable_record *rec)
 		return -1;
 	string_view_consume(&in, n);

-	strbuf_reset(&it->last_key);
-	strbuf_addbuf(&it->last_key, &it->key);
+	strbuf_swap(&it->last_key, &it->key);
 	it->next_off += start.len - in.len;
 	return 0;
 }
-- 
2.43.GIT

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 6/7] reftable/record: don't try to reallocate ref record name
From: Patrick Steinhardt @ 2024-02-12  8:32 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, John Cai
In-Reply-To: <cover.1707726654.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2046 bytes --]

When decoding reftable ref records we first release the pointer to the
record passed to us and then use realloc(3P) to allocate the refname
array. This is a bit misleading though as we know at that point that the
refname will always be `NULL`, so we would always end up allocating a
new char array anyway.

Refactor the code to use `REFTABLE_ALLOC_ARRAY()` instead. As the
following benchmark demonstrates this is a tiny bit more efficient. But
the bigger selling point really is the gained clarity.

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     150.1 ms ±   4.1 ms    [User: 146.6 ms, System: 3.3 ms]
    Range (min … max):   144.5 ms … 180.5 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     148.9 ms ±   4.5 ms    [User: 145.2 ms, System: 3.4 ms]
    Range (min … max):   143.0 ms … 185.4 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.01 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Ideally, we should try and reuse the memory of the old record instead of
first freeing and then immediately reallocating it. This requires some
more surgery though and is thus left for a future iteration.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/record.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/reftable/record.c b/reftable/record.c
index f1b6a5eac9..6465a7b8f4 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -377,10 +377,11 @@ static int reftable_ref_record_decode(void *rec, struct strbuf key,
 
 	assert(hash_size > 0);
 
-	r->refname = reftable_realloc(r->refname, key.len + 1);
+	r->refname = reftable_malloc(key.len + 1);
 	memcpy(r->refname, key.buf, key.len);
-	r->update_index = update_index;
 	r->refname[key.len] = 0;
+
+	r->update_index = update_index;
 	r->value_type = val_type;
 	switch (val_type) {
 	case REFTABLE_REF_VAL1:
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 7/7] reftable/reader: add comments to `table_iter_next()`
From: Patrick Steinhardt @ 2024-02-12  8:32 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, John Cai
In-Reply-To: <cover.1707726654.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2383 bytes --]

While working on the optimizations in the preceding patches I stumbled
upon `table_iter_next()` multiple times. It is quite easy to miss the
fact that we don't call `table_iter_next_in_block()` twice, but that the
second call is in fact `table_iter_next_block()`.

Add comments to explain what exactly is going on here to make things
more obvious. While at it, touch up the code to conform to our code
style better.

Note that one of the refactorings merges two conditional blocks into
one. Before, we had the following code:

```
err = table_iter_next_block(&next, ti
if (err != 0) {
	ti->is_finished = 1;
}
table_iter_block_done(ti);
if (err != 0) {
	return err;
}
```

As `table_iter_block_done()` does not care about `is_finished`, the
conditional blocks can be merged into one block:

```
err = table_iter_next_block(&next, ti
table_iter_block_done(ti);
if (err != 0) {
	ti->is_finished = 1;
	return err;
}
```

This is both easier to reason about and more performant because we have
one branch less.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/reader.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/reftable/reader.c b/reftable/reader.c
index 64dc366fb1..add7d57f0b 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -357,24 +357,32 @@ static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
 
 	while (1) {
 		struct table_iter next = TABLE_ITER_INIT;
-		int err = 0;
-		if (ti->is_finished) {
+		int err;
+
+		if (ti->is_finished)
 			return 1;
-		}
 
+		/*
+		 * Check whether the current block still has more records. If
+		 * so, return it. If the iterator returns positive then the
+		 * current block has been exhausted.
+		 */
 		err = table_iter_next_in_block(ti, rec);
-		if (err <= 0) {
+		if (err <= 0)
 			return err;
-		}
 
+		/*
+		 * Otherwise, we need to continue to the next block in the
+		 * table and retry. If there are no more blocks then the
+		 * iterator is drained.
+		 */
 		err = table_iter_next_block(&next, ti);
-		if (err != 0) {
-			ti->is_finished = 1;
-		}
 		table_iter_block_done(ti);
-		if (err != 0) {
+		if (err) {
+			ti->is_finished = 1;
 			return err;
 		}
+
 		table_iter_copy_from(ti, &next);
 		block_iter_close(&next.bi);
 	}
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* Re: [PATCH v2] unit-tests: do show relative file paths on non-Windows, too
From: Phillip Wood @ 2024-02-12 10:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Schindelin, Randall S. Becker
In-Reply-To: <xmqqle7r9enn.fsf_-_@gitster.g>

Hi Junio

On 11/02/2024 15:58, Junio C Hamano wrote:
>>
>> We know which separator we're expecting so we could replace  the last
>> two comparisons with
>>
>> 		prefix[prefix_len -1] != needle[1]
>>
>> but as I say I'm not sure that is worth re-rolling for
> 
> There is a larger clean-up opportunity to drop the need for making a
> copy, which probably is worth doing, so I folded the above into this
> version.

Ooh, that's nice. This version looks good, I found the code comments 
very helpful

Best Wishes

Phillip

> ------- >8 ------------- >8 ------------- >8 ------------- >8 -------
> 
> There are compilers other than Visual C that want to show absolute
> paths.  Generalize the helper introduced by a2c5e294 (unit-tests: do
> show relative file paths, 2023-09-25) so that it can also work with
> a path that uses slash as the directory separator, and becomes
> almost no-op once one-time preparation finds out that we are using a
> compiler that already gives relative paths.  Incidentally, this also
> should do the right thing on Windows with a compiler that shows
> relative paths but with backslash as the directory separator (if
> such a thing exists and is used to build git).
> 
> Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
> Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> 
>    * I found that the diff relative to the result of applying v1 was
>      easier to follow than the range-diff, so here it is.
> 
>    diff --git c/t/unit-tests/test-lib.c w/t/unit-tests/test-lib.c
>    index 83c9eb8c59..66d6980ffb 100644
>    --- c/t/unit-tests/test-lib.c
>    +++ w/t/unit-tests/test-lib.c
>    @@ -64,34 +64,33 @@ static const char *make_relative(const char *location)
>     		 * prefix_len == 0 if the compiler gives paths relative
>     		 * to the root of the working tree.  Otherwise, we want
>     		 * to see that we did find the needle[] at a directory
>    -		 * boundary.
>    +		 * boundary.  Again we rely on that needle[] begins with
>    +		 * "t" followed by the directory separator.
>     		 */
>     		if (fspathcmp(needle, prefix + prefix_len) ||
>    -		    (prefix_len &&
>    -		     prefix[prefix_len - 1] != '/' &&
>    -		     prefix[prefix_len - 1] != '\\'))
>    +		    (prefix_len && prefix[prefix_len - 1] != needle[1]))
>     			die("unexpected suffix of '%s'", prefix);
>    -
>     	}
>     
>     	/*
>    -	 * If our compiler gives relative paths and we do not need
>    -	 * to munge directory separator, we can return location as-is.
>    +	 * Does it not start with the expected prefix?
>    +	 * Return it as-is without making it worse.
>     	 */
>    -	if (!prefix_len && !need_bs_to_fs)
>    +	if (prefix_len && fspathncmp(location, prefix, prefix_len))
>     		return location;
>     
>    -	/* Does it not start with the expected prefix? */
>    -	if (fspathncmp(location, prefix, prefix_len))
>    -		return location;
>    +	/*
>    +	 * If we do not need to munge directory separator, we can return
>    +	 * the substring at the tail of the location.
>    +	 */
>    +	if (!need_bs_to_fs)
>    +		return location + prefix_len;
>     
>    -	strlcpy(buf, location + prefix_len, sizeof(buf));
>     	/* convert backslashes to forward slashes */
>    -	if (need_bs_to_fs) {
>    -		for (p = buf; *p; p++)
>    -			if (*p == '\\')
>    -				*p = '/';
>    -	}
>    +	strlcpy(buf, location + prefix_len, sizeof(buf));
>    +	for (p = buf; *p; p++)
>    +		if (*p == '\\')
>    +			*p = '/';
>     	return buf;
>     }
>     
> 
>   t/unit-tests/test-lib.c | 61 +++++++++++++++++++++++++++++++----------
>   1 file changed, 47 insertions(+), 14 deletions(-)
> 
> diff --git a/t/unit-tests/test-lib.c b/t/unit-tests/test-lib.c
> index 7bf9dfdb95..66d6980ffb 100644
> --- a/t/unit-tests/test-lib.c
> +++ b/t/unit-tests/test-lib.c
> @@ -21,12 +21,11 @@ static struct {
>   	.result = RESULT_NONE,
>   };
>   
> -#ifndef _MSC_VER
> -#define make_relative(location) location
> -#else
>   /*
>    * Visual C interpolates the absolute Windows path for `__FILE__`,
>    * but we want to see relative paths, as verified by t0080.
> + * There are other compilers that do the same, and are not for
> + * Windows.
>    */
>   #include "dir.h"
>   
> @@ -34,32 +33,66 @@ static const char *make_relative(const char *location)
>   {
>   	static char prefix[] = __FILE__, buf[PATH_MAX], *p;
>   	static size_t prefix_len;
> +	static int need_bs_to_fs = -1;
>   
> -	if (!prefix_len) {
> +	/* one-time preparation */
> +	if (need_bs_to_fs < 0) {
>   		size_t len = strlen(prefix);
> -		const char *needle = "\\t\\unit-tests\\test-lib.c";
> +		char needle[] = "t\\unit-tests\\test-lib.c";
>   		size_t needle_len = strlen(needle);
>   
> -		if (len < needle_len || strcmp(needle, prefix + len - needle_len))
> -			die("unexpected suffix of '%s'", prefix);
> +		if (len < needle_len)
> +			die("unexpected prefix '%s'", prefix);
> +
> +		/*
> +		 * The path could be relative (t/unit-tests/test-lib.c)
> +		 * or full (/home/user/git/t/unit-tests/test-lib.c).
> +		 * Check the slash between "t" and "unit-tests".
> +		 */
> +		prefix_len = len - needle_len;
> +		if (prefix[prefix_len + 1] == '/') {
> +			/* Oh, we're not Windows */
> +			for (size_t i = 0; i < needle_len; i++)
> +				if (needle[i] == '\\')
> +					needle[i] = '/';
> +			need_bs_to_fs = 0;
> +		} else {
> +			need_bs_to_fs = 1;
> +		}
>   
> -		/* let it end in a directory separator */
> -		prefix_len = len - needle_len + 1;
> +		/*
> +		 * prefix_len == 0 if the compiler gives paths relative
> +		 * to the root of the working tree.  Otherwise, we want
> +		 * to see that we did find the needle[] at a directory
> +		 * boundary.  Again we rely on that needle[] begins with
> +		 * "t" followed by the directory separator.
> +		 */
> +		if (fspathcmp(needle, prefix + prefix_len) ||
> +		    (prefix_len && prefix[prefix_len - 1] != needle[1]))
> +			die("unexpected suffix of '%s'", prefix);
>   	}
>   
> -	/* Does it not start with the expected prefix? */
> -	if (fspathncmp(location, prefix, prefix_len))
> +	/*
> +	 * Does it not start with the expected prefix?
> +	 * Return it as-is without making it worse.
> +	 */
> +	if (prefix_len && fspathncmp(location, prefix, prefix_len))
>   		return location;
>   
> -	strlcpy(buf, location + prefix_len, sizeof(buf));
> +	/*
> +	 * If we do not need to munge directory separator, we can return
> +	 * the substring at the tail of the location.
> +	 */
> +	if (!need_bs_to_fs)
> +		return location + prefix_len;
> +
>   	/* convert backslashes to forward slashes */
> +	strlcpy(buf, location + prefix_len, sizeof(buf));
>   	for (p = buf; *p; p++)
>   		if (*p == '\\')
>   			*p = '/';
> -
>   	return buf;
>   }
> -#endif
>   
>   static void msg_with_prefix(const char *prefix, const char *format, va_list ap)
>   {


^ permalink raw reply

* Re: [PATCH v4 2/2] revision: implement `git log --merge` also for rebase/cherry-pick/revert
From: Phillip Wood @ 2024-02-12 11:02 UTC (permalink / raw)
  To: Philippe Blain, git
  Cc: Johannes Sixt, Elijah Newren, Michael Lohmann, Phillip Wood,
	Patrick Steinhardt, Michael Lohmann, Junio C Hamano
In-Reply-To: <20240210-ml-log-merge-with-cherry-pick-and-other-pseudo-heads-v4-2-3bc9e62808f4@gmail.com>

Hi Philippe

On 10/02/2024 23:35, Philippe Blain wrote:
> From: Michael Lohmann <mi.al.lohmann@gmail.com>
> 
> 'git log' learned in ae3e5e1ef2 (git log -p --merge [[--] paths...],
> 2006-07-03) to show commits touching conflicted files in the range
> HEAD...MERGE_HEAD, an addition documented in d249b45547 (Document
> rev-list's option --merge, 2006-08-04).
> 
> It can be useful to look at the commit history to understand what lead
> to merge conflicts also for other mergy operations besides merges, like
> cherry-pick, revert and rebase.
> 
> For rebases, an interesting range to look at is HEAD...REBASE_HEAD,
> since the conflicts are usually caused by how the code changed
> differently on HEAD since REBASE_HEAD forked from it.
> 
> For cherry-picks and revert, it is less clear that
> HEAD...CHERRY_PICK_HEAD and HEAD...REVERT_HEAD are indeed interesting
> ranges, since these commands are about applying or unapplying a single
> (or a few, for cherry-pick) commit(s) on top of HEAD. However, conflicts
> encountered during these operations can indeed be caused by changes
> introduced in preceding commits on both sides of the history.

I tend to think that there isn't much difference between rebase and 
cherry-pick here - they are both cherry-picking commits and it is 
perfectly possible to rebase a branch onto an unrelated upstream. The 
important part for me is that we're showing these commits because even 
though they aren't part of the 3-way merge they are relevant for 
investigating where any merge conflicts come from.

For revert I'd argue that the only sane use is reverting an ancestor of 
HEAD but maybe I'm missing something. In that case REVERT_HEAD...HEAD is 
the same as REVERT_HEAD..HEAD so it shows the changes since the commit 
that is being reverted which will be the ones causing the conflict.

> Adjust the code in prepare_show_merge so it constructs the range
> HEAD...$OTHER for each of OTHER={MERGE_HEAD, CHERRY_PICK_HEAD,
> REVERT_HEAD or REBASE_HEAD}. Note that we try these pseudorefs in order,
> so keep REBASE_HEAD last since the three other operations can be
> performed during a rebase. Note also that in the uncommon case where
> $OTHER and HEAD do not share a common ancestor, this will show the
> complete histories of both sides since their root commits, which is the
> same behaviour as currently happens in that case for HEAD and
> MERGE_HEAD.
> 
> Adjust the documentation of this option accordingly.

Thanks for the comprehensive commit message.

> diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> index 2bf239ff03..5b4672c346 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -341,8 +341,10 @@ See also linkgit:git-reflog[1].
>   Under `--pretty=reference`, this information will not be shown at all.
>   
>   --merge::
> -	After a failed merge, show refs that touch files having a
> -	conflict and don't exist on all heads to merge.
> +	Show commits touching conflicted paths in the range `HEAD...$OTHER`,
> +	where `$OTHER` is the first existing pseudoref in `MERGE_HEAD`,
> +	`CHERRY_PICK_HEAD`, `REVERT_HEAD` or `REBASE_HEAD`. Only works
> +	when the index has unmerged entries.

Do you know what "and don't exist on all heads to merge" in the original 
is referring to? The new text doesn't mention anything that sounds like 
that but I don't understand what the original was trying to say.

It might be worth adding a sentence explaining when this option is useful.

     This option can be used to show the commits that are relevant
     when resolving conflicts from a 3-way merge

or something like that.

>   --boundary::
>   	Output excluded boundary commits. Boundary commits are
> diff --git a/revision.c b/revision.c
> index aa4c4dc778..36dc2f94f7 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -1961,11 +1961,31 @@ static void add_pending_commit_list(struct rev_info *revs,
>   	}
>   }
>   
> +static const char *lookup_other_head(struct object_id *oid)
> +{
> +	int i;
> +	static const char *const other_head[] = {
> +		"MERGE_HEAD", "CHERRY_PICK_HEAD", "REVERT_HEAD", "REBASE_HEAD"
> +	};
> +
> +	for (i = 0; i < ARRAY_SIZE(other_head); i++)
> +		if (!read_ref_full(other_head[i],
> +				RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
> +				oid, NULL)) {
> +			if (is_null_oid(oid))
> +				die("%s is a symbolic ref???", other_head[i]);

This would benefit from being translated and I think one '?' would 
suffice (I'm not sure we even need that - are there other possible 
causes of a null oid here?)

> +			return other_head[i];
> +		}
> +
> +	die("--merge without MERGE_HEAD, CHERRY_PICK_HEAD, REVERT_HEAD or REBASE_HEAD?");

This is not a question and would also benefit from translation. It might 
be more helpful to say that "--merge" requires one of those pseudorefs.

Thanks for pick this series up and polishing it

Phillip


^ permalink raw reply

* Re: [PATCH v4 1/5] refs: introduce `is_pseudoref()` and `is_headref()`
From: Patrick Steinhardt @ 2024-02-12 12:47 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, gitster, phillip.wood123, Jeff King
In-Reply-To: <20240211183923.131278-2-karthik.188@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5289 bytes --]

On Sun, Feb 11, 2024 at 07:39:19PM +0100, Karthik Nayak wrote:
> Introduce two new functions `is_pseudoref()` and `is_headref()`. This
> provides the necessary functionality for us to add pseudorefs and HEAD
> to the loose ref cache in the files backend, allowing us to build
> tooling to print these refs.
> 
> The `is_pseudoref()` function internally calls `is_pseudoref_syntax()`
> but adds onto it by also checking to ensure that the pseudoref either
> ends with a "_HEAD" suffix or matches a list of exceptions. After which
> we also parse the contents of the pseudoref to ensure that it conforms
> to the ref format.
> 
> We cannot directly add the new syntax checks to `is_pseudoref_syntax()`
> because the function is also used by `is_current_worktree_ref()` and
> making it stricter to match only known pseudorefs might have unintended
> consequences due to files like 'BISECT_START' which isn't a pseudoref
> but sometimes contains object ID.
> 
> Keeping this in mind, we leave `is_pseudoref_syntax()` as is and create
> `is_pseudoref()` which is stricter. Ideally we'd want to move the new
> syntax checks to `is_pseudoref_syntax()` but a prerequisite for this
> would be to actually remove the exception list by converting those
> pseudorefs to also contain a '_HEAD' suffix and perhaps move bisect
> related files like 'BISECT_START' to a new directory similar to the
> 'rebase-merge' directory.
> 
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>  refs.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  refs.h |  3 +++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/refs.c b/refs.c
> index fff343c256..d8e4cf9a11 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -860,6 +860,47 @@ static int is_pseudoref_syntax(const char *refname)
>  	return 1;
>  }
>  
> +int is_pseudoref(struct ref_store *refs, const char *refname)
> +{
> +	static const char *const irregular_pseudorefs[] = {
> +		"AUTO_MERGE",
> +		"BISECT_EXPECTED_REV",
> +		"NOTES_MERGE_PARTIAL",
> +		"NOTES_MERGE_REF",
> +		"MERGE_AUTOSTASH",
> +	};
> +	struct object_id oid;
> +	size_t i;
> +
> +	if (!is_pseudoref_syntax(refname))
> +		return 0;
> +
> +	if (ends_with(refname, "_HEAD")) {
> +		refs_resolve_ref_unsafe(refs, refname,
> +   					RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
> +   					&oid, NULL);
> +   		return !is_null_oid(&oid);
> +	}

I think it's quite confusing that `is_pseudoref()` not only checks
whether the refname may be a pseudoref, but also whether it actually
exists. Furthermore, why is a pseudoref only considered to exist in case
it's not a symbolic ref? That sounds overly restrictive to me.

So I think this at least needs to be renamed. But I find it really hard
to come up with a proper name here because in my opinion the function
does too much. `is_existing_pseudoref()` feels much too specific to me.
Also, the "reftable" backend wouldn't need to check whether the ref
exists, but only whether a name that it encounters is a pseudoref name
or not.

> +	for (i = 0; i < ARRAY_SIZE(irregular_pseudorefs); i++)
> +		if (!strcmp(refname, irregular_pseudorefs[i])) {
> +			refs_resolve_ref_unsafe(refs, refname,
> +   						RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
> +   						&oid, NULL);
> +			return !is_null_oid(&oid);
> +		}
> +
> +	return 0;
> +}
> +
> +int is_headref(struct ref_store *refs, const char *refname)
> +{
> +	if (!strcmp(refname, "HEAD"))
> +		return refs_ref_exists(refs, refname);
> +
> +	return 0;
> +}

The same comment applies here, as well.

I also worry a bit about the API we have. It becomes really hard to
figure out which function to call now as the API surface seems to
explode. We have:

  - is_pseudoref_syntax
  - is_pseudoref
  - is_headref
  - check_refname_format
  - refname_is_safe

I wonder whether we can maybe consolidate the interface into one or
maybe even two functions where the behaviour can be tweaked with a flag
field. Something like `refname_is_valid()` with a bunch of flags:

  - REFNAME_ACCEPT_HEAD to accept "HEAD"
  - REFNAME_ACCEPT_PSEUDOREF to accept all of the refs ending with
    "_HEAD" or being one of the irregular pseudorefs.
  - REFNAME_ACCEPT_INVALID_BUT_SAFE to accept refnames which aren't
    valid, but which would pass `refname_is_safe()`.

Another alternative could be something like `classify_refname()` that
accepts a refname and returns an enum saying what kind of ref something
is.

Given that this topic won't be included in Git v2.44 anymore, I think
that opening this can of worms would be sensible now.

Patrick

>  static int is_current_worktree_ref(const char *ref) {
>  	return is_pseudoref_syntax(ref) || is_per_worktree_ref(ref);
>  }
> diff --git a/refs.h b/refs.h
> index 303c5fac4d..f66cdd731c 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -1023,4 +1023,7 @@ extern struct ref_namespace_info ref_namespace[NAMESPACE__COUNT];
>   */
>  void update_ref_namespace(enum ref_namespace namespace, char *ref);
>  
> +int is_pseudoref(struct ref_store *refs, const char *refname);
> +int is_headref(struct ref_store *refs, const char *refname);
> +
>  #endif /* REFS_H */
> -- 
> 2.43.GIT
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: Interactive rebase: using "pick" for merge commits
From: Phillip Wood @ 2024-02-12 14:38 UTC (permalink / raw)
  To: Patrick Steinhardt, Stefan Haller; +Cc: phillip.wood, git
In-Reply-To: <ZcnFl8kypKRYeLo3@tanuki>

Hi Patrick and Stefan

On 12/02/2024 07:15, Patrick Steinhardt wrote:
> On Sat, Feb 10, 2024 at 10:23:16AM +0100, Stefan Haller wrote:
>> On 09.02.24 17:24, Phillip Wood wrote:
>> Yes, I'm familiar with all this, but that's not what I mean. I don't
>> want to maintain the topology here, and I'm also not suggesting that git
>> itself generates such "pick" entries with -mX arguments (maybe I wasn't
>> clear on that). What I want to do is to add such entries myself, as a
>> user, resulting in the equivalent of doing a "break" at that point in
>> the rebase and doing a "git cherry-pick -mX <hash-of-merge-commit>"
>> manually.
> 
> It would be neat indeed if this could be specified in the instruction
> sheet. We already support options for the "merge" instruction, so
> extending "pick" to support options isn't that far-fetched. Then it
> would become possible to say "pick -m1 fa1afe1".

It would certainly be possible to extend the sequencer to do that but 
I'm not familiar with why people use "git cherry-pick -m" [1] so I'm 
wondering what this would be used for. It would involve a bit of extra 
complexity so I think we'd want a compelling reason as to why 
cherry-picking merges without maintaining the topology is useful 
especially as one can currently do that via "exec git cherry-pick -m ..."

Best Wishes

Phillip

[1] I did a quick web search and the results all seemed to focus on how 
to do it rather than why you'd want to.

^ permalink raw reply

* Bug report: Incorrect GIT_FLUSH behavior in 2.43.1 (regression and breaking)
From: Xiaoguang WANG @ 2024-02-12 15:18 UTC (permalink / raw)
  To: git

Ref: https://github.com/git/git/commit/556e68032f8248c831e48207e5cb923c9fe0e42c

If GIT_FLUSH=true, it should mean to "do the flush". But that commit
made skip_stdout_flush=true when GIT_FLUSH=true.

And by the way, only accepting GIT_FLUSH=true is quite breaking, it
drops the compatibility of GIT_FLUSH=1: it causes the existing
programs which depend on the "flushing(GIT_FLUSH=1)" behavior would
hang forever if they use the new git binary, because the program won't
see any flushed output ....

^ permalink raw reply

* RE: [BUG] git 2.44.0-rc0 t0080.1 Breaks on NonStop x86 and ia64
From: rsbecker @ 2024-02-12 16:05 UTC (permalink / raw)
  To: rsbecker, 'Junio C Hamano'; +Cc: git
In-Reply-To: <002a01da5c94$a1bc5340$e534f9c0$@nexbridge.com>

On Saturday, February 10, 2024 9:47 PM, Junio C Hamano wrote:
>On Saturday, February 10, 2024 9:06 PM, Junio C Hamano wrote:
>><rsbecker@nexbridge.com> writes:
>>
>>> The diff appears to have failed because of an assumption of how paths
>>> are resolved during compilation. The assumption is that files remain
>>> partially qualified, which is not the case in all C compilers. This
>>> is c99. My experience with gcc is that it qualifies names differently
>>> than other compilers.
>>
>>I just found this bit of gem in t/unit-tests/test-lib.c.  I guess
>>Visual C
>falls into the
>>same category as yours, while GCC and Clang are from different camps?
>>
>>The easiest way forward may be to build on top of it, perhaps like so,
>>to
>generalize
>>the make_relative() to cover your case, too?
>>
>> t/unit-tests/test-lib.c | 63
>+++++++++++++++++++++++++++++++++++++---------
>>---
>> 1 file changed, 48 insertions(+), 15 deletions(-)
>>
>>diff --git c/t/unit-tests/test-lib.c w/t/unit-tests/test-lib.c index
>>7bf9dfdb95..69dbe1bbc7 100644
>>--- c/t/unit-tests/test-lib.c
>>+++ w/t/unit-tests/test-lib.c
>>@@ -21,45 +21,78 @@ static struct {
>> 	.result = RESULT_NONE,
>> };
>>
>>-#ifndef _MSC_VER
>>-#define make_relative(location) location -#else
>>+#include "dir.h"
>> /*
>>  * Visual C interpolates the absolute Windows path for `__FILE__`,
>>  * but we want to see relative paths, as verified by t0080.
>>+ * There are other compilers that do the same, and are not for
>>+ * Windows.
>>  */
>>-#include "dir.h"
>>-
>> static const char *make_relative(const char *location)  {
>> 	static char prefix[] = __FILE__, buf[PATH_MAX], *p;
>> 	static size_t prefix_len;
>>+	static int need_bs_to_fs = -1;
>>
>>-	if (!prefix_len) {
>>+	/* one-time preparation */
>>+	if (need_bs_to_fs < 0) {
>> 		size_t len = strlen(prefix);
>>-		const char *needle = "\\t\\unit-tests\\test-lib.c";
>>+		char needle[] = "t\\unit-tests\\test-lib.c";
>> 		size_t needle_len = strlen(needle);
>>
>>-		if (len < needle_len || strcmp(needle, prefix + len -
>needle_len))
>>+		if (len < needle_len)
>>+			die("unexpected prefix '%s'", prefix);
>>+
>>+		/*
>>+		 * The path could be relative (t/unit-tests/test-lib.c)
>>+		 * or full (/home/user/git/t/unit-tests/test-lib.c).
>>+		 * Check the slash between "t" and "unit-tests".
>>+		 */
>>+		prefix_len = len - needle_len;
>>+		if (prefix[prefix_len + 1] == '/') {
>>+			/* Oh, we're not Windows */
>>+			for (size_t i = 0; i < needle_len; i++)
>>+				if (needle[i] == '\\')
>>+					needle[i] = '/';
>>+			need_bs_to_fs = 0;
>>+		} else {
>>+			need_bs_to_fs = 1;
>>+		}
>>+
>>+		/*
>>+		 * prefix_len == 0 if the compiler gives paths relative
>>+		 * to the root of the working tree.  Otherwise, we want
>>+		 * to see that we did find the needle[] at a directory
>>+		 * boundary.
>>+		 */
>>+		if (fspathcmp(needle, prefix + prefix_len) ||
>>+		    (prefix_len &&
>>+		     prefix[prefix_len - 1] != '/' &&
>>+		     prefix[prefix_len - 1] != '\\'))
>> 			die("unexpected suffix of '%s'", prefix);
>>
>>-		/* let it end in a directory separator */
>>-		prefix_len = len - needle_len + 1;
>> 	}
>>
>>+	/*
>>+	 * If our compiler gives relative paths and we do not need
>>+	 * to munge directory separator, we can return location as-is.
>>+	 */
>>+	if (!prefix_len && !need_bs_to_fs)
>>+		return location;
>>+
>> 	/* Does it not start with the expected prefix? */
>> 	if (fspathncmp(location, prefix, prefix_len))
>> 		return location;
>>
>> 	strlcpy(buf, lnocation + prefix_len, sizeof(buf));
>> 	/* convert backslashes to forward slashes */
>>-	for (p = buf; *p; p++)
>>-		if (*p == '\\')
>>-			*p = '/';
>>-
>>+	if (need_bs_to_fs) {
>>+		for (p = buf; *p; p++)
>>+			if (*p == '\\')
>>+				*p = '/';
>>+	}
>> 	return buf;
>> }
>>-#endif
>>
>> static void msg_with_prefix(const char *prefix, const char *format,
>va_list ap)  {
>
>This looks like a good plan.

This might be trivial, but I cannot tell. The #ifndef should be changed as
follows:

diff --git a/t/unit-tests/test-lib.c b/t/unit-tests/test-lib.c
index 7bf9dfdb95..2d1f6b7648 100644
--- a/t/unit-tests/test-lib.c
+++ b/t/unit-tests/test-lib.c
@@ -21,7 +21,7 @@ static struct {
        .result = RESULT_NONE,
 };

-#ifndef _MSC_VER
+#if !defined(_MSC_VER) && !defined(__TANDEM)
 #define make_relative(location) location
 #else
 /*

However, if this goes beyond Windows and NonStop, which I suspect it does,
we could add a separate knob in config.mak.uname to pull this in. However,
this does not correct the problem. actual ends up with only:

ok 1 - passing test
ok 2 - passing test and assertion return 1

which fails the test_cmp at the end of the test.



^ permalink raw reply related

* Re: git column fails (or crashes) if padding is negative
From: Junio C Hamano @ 2024-02-12 16:37 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: Tiago Pascoal, git@vger.kernel.org
In-Reply-To: <3380df68-83fb-417b-a490-71614edc342f@app.fastmail.com>

"Kristoffer Haugsbakk" <code@khaugsbakk.name> writes:

> On Fri, Feb 9, 2024, at 18:57, Junio C Hamano wrote:
>>  builtin/column.c | 2 ++
>>  column.c         | 4 ++--
>>  2 files changed, 4 insertions(+), 2 deletions(-)
>>
>> […]
>> diff --git c/column.c w/column.c
>> index ff2f0abf39..9cc703832a 100644
>> --- c/column.c
>> +++ w/column.c
>> @@ -189,7 +189,7 @@ void print_columns(const struct string_list *list,
>> unsigned int colopts,
>>  	memset(&nopts, 0, sizeof(nopts));
>>  	nopts.indent = opts && opts->indent ? opts->indent : "";
>>  	nopts.nl = opts && opts->nl ? opts->nl : "\n";
>> -	nopts.padding = opts ? opts->padding : 1;
>> +	nopts.padding = (opts && 0 < opts->padding) ? opts->padding : 1;
>
> If these two are meant to check the same condition as in
> `builtin/column.c`, shouldn’t it be `0 <= opts->padding`?

Good eyes.  Otherwise we lose the ability to set the padding to 0.

^ permalink raw reply

* Re: Interactive rebase: using "pick" for merge commits
From: Junio C Hamano @ 2024-02-12 16:39 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Stefan Haller, phillip.wood, git
In-Reply-To: <ZcnFl8kypKRYeLo3@tanuki>

Patrick Steinhardt <ps@pks.im> writes:

> On Sat, Feb 10, 2024 at 10:23:16AM +0100, Stefan Haller wrote:
>> On 09.02.24 17:24, Phillip Wood wrote:
>> > On 09/02/2024 15:52, Stefan Haller wrote:
>> >> When I do an interactive rebase, and manually enter a "pick" with the
>> >> commit hash of a merge commit, I get the following confusing error
>> >> message:
>> >>
>> >> error: commit fa1afe1 is a merge but no -m option was given.
>> >> 
>> >> Is it crazy to want pick to work like this? Should it be supported?
>> > 
>> > It causes problems trying to maintain the topology. In the past there
>> > was a "--preserve-merges" option that allowed one to "pick" merges but
>> > it broke if the user edited the todo list. The "--rebase-merges" option
>> > was introduced with the "label", "reset" and "merge" todo list
>> > instructions to allow the user to control the topology.
>> 
>> Yes, I'm familiar with all this, but that's not what I mean. I don't
>> want to maintain the topology here, and I'm also not suggesting that git
>> itself generates such "pick" entries with -mX arguments (maybe I wasn't
>> clear on that). What I want to do is to add such entries myself, as a
>> user, resulting in the equivalent of doing a "break" at that point in
>> the rebase and doing a "git cherry-pick -mX <hash-of-merge-commit>"
>> manually.
>
> It would be neat indeed if this could be specified in the instruction
> sheet. We already support options for the "merge" instruction, so
> extending "pick" to support options isn't that far-fetched. Then it
> would become possible to say "pick -m1 fa1afe1".

Would adding "x git cherry-pick -m 1 $that_one" work there?

^ permalink raw reply

* Re: What's cooking in git.git (Feb 2024, #04; Thu, 8)
From: Junio C Hamano @ 2024-02-12 16:39 UTC (permalink / raw)
  To: Philippe Blain; +Cc: git, mi.al.lohmann
In-Reply-To: <13f08ce5-f036-f769-1ba9-7d47b572af28@gmail.com>

Philippe Blain <levraiphilippeblain@gmail.com> writes:

> Hi Junio,
>
> Le 2024-02-09 à 12:24, Junio C Hamano a écrit :
>> * ml/log-merge-with-cherry-pick-and-other-pseudo-heads (2024-02-08) 2 commits
>>  - revision: implement `git log --merge` also for rebase/cherry_pick/revert
>>  - revision: ensure MERGE_HEAD is a ref in prepare_show_merge
>> 
>>  "git log --merge" learned to pay attention to CHERRY_PICK_HEAD and
>>  other kinds of *_HEAD pseudorefs.
>> 
>>  Will merge to 'next'?
>>  source: <20240117081405.14012-1-mi.al.lohmann@gmail.com>
>>  source: <dfb582cf-b1e4-414d-bfe1-0f93d910ec54@kdbg.org>
>
> I think this is a very nice addition, I've been meaning to do a similar
> patch for quite some time.
>
> I think the commit message of 2/2 should be improved, which was pointed out a few
> times in the thread. I'll try to send a v4 with a more useful message, summarizing 
> the discussion, so maybe hold off on merging to next.

Thanks.

^ permalink raw reply

* Re: [PATCH 1/2] receive-pack: use find_commit_header() in check_cert_push_options()
From: Junio C Hamano @ 2024-02-12 16:40 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git List, Chandra Pratap, Chandra Pratap, Jeff King,
	Kyle Lippincott, John Cai
In-Reply-To: <7b0a87f1-4693-480e-90d4-14675e9a5f01@web.de>

René Scharfe <l.s.r@web.de> writes:

> I just discovered 14570dc67d (wrapper: add function to compare strings
> with different NUL termination, 2020-05-25).  Perhaps squash this in to
> simplify?

Oooh, xstrncmpz() seems to target this exact use case.  Nice.


>
> ---
>  builtin/receive-pack.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index dbee508775..db65607485 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -717,9 +717,8 @@ static int check_cert_push_options(const struct string_list *push_options)
>  		buf = option + optionlen + 1;
>  		options_seen++;
>  		if (options_seen > push_options->nr
> -		    || strncmp(push_options->items[options_seen - 1].string,
> -			       option, optionlen)
> -		    || push_options->items[options_seen - 1].string[optionlen])
> +		    || xstrncmpz(push_options->items[options_seen - 1].string,
> +				 option, optionlen))
>  			return 0;
>  	}
>
> --
> 2.43.0

^ permalink raw reply

* Re: [BUG] git 2.44.0-rc0 t0080.1 Breaks on NonStop x86 and ia64
From: Junio C Hamano @ 2024-02-12 16:43 UTC (permalink / raw)
  To: rsbecker; +Cc: git
In-Reply-To: <00fa01da5dcd$5b060150$111203f0$@nexbridge.com>

<rsbecker@nexbridge.com> writes:

>>This looks like a good plan.
>
> This might be trivial, but I cannot tell. The #ifndef should be changed as
> follows:

https://lore.kernel.org/git/xmqqttmf9y46.fsf@gitster.g/


^ permalink raw reply

* Re: [PATCH] ci: bump remaining outdated Actions versions
From: Junio C Hamano @ 2024-02-12 16:46 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin
In-Reply-To: <xmqqplx29aze.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> Curious that, among all other uses of actions/upload-artifact@v3,
> only this one has been using @v1, which may deserve explanation.
> The proposed commit log message pretends that this never existed.

I see that [v2 2/2] addresses this.  Nice.

^ permalink raw reply

* Re: [PATCH v2] column: disallow negative padding
From: Kristoffer Haugsbakk @ 2024-02-12 16:50 UTC (permalink / raw)
  To: Rubén Justo, git; +Cc: Tiago Pascoal, Chris Torek, Junio C Hamano
In-Reply-To: <89d32a5f-b5ab-4773-bd9f-d33b4e348e15@gmail.com>

On Sun, Feb 11, 2024, at 23:47, Rubén Justo wrote:
> While we're here, I wonder if silently ignoring a negative value in
> .padding is the right thing to do.
>
> There are several callers of print_columns():
>
> builtin/branch.c:           print_columns(&output, colopts, NULL);
> builtin/clean.c:    print_columns(&list, colopts, &copts);
> builtin/clean.c:    print_columns(menu_list, local_colopts, &copts);
> builtin/column.c:    print_columns(&list, colopts, &copts);
> help.c:     print_columns(&list, colopts, &copts);
> wt-status.c:       print_columns(&output, s->colopts, &copts);
>
> I haven't checked it thoroughly but it seems we don't need to add the
> check we're adding to builtin/column.c, to any of the other callers.
> However, it is possible that these or other new callers may need it in
> the future.  If so, we should consider doing something like:
>
> diff --git a/column.c b/column.c
> index c723428bc7..4f870c725f 100644
> --- a/column.c
> +++ b/column.c
> @@ -186,6 +186,9 @@ void print_columns(const struct string_list *list,
> unsigned int colopts,
>                 return;
>         assert((colopts & COL_ENABLE_MASK) != COL_AUTO);
>
> +       if (opts && (0 <= opts->padding))
> +               BUG("padding must be non-negative");
> +

Sure, I could add a `BUG` for `0 > opts->padding` in v3.

^ permalink raw reply

* Re: [PATCH v2 2/2] ci(linux32): add a note about Actions that must not be updated
From: Junio C Hamano @ 2024-02-12 16:51 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget; +Cc: git, Johannes Schindelin
In-Reply-To: <9088cc60bda0f3f0d152ddc3c22a60effa272483.1707653489.git.gitgitgadget@gmail.com>

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> So let's add a big fat warning, mainly for my own benefit, to avoid
> running into the very same issue over and over again.

If we had this, it would also have helped me when I did e94dec0c
(GitHub Actions: update to checkout@v4, 2024-02-02), as I wouldn't
have had to dig for 6cf4d908 (ci(main): upgrade actions/checkout to
v3, 2022-12-05).

Thanks.

^ permalink raw reply

* Re: [PATCH v4 1/5] refs: introduce `is_pseudoref()` and `is_headref()`
From: Junio C Hamano @ 2024-02-12 17:01 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Karthik Nayak, git, phillip.wood123, Jeff King
In-Reply-To: <ZcoTbRxIaGmTd4fJ@tanuki>

Patrick Steinhardt <ps@pks.im> writes:

> I think it's quite confusing that `is_pseudoref()` not only checks
> whether the refname may be a pseudoref, but also whether it actually
> exists. Furthermore, why is a pseudoref only considered to exist in case
> it's not a symbolic ref? That sounds overly restrictive to me.

I am torn on this, but in favor of the proposed naming, primarily
because is_pseudoref_syntax() was about "does this string look like
the fullref a pseudoref would have?", and the reason why we wanted
to have this new function was we wanted to ask "does this string
name a valid pseudoref?"

 Q: Is CHERRY_PICK_HEAD a pseudoref?
 A: It would have been if it existed, but I see only
    $GIT_DIR/CHERRY_PICK_HEAD that is a symbolic link, and it cannot
    be a pseudoref.

I can certainly see a broken out set of helper functions to check

 - Does this string make a good fullref for a pseudoref?
 - Does a pseudoref with his string as its fullref exist?

independently.  The first one would answer Yes and the second one
would answer No in such a context.

Thanks.


^ permalink raw reply

* Re: Bug report: Incorrect GIT_FLUSH behavior in 2.43.1 (regression and breaking)
From: Junio C Hamano @ 2024-02-12 17:11 UTC (permalink / raw)
  To: Xiaoguang WANG, Taylor Blau, Torsten Bögershausen,
	Chandra Pratap; +Cc: git
In-Reply-To: <CABn0oJvg3M_kBW-u=j3QhKnO=6QOzk-YFTgonYw_UvFS1NTX4g@mail.gmail.com>

Xiaoguang WANG <wxiaoguang@gmail.com> writes:

> If GIT_FLUSH=true, it should mean to "do the flush". But that commit
> made skip_stdout_flush=true when GIT_FLUSH=true.

Thanks for reporting.  I am surprised that this flipping of polarity
slipped through.

> And by the way, only accepting GIT_FLUSH=true is quite breaking, it
> drops the compatibility of GIT_FLUSH=1

I do not think so. If the polarity is corrected, git_env_bool()
would say "that's affirmative" when any one of the "1", "true",
"yes", "on", etc. is given.  If you have been passing "1", you
should get the "always flush" behaviour.


Perhaps something like this would fix it?


 write-or-die.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git c/write-or-die.c w/write-or-die.c
index 3942152865..3ecb9e2af5 100644
--- c/write-or-die.c
+++ w/write-or-die.c
@@ -22,8 +22,11 @@ void maybe_flush_or_die(FILE *f, const char *desc)
 
 	if (f == stdout) {
 		if (skip_stdout_flush < 0) {
-			skip_stdout_flush = git_env_bool("GIT_FLUSH", -1);
-			if (skip_stdout_flush < 0) {
+			int flush_setting = git_env_bool("GIT_FLUSH", -1);
+
+			if (0 <= flush_setting)
+				skip_stdout_flush = !flush_setting;
+			else {
 				struct stat st;
 				if (fstat(fileno(stdout), &st))
 					skip_stdout_flush = 0;

^ permalink raw reply related

* Re: Bug report: Incorrect GIT_FLUSH behavior in 2.43.1 (regression and breaking)
From: Xiaoguang WANG @ 2024-02-12 17:18 UTC (permalink / raw)
  To: git
In-Reply-To: <xmqq8r3p7glr.fsf@gitster.g>

On Tue, Feb 13, 2024 at 1:11 AM Junio C Hamano <gitster@pobox.com> wrote:

> > And by the way, only accepting GIT_FLUSH=true is quite breaking, it
> > drops the compatibility of GIT_FLUSH=1
>
> I do not think so. If the polarity is corrected, git_env_bool()
> would say "that's affirmative" when any one of the "1", "true",
> "yes", "on", etc. is given.  If you have been passing "1", you
> should get the "always flush" behaviour.

Oh yes, you are right. There is a git_parse_int call in
git_parse_maybe_bool, so if the "flipping" could be fixed, then
everything should be fine.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox