Git development
* [PATCH 1/7] bulk-checkin: factor out `format_object_header_hash()`
From: Taylor Blau @ 2023-10-06 22:01 UTC
  To: git; +Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano
In-Reply-To: <cover.1696629697.git.me@ttaylorr.com>

Before deflating a blob into a pack, the bulk-checkin mechanism prepares
the object header by calling `format_object_header()` and writing it
into a scratch buffer, the contents of which are then fed to the hash
function.

Future commits will add support for deflating multiple kinds of objects
into a pack, and will need to perform a similar operation.

This is a mostly straightforward extraction, with one notable exception.
Instead of hard-coding `the_hash_algo`, pass it in to the new function
as an argument. This isn't strictly necessary for our immediate purposes
here, but will prove useful in the future if/when the bulk-checkin
mechanism grows support for the hash transition plan.
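
As a rough illustration (not part of this patch), the shape of the new
helper can be sketched in standalone C. The `toy_*` names and the FNV-1a
stand-in hash are assumptions made to keep the example self-contained;
Git's real vtable is `struct git_hash_algo`, and the real header is
produced by `format_object_header()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for Git's hash algorithm vtable. */
struct toy_hash_algo {
	void (*init_fn)(uint32_t *ctx);
	void (*update_fn)(uint32_t *ctx, const void *buf, size_t len);
};

/* FNV-1a, chosen only because it is short; Git uses SHA-1 or SHA-256. */
static void fnv_init(uint32_t *ctx)
{
	*ctx = 2166136261u;
}

static void fnv_update(uint32_t *ctx, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	for (size_t i = 0; i < len; i++) {
		*ctx ^= p[i];
		*ctx *= 16777619u;
	}
}

static const struct toy_hash_algo toy_fnv = { fnv_init, fnv_update };

/* Format "<type> <size>" plus the trailing NUL; return bytes written. */
static size_t toy_format_header(char *out, size_t outlen,
				const char *type, size_t size)
{
	int n = snprintf(out, outlen, "%s %zu", type, size);

	assert(n >= 0 && (size_t)n < outlen);
	return (size_t)n + 1; /* the NUL is part of the header */
}

/* Seed ctx with the object header, using whichever algorithm is passed. */
static size_t toy_header_hash(const struct toy_hash_algo *algop,
			      uint32_t *ctx, const char *type, size_t size)
{
	char header[64];
	size_t len = toy_format_header(header, sizeof(header), type, size);

	algop->init_fn(ctx);
	algop->update_fn(ctx, header, len);
	return len;
}
```

Because the algorithm is an argument rather than a global, a caller
could seed a second context with a different algorithm for the same
object without touching the helper.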

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 bulk-checkin.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 223562b4e7..0aac3dfe31 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -247,6 +247,19 @@ static void prepare_to_stream(struct bulk_checkin_packfile *state,
 		die_errno("unable to write pack header");
 }
 
+static void format_object_header_hash(const struct git_hash_algo *algop,
+				      git_hash_ctx *ctx, enum object_type type,
+				      size_t size)
+{
+	unsigned char header[16384];
+	unsigned header_len = format_object_header((char *)header,
+						   sizeof(header),
+						   type, size);
+
+	algop->init_fn(ctx);
+	algop->update_fn(ctx, header, header_len);
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -254,8 +267,6 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 {
 	off_t seekback, already_hashed_to;
 	git_hash_ctx ctx;
-	unsigned char obuf[16384];
-	unsigned header_len;
 	struct hashfile_checkpoint checkpoint = {0};
 	struct pack_idx_entry *idx = NULL;
 
@@ -263,10 +274,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 	if (seekback == (off_t) -1)
 		return error("cannot find the current offset");
 
-	header_len = format_object_header((char *)obuf, sizeof(obuf),
-					  OBJ_BLOB, size);
-	the_hash_algo->init_fn(&ctx);
-	the_hash_algo->update_fn(&ctx, obuf, header_len);
+	format_object_header_hash(the_hash_algo, &ctx, OBJ_BLOB, size);
 
 	/* Note: idx is non-NULL when we are writing */
 	if ((flags & HASH_WRITE_OBJECT) != 0)
-- 
2.42.0.8.g7a7e1e881e.dirty



* [PATCH 2/7] bulk-checkin: factor out `prepare_checkpoint()`
From: Taylor Blau @ 2023-10-06 22:01 UTC
  To: git; +Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano
In-Reply-To: <cover.1696629697.git.me@ttaylorr.com>

In a similar spirit as the previous commit, factor out the routine to
prepare streaming into a bulk-checkin pack into its own function. Unlike
the previous patch, this is a verbatim copy and paste.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 bulk-checkin.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 0aac3dfe31..377c41f3ad 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -260,6 +260,19 @@ static void format_object_header_hash(const struct git_hash_algo *algop,
 	algop->update_fn(ctx, header, header_len);
 }
 
+static void prepare_checkpoint(struct bulk_checkin_packfile *state,
+			       struct hashfile_checkpoint *checkpoint,
+			       struct pack_idx_entry *idx,
+			       unsigned flags)
+{
+	prepare_to_stream(state, flags);
+	if (idx) {
+		hashfile_checkpoint(state->f, checkpoint);
+		idx->offset = state->offset;
+		crc32_begin(state->f);
+	}
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -283,12 +296,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 	already_hashed_to = 0;
 
 	while (1) {
-		prepare_to_stream(state, flags);
-		if (idx) {
-			hashfile_checkpoint(state->f, &checkpoint);
-			idx->offset = state->offset;
-			crc32_begin(state->f);
-		}
+		prepare_checkpoint(state, &checkpoint, idx, flags);
 		if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
 					 fd, size, path, flags))
 			break;
-- 
2.42.0.8.g7a7e1e881e.dirty



* [PATCH 3/7] bulk-checkin: factor out `truncate_checkpoint()`
From: Taylor Blau @ 2023-10-06 22:01 UTC
  To: git; +Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano
In-Reply-To: <cover.1696629697.git.me@ttaylorr.com>

In a similar spirit as previous commits, factor out the routine to
truncate a bulk-checkin packfile when writing past the pack size limit.
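
For readers unfamiliar with the pattern, the checkpoint, truncate and
retry cycle that these helpers implement can be sketched in standalone
C roughly as follows. This is only a sketch under stated assumptions:
`struct toy_pack` and the `toy_*` functions are hypothetical, the "pack"
is an in-memory buffer, and data is written in 4-byte chunks so that a
failed attempt can leave partial output behind, the way a streaming
deflate can.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a bulk-checkin pack. */
struct toy_pack {
	unsigned char data[256];
	size_t offset;      /* bytes written to the current pack */
	size_t nr_written;  /* objects in the current pack */
	size_t limit;       /* 0 means "no limit" */
	size_t nr_flushed;  /* packs finished so far */
};

/* Append in small chunks; return -1 once the size limit would be busted. */
static int toy_stream(struct toy_pack *p, const unsigned char *buf, size_t len)
{
	while (len) {
		size_t chunk = len < 4 ? len : 4;

		/* An empty pack always accepts its first object. */
		if (p->nr_written && p->limit && p->limit < p->offset + chunk)
			return -1;
		assert(p->offset + chunk <= sizeof(p->data));
		memcpy(p->data + p->offset, buf, chunk);
		p->offset += chunk;
		buf += chunk;
		len -= chunk;
	}
	return 0;
}

/* Finish the current pack and start an empty one. */
static void toy_flush_pack(struct toy_pack *p)
{
	if (p->nr_written)
		p->nr_flushed++;
	p->offset = 0;
	p->nr_written = 0;
}

static void toy_write_object(struct toy_pack *p,
			     const unsigned char *buf, size_t len)
{
	for (;;) {
		size_t checkpoint = p->offset;  /* like prepare_checkpoint() */

		if (!toy_stream(p, buf, len))
			break;
		p->offset = checkpoint;         /* like truncate_checkpoint() */
		toy_flush_pack(p);
	}
	p->nr_written++;
}
```

The loop terminates because a freshly started pack has no objects in it
and therefore never rejects a write on size grounds.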

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 bulk-checkin.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 377c41f3ad..2dae8be461 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -273,6 +273,22 @@ static void prepare_checkpoint(struct bulk_checkin_packfile *state,
 	}
 }
 
+static void truncate_checkpoint(struct bulk_checkin_packfile *state,
+				struct hashfile_checkpoint *checkpoint,
+				struct pack_idx_entry *idx)
+{
+	/*
+	 * Writing this object to the current pack will make
+	 * it too big; we need to truncate it, start a new
+	 * pack, and write into it.
+	 */
+	if (!idx)
+		BUG("should not happen");
+	hashfile_truncate(state->f, checkpoint);
+	state->offset = checkpoint->offset;
+	flush_bulk_checkin_packfile(state);
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -300,16 +316,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 		if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
 					 fd, size, path, flags))
 			break;
-		/*
-		 * Writing this object to the current pack will make
-		 * it too big; we need to truncate it, start a new
-		 * pack, and write into it.
-		 */
-		if (!idx)
-			BUG("should not happen");
-		hashfile_truncate(state->f, &checkpoint);
-		state->offset = checkpoint.offset;
-		flush_bulk_checkin_packfile(state);
+		truncate_checkpoint(state, &checkpoint, idx);
 		if (lseek(fd, seekback, SEEK_SET) == (off_t) -1)
 			return error("cannot seek back");
 	}
-- 
2.42.0.8.g7a7e1e881e.dirty



* [PATCH 4/7] bulk-checkin: factor out `finalize_checkpoint()`
From: Taylor Blau @ 2023-10-06 22:01 UTC
  To: git; +Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano
In-Reply-To: <cover.1696629697.git.me@ttaylorr.com>

In a similar spirit as previous commits, factor out the routine to
finalize the just-written object from the bulk-checkin mechanism.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 bulk-checkin.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 2dae8be461..a9497fcb28 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -289,6 +289,30 @@ static void truncate_checkpoint(struct bulk_checkin_packfile *state,
 	flush_bulk_checkin_packfile(state);
 }
 
+static void finalize_checkpoint(struct bulk_checkin_packfile *state,
+				git_hash_ctx *ctx,
+				struct hashfile_checkpoint *checkpoint,
+				struct pack_idx_entry *idx,
+				struct object_id *result_oid)
+{
+	the_hash_algo->final_oid_fn(result_oid, ctx);
+	if (!idx)
+		return;
+
+	idx->crc32 = crc32_end(state->f);
+	if (already_written(state, result_oid)) {
+		hashfile_truncate(state->f, checkpoint);
+		state->offset = checkpoint->offset;
+		free(idx);
+	} else {
+		oidcpy(&idx->oid, result_oid);
+		ALLOC_GROW(state->written,
+			   state->nr_written + 1,
+			   state->alloc_written);
+		state->written[state->nr_written++] = idx;
+	}
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -320,22 +344,7 @@ static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 		if (lseek(fd, seekback, SEEK_SET) == (off_t) -1)
 			return error("cannot seek back");
 	}
-	the_hash_algo->final_oid_fn(result_oid, &ctx);
-	if (!idx)
-		return 0;
-
-	idx->crc32 = crc32_end(state->f);
-	if (already_written(state, result_oid)) {
-		hashfile_truncate(state->f, &checkpoint);
-		state->offset = checkpoint.offset;
-		free(idx);
-	} else {
-		oidcpy(&idx->oid, result_oid);
-		ALLOC_GROW(state->written,
-			   state->nr_written + 1,
-			   state->alloc_written);
-		state->written[state->nr_written++] = idx;
-	}
+	finalize_checkpoint(state, &ctx, &checkpoint, idx, result_oid);
 	return 0;
 }
 
-- 
2.42.0.8.g7a7e1e881e.dirty



* [PATCH 5/7] bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
From: Taylor Blau @ 2023-10-06 22:02 UTC
  To: git; +Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano
In-Reply-To: <cover.1696629697.git.me@ttaylorr.com>

Now that we have factored out many of the common routines necessary to
index a new object into a pack created by the bulk-checkin machinery, we
can introduce a variant of `index_blob_bulk_checkin()` that acts on
blobs whose contents we can fit in memory.

This will be useful a couple of commits from now in order to provide the
`merge-tree` builtin with a mechanism to create a new pack containing
any objects it created during the merge, instead of storing those
objects individually as loose.

Similar to the existing `index_blob_bulk_checkin()` function, the
entrypoint delegates to `deflate_blob_to_pack_incore()`, which is
responsible for formatting the pack header and then deflating the
contents into the pack. The latter is accomplished by calling
`deflate_obj_contents_to_pack_incore()`, which takes advantage of the
earlier refactoring and is responsible for writing the object to the
pack and handling any overage from `pack.packSizeLimit`.

The bulk of the new functionality is implemented in the function
`stream_obj_to_pack_incore()`, which is a generic implementation for
writing objects of arbitrary type (whose contents we can fit in-core)
into a bulk-checkin pack.
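
For context, the first thing written for each object is its in-pack
header, which this patch obtains from `encode_in_pack_object_header()`.
The encoding can be sketched in standalone C as below. This is a sketch
rather than Git's code: `toy_encode_pack_header()` is a hypothetical
name, and returning 0 on a short buffer is an assumption made for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode an in-pack object header: the first byte holds the 3-bit type
 * and the low 4 bits of the size; each further byte holds 7 more size
 * bits. A set high bit means "another byte follows".
 * Returns the number of bytes written, or 0 if out is too small.
 */
static size_t toy_encode_pack_header(unsigned char *out, size_t outlen,
				     unsigned type, uint64_t size)
{
	size_t n = 0;
	unsigned char c;

	assert(type >= 1 && type <= 7);
	c = (unsigned char)((type << 4) | (size & 0x0f));
	size >>= 4;
	while (size) {
		if (n == outlen)
			return 0;
		out[n++] = c | 0x80;    /* more size bits follow */
		c = size & 0x7f;
		size >>= 7;
	}
	if (n == outlen)
		return 0;
	out[n++] = c;                   /* last byte: high bit clear */
	return n;
}
```

With type 3 (a blob), a 10-byte object needs a single header byte, while
a 100-byte object needs two.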

The new function shares an unfortunate degree of similarity with the
existing `stream_blob_to_pack()` function. But DRY-ing up these two
would likely be more trouble than it's worth, since the latter has to
deal with reading and writing the contents of the object.

Consistent with the rest of the bulk-checkin mechanism, there are no
direct tests here. In future commits when we expose this new
functionality via the `merge-tree` builtin, we will test it indirectly
there.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 bulk-checkin.c | 116 +++++++++++++++++++++++++++++++++++++++++++++++++
 bulk-checkin.h |   4 ++
 2 files changed, 120 insertions(+)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index a9497fcb28..319921efe7 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -140,6 +140,69 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id
 	return 0;
 }
 
+static int stream_obj_to_pack_incore(struct bulk_checkin_packfile *state,
+				     git_hash_ctx *ctx,
+				     off_t *already_hashed_to,
+				     const void *buf, size_t size,
+				     enum object_type type,
+				     const char *path, unsigned flags)
+{
+	git_zstream s;
+	unsigned char obuf[16384];
+	unsigned hdrlen;
+	int status = Z_OK;
+	int write_object = (flags & HASH_WRITE_OBJECT);
+
+	git_deflate_init(&s, pack_compression_level);
+
+	hdrlen = encode_in_pack_object_header(obuf, sizeof(obuf), type, size);
+	s.next_out = obuf + hdrlen;
+	s.avail_out = sizeof(obuf) - hdrlen;
+
+	if (*already_hashed_to < size) {
+		size_t hsize = size - *already_hashed_to;
+		if (hsize) {
+			the_hash_algo->update_fn(ctx, buf, hsize);
+		}
+		*already_hashed_to = size;
+	}
+	s.next_in = (void *)buf;
+	s.avail_in = size;
+
+	while (status != Z_STREAM_END) {
+		status = git_deflate(&s, Z_FINISH);
+		if (!s.avail_out || status == Z_STREAM_END) {
+			if (write_object) {
+				size_t written = s.next_out - obuf;
+
+				/* would we bust the size limit? */
+				if (state->nr_written &&
+				    pack_size_limit_cfg &&
+				    pack_size_limit_cfg < state->offset + written) {
+					git_deflate_abort(&s);
+					return -1;
+				}
+
+				hashwrite(state->f, obuf, written);
+				state->offset += written;
+			}
+			s.next_out = obuf;
+			s.avail_out = sizeof(obuf);
+		}
+
+		switch (status) {
+		case Z_OK:
+		case Z_BUF_ERROR:
+		case Z_STREAM_END:
+			continue;
+		default:
+			die("unexpected deflate failure: %d", status);
+		}
+	}
+	git_deflate_end(&s);
+	return 0;
+}
+
 /*
  * Read the contents from fd for size bytes, streaming it to the
  * packfile in state while updating the hash in ctx. Signal a failure
@@ -313,6 +376,48 @@ static void finalize_checkpoint(struct bulk_checkin_packfile *state,
 	}
 }
 
+static int deflate_obj_contents_to_pack_incore(struct bulk_checkin_packfile *state,
+					       git_hash_ctx *ctx,
+					       struct object_id *result_oid,
+					       const void *buf, size_t size,
+					       enum object_type type,
+					       const char *path, unsigned flags)
+{
+	struct hashfile_checkpoint checkpoint = {0};
+	struct pack_idx_entry *idx = NULL;
+	off_t already_hashed_to = 0;
+
+	/* Note: idx is non-NULL when we are writing */
+	if (flags & HASH_WRITE_OBJECT)
+		CALLOC_ARRAY(idx, 1);
+
+	while (1) {
+		prepare_checkpoint(state, &checkpoint, idx, flags);
+		if (!stream_obj_to_pack_incore(state, ctx, &already_hashed_to,
+					       buf, size, type, path, flags))
+			break;
+		truncate_checkpoint(state, &checkpoint, idx);
+	}
+
+	finalize_checkpoint(state, ctx, &checkpoint, idx, result_oid);
+
+	return 0;
+}
+
+static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state,
+				       struct object_id *result_oid,
+				       const void *buf, size_t size,
+				       const char *path, unsigned flags)
+{
+	git_hash_ctx ctx;
+
+	format_object_header_hash(the_hash_algo, &ctx, OBJ_BLOB, size);
+
+	return deflate_obj_contents_to_pack_incore(state, &ctx, result_oid,
+						   buf, size, OBJ_BLOB, path,
+						   flags);
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -392,6 +497,17 @@ int index_blob_bulk_checkin(struct object_id *oid,
 	return status;
 }
 
+int index_blob_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags)
+{
+	int status = deflate_blob_to_pack_incore(&bulk_checkin_packfile, oid,
+						 buf, size, path, flags);
+	if (!odb_transaction_nesting)
+		flush_bulk_checkin_packfile(&bulk_checkin_packfile);
+	return status;
+}
+
 void begin_odb_transaction(void)
 {
 	odb_transaction_nesting += 1;
diff --git a/bulk-checkin.h b/bulk-checkin.h
index aa7286a7b3..1b91daeaee 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -13,6 +13,10 @@ int index_blob_bulk_checkin(struct object_id *oid,
 			    int fd, size_t size,
 			    const char *path, unsigned flags);
 
+int index_blob_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags);
+
 /*
  * Tell the object database to optimize for adding
  * multiple objects. end_odb_transaction must be called
-- 
2.42.0.8.g7a7e1e881e.dirty



* [PATCH 6/7] bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
From: Taylor Blau @ 2023-10-06 22:02 UTC
  To: git; +Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano
In-Reply-To: <cover.1696629697.git.me@ttaylorr.com>

The remaining missing piece in order to teach the `merge-tree` builtin
how to write the contents of a merge into a pack is a function to index
tree objects into a bulk-checkin pack.

This patch implements that missing piece, which is a thin wrapper around
all of the functionality introduced in previous commits.

If and when Git gains support for a "compatibility" hash algorithm, the
changes to support that here will be minimal. The bulk-checkin machinery
will need to convert the incoming tree to compute its length under the
compatibility hash, necessary to reconstruct its header. With that
information (and the converted contents of the tree), the bulk-checkin
machinery will have enough to keep track of the converted object's hash
in order to update the compatibility mapping.
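
To see why the length has to be recomputed, recall that each tree entry
embeds a raw object ID, so the same tree has a different payload size
under each hash. A standalone sketch, assuming hypothetical names
(`struct toy_tree_entry`, `toy_tree_size()`) and ignoring the object IDs
themselves:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one tree entry, without its object ID. */
struct toy_tree_entry {
	const char *mode;   /* octal mode as written in the tree */
	const char *name;
};

/*
 * A tree entry is serialized as "<mode> <name>\0" followed by the raw
 * object ID, so the payload size depends on the hash's raw size
 * (20 bytes for SHA-1, 32 bytes for SHA-256).
 */
static size_t toy_tree_size(const struct toy_tree_entry *entries, size_t nr,
			    size_t rawsz)
{
	size_t total = 0;

	for (size_t i = 0; i < nr; i++)
		total += strlen(entries[i].mode) + 1 +
			 strlen(entries[i].name) + 1 + rawsz;
	return total;
}
```

A tree with the two entries `100644 file` and `40000 dir` comes to 62
bytes with 20-byte IDs and 86 bytes with 32-byte IDs, and that size is
what ends up in the object header that gets hashed.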

Within `deflate_tree_to_pack_incore()`, the changes should be limited
to something like:

    if (the_repository->compat_hash_algo) {
      struct strbuf converted = STRBUF_INIT;
      if (convert_object_file(&converted,
                              the_repository->hash_algo,
                              the_repository->compat_hash_algo, ...) < 0)
        die(...);

      format_object_header_hash(the_repository->compat_hash_algo,
                                OBJ_TREE, size);

      strbuf_release(&converted);
    }

, assuming related changes throughout the rest of the bulk-checkin
machinery necessary to update the hash of the converted object, which
are likewise minimal in size.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 bulk-checkin.c | 25 +++++++++++++++++++++++++
 bulk-checkin.h |  4 ++++
 2 files changed, 29 insertions(+)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 319921efe7..d7d46f1dac 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -418,6 +418,20 @@ static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state,
 						   flags);
 }
 
+static int deflate_tree_to_pack_incore(struct bulk_checkin_packfile *state,
+				       struct object_id *result_oid,
+				       const void *buf, size_t size,
+				       const char *path, unsigned flags)
+{
+	git_hash_ctx ctx;
+
+	format_object_header_hash(the_hash_algo, &ctx, OBJ_TREE, size);
+
+	return deflate_obj_contents_to_pack_incore(state, &ctx, result_oid,
+						   buf, size, OBJ_TREE, path,
+						   flags);
+}
+
 static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
 				struct object_id *result_oid,
 				int fd, size_t size,
@@ -508,6 +522,17 @@ int index_blob_bulk_checkin_incore(struct object_id *oid,
 	return status;
 }
 
+int index_tree_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags)
+{
+	int status = deflate_tree_to_pack_incore(&bulk_checkin_packfile, oid,
+						 buf, size, path, flags);
+	if (!odb_transaction_nesting)
+		flush_bulk_checkin_packfile(&bulk_checkin_packfile);
+	return status;
+}
+
 void begin_odb_transaction(void)
 {
 	odb_transaction_nesting += 1;
diff --git a/bulk-checkin.h b/bulk-checkin.h
index 1b91daeaee..89786b3954 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -17,6 +17,10 @@ int index_blob_bulk_checkin_incore(struct object_id *oid,
 				   const void *buf, size_t size,
 				   const char *path, unsigned flags);
 
+int index_tree_bulk_checkin_incore(struct object_id *oid,
+				   const void *buf, size_t size,
+				   const char *path, unsigned flags);
+
 /*
  * Tell the object database to optimize for adding
  * multiple objects. end_odb_transaction must be called
-- 
2.42.0.8.g7a7e1e881e.dirty



* [PATCH 7/7] builtin/merge-tree.c: implement support for `--write-pack`
From: Taylor Blau @ 2023-10-06 22:02 UTC
  To: git; +Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano
In-Reply-To: <cover.1696629697.git.me@ttaylorr.com>

When using merge-tree often within a repository[^1], it is possible to
generate a relatively large number of loose objects, which can result in
degraded performance, and inode exhaustion in extreme cases.

Building on the functionality introduced in previous commits, the
bulk-checkin machinery now has support to write arbitrary blob and tree
objects which are small enough to be held in-core. We can use this to
write any blob/tree objects generated by ORT into a separate pack
instead of writing them out individually as loose.

This functionality is gated behind a new `--write-pack` option to
`merge-tree` that works with the (non-deprecated) `--write-tree` mode.

The implementation is relatively straightforward. There are two spots
within the ORT mechanism where we call `write_object_file()`, one for
content differences within blobs, and another to assemble any new trees
necessary to construct the merge. In each of those locations,
conditionally replace calls to `write_object_file()` with
`index_blob_bulk_checkin_incore()` or `index_tree_bulk_checkin_incore()`
depending on which kind of object we are writing.

The only remaining task is to begin and end the transaction necessary to
initialize the bulk-checkin machinery, and move any new pack(s) it
created into the main object store.
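
The effect of wrapping the merge in a transaction can be sketched in
standalone C as follows. The counters and `toy_*` functions are
hypothetical stand-ins for the bulk-checkin state, and "flushing" here
just counts a finished pack.

```c
#include <assert.h>

/* Hypothetical counters standing in for the bulk-checkin state. */
static int toy_nesting;
static int toy_pending;   /* objects written but not yet flushed */
static int toy_packs;     /* packs moved into the object store */

static void toy_flush_pending(void)
{
	if (!toy_pending)
		return;
	toy_packs++;
	toy_pending = 0;
}

static void toy_begin_transaction(void)
{
	toy_nesting++;
}

/* Write one object; outside a transaction it gets a pack of its own. */
static void toy_index_object(void)
{
	toy_pending++;
	if (!toy_nesting)
		toy_flush_pending();
}

/* Only the outermost end of a transaction flushes. */
static void toy_end_transaction(void)
{
	assert(toy_nesting > 0);
	if (--toy_nesting)
		return;
	toy_flush_pending();
}
```

Without the surrounding transaction, every new blob and tree would be
flushed into a pack of its own, which would defeat the purpose of the
option.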

[^1]: Such is the case at GitHub, where we run presumptive "test merges"
  on open pull requests to see whether or not we can light up the merge
  button green depending on whether or not the presumptive merge was
  conflicted.

  This is done in response to a number of user-initiated events,
  including viewing an open pull request whose last test merge is stale
  with respect to the current base and tip of the pull request. As a
  result, merge-tree can be run very frequently on large, active
  repositories.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-merge-tree.txt |  4 ++
 builtin/merge-tree.c             |  5 ++
 merge-ort.c                      | 43 +++++++++++----
 merge-recursive.h                |  1 +
 t/t4301-merge-tree-write-tree.sh | 93 ++++++++++++++++++++++++++++++++
 5 files changed, 136 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-merge-tree.txt b/Documentation/git-merge-tree.txt
index ffc4fbf7e8..9d37609ef1 100644
--- a/Documentation/git-merge-tree.txt
+++ b/Documentation/git-merge-tree.txt
@@ -69,6 +69,10 @@ OPTIONS
 	specify a merge-base for the merge, and specifying multiple bases is
 	currently not supported. This option is incompatible with `--stdin`.
 
+--write-pack::
+	Write any new objects into a separate packfile instead of as
+	individual loose objects.
+
 [[OUTPUT]]
 OUTPUT
 ------
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 0de42aecf4..672ebd4c54 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -18,6 +18,7 @@
 #include "quote.h"
 #include "tree.h"
 #include "config.h"
+#include "bulk-checkin.h"
 
 static int line_termination = '\n';
 
@@ -414,6 +415,7 @@ struct merge_tree_options {
 	int show_messages;
 	int name_only;
 	int use_stdin;
+	int write_pack;
 };
 
 static int real_merge(struct merge_tree_options *o,
@@ -440,6 +442,7 @@ static int real_merge(struct merge_tree_options *o,
 	init_merge_options(&opt, the_repository);
 
 	opt.show_rename_progress = 0;
+	opt.write_pack = o->write_pack;
 
 	opt.branch1 = branch1;
 	opt.branch2 = branch2;
@@ -548,6 +551,8 @@ int cmd_merge_tree(int argc, const char **argv, const char *prefix)
 			   &merge_base,
 			   N_("commit"),
 			   N_("specify a merge-base for the merge")),
+		OPT_BOOL(0, "write-pack", &o.write_pack,
+			 N_("write new objects to a pack instead of as loose")),
 		OPT_END()
 	};
 
diff --git a/merge-ort.c b/merge-ort.c
index 8631c99700..85d8c5c6b3 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -48,6 +48,7 @@
 #include "tree.h"
 #include "unpack-trees.h"
 #include "xdiff-interface.h"
+#include "bulk-checkin.h"
 
 /*
  * We have many arrays of size 3.  Whenever we have such an array, the
@@ -2124,11 +2125,19 @@ static int handle_content_merge(struct merge_options *opt,
 		if ((merge_status < 0) || !result_buf.ptr)
 			ret = err(opt, _("Failed to execute internal merge"));
 
-		if (!ret &&
-		    write_object_file(result_buf.ptr, result_buf.size,
-				      OBJ_BLOB, &result->oid))
-			ret = err(opt, _("Unable to add %s to database"),
-				  path);
+		if (!ret) {
+			ret = opt->write_pack
+				? index_blob_bulk_checkin_incore(&result->oid,
+								 result_buf.ptr,
+								 result_buf.size,
+								 path, 1)
+				: write_object_file(result_buf.ptr,
+						    result_buf.size,
+						    OBJ_BLOB, &result->oid);
+			if (ret)
+				ret = err(opt, _("Unable to add %s to database"),
+					  path);
+		}
 
 		free(result_buf.ptr);
 		if (ret)
@@ -3618,7 +3627,8 @@ static int tree_entry_order(const void *a_, const void *b_)
 				 b->string, strlen(b->string), bmi->result.mode);
 }
 
-static int write_tree(struct object_id *result_oid,
+static int write_tree(struct merge_options *opt,
+		      struct object_id *result_oid,
 		      struct string_list *versions,
 		      unsigned int offset,
 		      size_t hash_size)
@@ -3652,8 +3662,14 @@ static int write_tree(struct object_id *result_oid,
 	}
 
 	/* Write this object file out, and record in result_oid */
-	if (write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid))
+	ret = opt->write_pack
+		? index_tree_bulk_checkin_incore(result_oid,
+						 buf.buf, buf.len, "", 1)
+		: write_object_file(buf.buf, buf.len, OBJ_TREE, result_oid);
+
+	if (ret)
 		ret = -1;
+
 	strbuf_release(&buf);
 	return ret;
 }
@@ -3818,8 +3834,8 @@ static int write_completed_directory(struct merge_options *opt,
 		 */
 		dir_info->is_null = 0;
 		dir_info->result.mode = S_IFDIR;
-		if (write_tree(&dir_info->result.oid, &info->versions, offset,
-			       opt->repo->hash_algo->rawsz) < 0)
+		if (write_tree(opt, &dir_info->result.oid, &info->versions,
+			       offset, opt->repo->hash_algo->rawsz) < 0)
 			ret = -1;
 	}
 
@@ -4353,9 +4369,13 @@ static int process_entries(struct merge_options *opt,
 		fflush(stdout);
 		BUG("dir_metadata accounting completely off; shouldn't happen");
 	}
-	if (write_tree(result_oid, &dir_metadata.versions, 0,
+	if (write_tree(opt, result_oid, &dir_metadata.versions, 0,
 		       opt->repo->hash_algo->rawsz) < 0)
 		ret = -1;
+
+	if (opt->write_pack)
+		end_odb_transaction();
+
 cleanup:
 	string_list_clear(&plist, 0);
 	string_list_clear(&dir_metadata.versions, 0);
@@ -4899,6 +4919,9 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 	 */
 	strmap_init(&opt->priv->conflicts);
 
+	if (opt->write_pack)
+		begin_odb_transaction();
+
 	trace2_region_leave("merge", "allocate/init", opt->repo);
 }
 
diff --git a/merge-recursive.h b/merge-recursive.h
index b88000e3c2..156e160876 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -48,6 +48,7 @@ struct merge_options {
 	unsigned renormalize : 1;
 	unsigned record_conflict_msgs_as_headers : 1;
 	const char *msg_header_prefix;
+	unsigned write_pack : 1;
 
 	/* internal fields used by the implementation */
 	struct merge_options_internal *priv;
diff --git a/t/t4301-merge-tree-write-tree.sh b/t/t4301-merge-tree-write-tree.sh
index 250f721795..2d81ff4de5 100755
--- a/t/t4301-merge-tree-write-tree.sh
+++ b/t/t4301-merge-tree-write-tree.sh
@@ -922,4 +922,97 @@ test_expect_success 'check the input format when --stdin is passed' '
 	test_cmp expect actual
 '
 
+packdir=".git/objects/pack"
+
+test_expect_success 'merge-tree can pack its result with --write-pack' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+
+	# base has lines [3, 4, 5]
+	#   - side adds to the beginning, resulting in [1, 2, 3, 4, 5]
+	#   - other adds to the end, resulting in [3, 4, 5, 6, 7]
+	#
+	# merging the two should result in a new blob object containing
+	# [1, 2, 3, 4, 5, 6, 7], along with a new tree.
+	test_commit -C repo base file "$(test_seq 3 5)" &&
+	git -C repo branch -M main &&
+	git -C repo checkout -b side main &&
+	test_commit -C repo side file "$(test_seq 1 5)" &&
+	git -C repo checkout -b other main &&
+	test_commit -C repo other file "$(test_seq 3 7)" &&
+
+	find repo/$packdir -type f -name "pack-*.idx" >packs.before &&
+	tree="$(git -C repo merge-tree --write-pack \
+		refs/tags/side refs/tags/other)" &&
+	blob="$(git -C repo rev-parse $tree:file)" &&
+	find repo/$packdir -type f -name "pack-*.idx" >packs.after &&
+
+	test_must_be_empty packs.before &&
+	test_line_count = 1 packs.after &&
+
+	git show-index <$(cat packs.after) >objects &&
+	test_line_count = 2 objects &&
+	grep "^[1-9][0-9]* $tree" objects &&
+	grep "^[1-9][0-9]* $blob" objects
+'
+
+test_expect_success 'merge-tree can write multiple packs with --write-pack' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		git config pack.packSizeLimit 512 &&
+
+		test_seq 512 >f &&
+
+		# "f" contains roughly ~2,000 bytes.
+		#
+		# Each side ("foo" and "bar") adds a small amount of data at the
+		# beginning and end of "base", respectively.
+		git add f &&
+		test_tick &&
+		git commit -m base &&
+		git branch -M main &&
+
+		git checkout -b foo main &&
+		{
+			echo foo && cat f
+		} >f.tmp &&
+		mv f.tmp f &&
+		git add f &&
+		test_tick &&
+		git commit -m foo &&
+
+		git checkout -b bar main &&
+		echo bar >>f &&
+		git add f &&
+		test_tick &&
+		git commit -m bar &&
+
+		find $packdir -type f -name "pack-*.idx" >packs.before &&
+		# Merging the two sides should result in new objects that do
+		# not fit together within the 512-byte pack.packSizeLimit,
+		# thus the result should be split into two separate packs.
+		tree="$(git merge-tree --write-pack \
+			refs/heads/foo refs/heads/bar)" &&
+		blob="$(git rev-parse $tree:f)" &&
+		find $packdir -type f -name "pack-*.idx" >packs.after &&
+
+		test_must_be_empty packs.before &&
+		test_line_count = 2 packs.after &&
+		for idx in $(cat packs.after)
+		do
+			git show-index <$idx || return 1
+		done >objects &&
+
+		# The resulting set of packs should contain one copy of both
+		# objects, each in a separate pack.
+		test_line_count = 2 objects &&
+		grep "^[1-9][0-9]* $tree" objects &&
+		grep "^[1-9][0-9]* $blob" objects
+
+	)
+'
+
 test_done
-- 
2.42.0.8.g7a7e1e881e.dirty


* Re: [PATCH 4/4] files-backend.c: avoid stat in 'loose_fill_ref_dir'
From: Junio C Hamano @ 2023-10-06 22:12 UTC
  To: Victoria Dye via GitGitGadget; +Cc: git, Victoria Dye
In-Reply-To: <e193a45318244d9f8b05dfe2fb1ce57f6a4f6428.1696615769.git.gitgitgadget@gmail.com>

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Unlike other existing usage of 'get_dtype', the 'follow_symlinks' arg is set
> to 1 to replicate the existing handling of symlink dirents. This
> unfortunately requires calling 'stat' on the associated entry regardless of
> platform, but symlinks in the loose ref store are highly unlikely since
> they'd need to be created manually by a user.

Yeek.  I wonder what breaks if we do not do this follow_symlinks()
part, i.e., either just replace stat() with lstat() in the original
without any of these four patches (which would be simple to figure
out what breaks), or omit [3/4] and let get_dtype() yield DT_LNK.

It seems that it comes from a7e66ae3 ([PATCH] Make do_each_ref()
follow symlinks., 2005-08-16), and just like I commented on there in
its log message back then, I still doubt that following a symbolic
link is a great idea here in this codepath.

But optimization without behaviour change is a good way to ensure
that optimization does not introduce new bugs, and because keeping
the historical behaviour like the patches [3/4] and this patch does
is more work (meaning: if it proves unnecessary to dereference
symbolic links, we can remove code instead of having to write new
code to support the new behaviour), let's take the series as-is, and
defer it to future developers to further clean-up the semantics.

> Note that this patch also changes the condition for skipping creation of a
> ref entry from "when 'stat' fails" to "when the d_type is anything other
> than DT_REG or DT_DIR". If a dirent's d_type is DT_UNKNOWN (either because
> the platform doesn't support d_type in dirents or some other reason) or
> DT_LNK, 'get_dtype' will try to derive the underlying type with 'stat'. If
> the 'stat' fails, the d_type will remain 'DT_UNKNOWN' and dirent will be
> skipped. However, it will also be skipped if it is any other valid d_type
> (e.g. DT_FIFO for named pipes, DT_LNK for a nested symlink). Git does not
> handle these properly anyway, so we can safely constrain accepted types to
> directories and regular files.

Sounds good.

> Signed-off-by: Victoria Dye <vdye@github.com>
> ---
>  refs/files-backend.c | 14 +++++---------
>  1 file changed, 5 insertions(+), 9 deletions(-)

Thanks.



* Re: [PATCH 7/7] builtin/merge-tree.c: implement support for `--write-pack`
From: Junio C Hamano @ 2023-10-06 22:35 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Elijah Newren, Eric W. Biederman, Jeff King
In-Reply-To: <e96921014557edb41dd73d93a8c3cf6cfaf0c719.1696629697.git.me@ttaylorr.com>

Taylor Blau <me@ttaylorr.com> writes:

> When using merge-tree often within a repository[^1], it is possible to
> generate a relatively large number of loose objects, which can result in
> degraded performance, and inode exhaustion in extreme cases.

Well, be it "git merge-tree" or "git merge", new loose objects tend
to accumulate until "gc" kicks in, so it is not a new problem for
mere mortals, is it?

As one "interesting" use case of "merge-tree" is for a Git hosting
site with bare repositories to offer trial merges, without which
the majority of the objects their repositories acquire would have been in
packs pushed by their users, "Gee, loose objects consume many inodes
in exchange for easier selective pruning" becomes an issue, right?

Just like it hurts performance to have too many loose object files,
presumably it would also hurt performance to keep too many packs,
each of which came from such a trial merge.  Do we have a "gc" story
offered for these packs created by the new feature?  E.g., "once merge-tree
is done creating a trial merge, we can discard the objects created
in the pack, because we never expose new objects in the pack to the
outside, processes running simultaneously, so instead of closing the
new packfile by calling flush_bulk_checkin_packfile(), we can safely
unlink the temporary pack.  We do not even need to spend cycles to
run a gc that requires cycles to enumerate what is still reachable",
or something like that?

Thanks.


* [OUTREACHY] Permission To Work On Tasks
From: Naomi Ibe @ 2023-10-06 22:41 UTC (permalink / raw)
  To: git

So I went through this link
https://github.com/gitgitgadget/git/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22
 and I found two issues that interest me.

First issue is here https://github.com/gitgitgadget/git/issues/635 ,
involving changing the "die()" error msg outputs to all lowercase. I
found a few files here https://github.com/git/git/tree/master/builtin
where the "die()" error msg had some uppercase in them (add.c in lines
185, 203, 205, 211 and 571) (branch.c in lines 521, 525, 581, 597,
599, 627, 629, 643, 650, 652, 776, 926, 954 and 968). If I'm allowed
to work on this issue, how many files should I edit? The last closed
issues related to this issue had edited five files.

Second issue is this https://github.com/gitgitgadget/git/issues/302 .
Is it still available to be worked on? I notice it was opened in 2019.


* [PATCH] doc: update list archive reference to use lore.kernel.org
From: Junio C Hamano @ 2023-10-06 22:57 UTC (permalink / raw)
  To: git

No disrespect to other mailing list archives, but the local part of
their URLs will become pretty much meaningless once the archives go
out of service, and we learned the lesson the hard way when $gmane
stopped serving.

Let's point into https://lore.kernel.org/ for an article that can be
found there, because the local part of the URL has the Message-Id:
that can be used to find the same message in other archives, even if
lore goes down.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 Documentation/CodingGuidelines | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 65af8d82ce..71afc5b259 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -24,7 +24,7 @@ code.  For Git in general, a few rough rules are:
 
    "Once it _is_ in the tree, it's not really worth the patch noise to
    go and fix it up."
-   Cf. http://lkml.iu.edu/hypermail/linux/kernel/1001.3/01069.html
+   Cf. https://lore.kernel.org/all/20100126160632.3bdbe172.akpm@linux-foundation.org/
 
  - Log messages to explain your changes are as important as the
    changes themselves.  Clearly written code and in-code comments
-- 
2.42.0-325-g3a06386e31
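
The point made in the log message above, that the local part of a lore
URL is the Message-Id itself, can be illustrated with a small C sketch.
The helper name `sketch_lore_url()` is hypothetical, and the sketch
assumes a Message-Id that needs no URL escaping (one containing "/" or
other special characters would presumably need it).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format "https://lore.kernel.org/all/<msgid>/"
 * into buf, accepting the Message-Id with or without angle brackets.
 * Returns what snprintf() returns.
 */
static int sketch_lore_url(char *buf, size_t len, const char *msgid)
{
	size_t n;

	assert(buf != NULL && msgid != NULL);
	if (*msgid == '<')
		msgid++; /* drop the leading bracket */
	n = strlen(msgid);
	if (n && msgid[n - 1] == '>')
		n--; /* drop the trailing bracket */
	return snprintf(buf, len, "https://lore.kernel.org/all/%.*s/",
			(int)n, msgid);
}
```

Because the Message-Id survives in the URL, the same article can still be
looked up in another archive if this one ever goes away.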



* Re: [PATCH v7 2/3] unit tests: add TAP unit test framework
From: Josh Steadmon @ 2023-10-06 22:58 UTC (permalink / raw)
  To: phillip.wood; +Cc: Junio C Hamano, git, linusa, calvinwan, rsbecker
In-Reply-To: <0b6de919-8dbf-454f-807b-5abb64388cb7@gmail.com>

On 2023.09.24 14:57, phillip.wood123@gmail.com wrote:
> On 22/09/2023 21:05, Junio C Hamano wrote:
> > Any thought on the "polarity" of the return values from the
> > assertion?  I still find it confusing and hard to follow.
> 
> When I was writing this I was torn between whether to follow our usual
> convention of returning zero for success and minus one for failure or to
> return one for success and zero for failure. In the end I decided to go with
> the former but I tend to agree with you that the latter would be easier to
> understand.

Agreed. V8 will switch to 0 for failure and 1 for success for the TEST,
TEST_TODO, and check macros.
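
To make the polarity concrete, here is a minimal sketch of a check helper
that returns 1 on success and 0 on failure. It is not the framework's
actual implementation: `sketch_check_int_eq()`, `sketch_demo_test()` and
the `sketch_failures` counter are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdio.h>

static int sketch_failures; /* number of failed checks so far */

/* Hypothetical check helper: 1 means the check passed, 0 means it failed. */
static int sketch_check_int_eq(int actual, int expected, const char *what)
{
	assert(what != NULL);
	if (actual == expected)
		return 1;
	printf("# check \"%s\" failed: got %d, expected %d\n",
	       what, actual, expected);
	sketch_failures++;
	return 0;
}

/* With this polarity, bailing out early on failure reads naturally. */
static int sketch_demo_test(int value)
{
	if (!sketch_check_int_eq(value, 42, "value == 42"))
		return 0;
	return sketch_check_int_eq(value % 2, 0, "value is even");
}
```

With the opposite convention (0 for success, -1 for failure) the early
return would have to be written as "if (check(...)) return -1;", which is
the form found confusing earlier in the thread.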


> > > > +test_expect_success 'TAP output from unit tests' '
> > > > [...]
> > > > +	ok 19 - test with no checks returns -1
> > > > +	1..19
> > > > +	EOF
> > > 
> > > Presumably t-basic will serve as a catalog of check_* functions and
> > > the test binary, together with this test piece, will keep growing as
> > > we gain features in the unit tests infrastructure.  I wonder how
> > > maintainable the above is, though.  When we acquire a new test, we
> > > would need to renumber.  What if multiple developers add new
> > > features to the catalog at the same time?
> 
> I think we could just add new tests to the end so we'd only need to change
> the "1..19" line. That will become a source of merge conflicts if multiple
> developers add new features at the same time though. Having several unit
> test programs called from separate tests in t0080 might help with that.

My hope is that test-lib.c will not have to grow too extensively after
this series; that said, it's already been a pain to have to adjust the
t0080 expected text several times just during development of this
series. I'll look into splitting this into several "meta-tests", but I'm
not sure I'll get to it for V8 yet.


> > > > diff --git a/t/unit-tests/.gitignore b/t/unit-tests/.gitignore
> > > > new file mode 100644
> > > > index 0000000000..e292d58348
> > > > --- /dev/null
> > > > +++ b/t/unit-tests/.gitignore
> > > > @@ -0,0 +1,2 @@
> > > > +/t-basic
> > > > +/t-strbuf
> > > 
> > > Also, can we come up with some naming convention so that we do not
> > > have to keep adding to this file every time we add a new test
> > > script?
> 
> Perhaps we should put the unit test binaries in a separate directory so we
> can just add that directory to .gitignore.

Sounds good to me.


* Re: [PATCH 7/7] builtin/merge-tree.c: implement support for `--write-pack`
From: Taylor Blau @ 2023-10-06 23:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Elijah Newren, Eric W. Biederman, Jeff King
In-Reply-To: <xmqqil7j751u.fsf@gitster.g>

On Fri, Oct 06, 2023 at 03:35:25PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > When using merge-tree often within a repository[^1], it is possible to
> > generate a relatively large number of loose objects, which can result in
> > degraded performance, and inode exhaustion in extreme cases.
>
> Well, be it "git merge-tree" or "git merge", new loose objects tend
> to accumulate until "gc" kicks in, so it is not a new problem for
> mere mortals, is it?

Yeah, I would definitely suspect that this is more of an issue for
forges than individual Git users.

> As one "interesting" use case of "merge-tree" is for a Git hosting
> site with bare repositories to offer trial merges, without which
> the majority of the objects their repositories acquire would have been in
> packs pushed by their users, "Gee, loose objects consume many inodes
> in exchange for easier selective pruning" becomes an issue, right?

Right.

> Just like it hurts performance to have too many loose object files,
> presumably it would also hurt performance to keep too many packs,
> each of which came from such a trial merge.  Do we have a "gc" story
> offered for these packs created by the new feature?  E.g., "once merge-tree
> is done creating a trial merge, we can discard the objects created
> in the pack, because we never expose new objects in the pack to the
> outside, processes running simultaneously, so instead of closing the
> new packfile by calling flush_bulk_checkin_packfile(), we can safely
> unlink the temporary pack.  We do not even need to spend cycles to
> run a gc that requires cycles to enumerate what is still reachable",
> or something like that?

I know Johannes worked on something like this recently. IIRC, it
effectively does something like:

    struct tmp_objdir *tmp_objdir = tmp_objdir_create(...);
    tmp_objdir_replace_primary_odb(tmp_objdir, 1);

at the beginning of a merge operation, and:

    tmp_objdir_discard_objects(tmp_objdir);

at the end. I haven't followed that work off-list very closely, but it
is only possible for GitHub to discard certain niche kinds of
merges/rebases, since in general we make the objects created during test
merges available via refs/pull/N/{merge,rebase}.

I think that like anything, this is a trade-off. Having lots of packs
can be a performance hindrance just like having lots of loose objects.
But since we can represent more objects with fewer inodes when packed,
storing those objects together in a pack is preferable when (a) you're
doing lots of test-merges, and (b) you want to keep those objects
around, e.g., because they are reachable.

Thanks,
Taylor


* Re: [PATCH v3 1/3] diff-merges: improve --diff-merges documentation
From: Junio C Hamano @ 2023-10-06 23:19 UTC (permalink / raw)
  To: Sergey Organov; +Cc: Elijah Newren, git
In-Reply-To: <875y3jr42h.fsf@osv.gnss.ru>

Sergey Organov <sorganov@gmail.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Elijah Newren <newren@gmail.com> writes:
>>
>>>> > +--cc::
>>>> > +     Produce dense combined diff output for merge commits.
>>>> > +     Shortcut for '--diff-merges=dense-combined -p'.
>>>>
>>>> Good.
>>>>
>>>> > +--remerge-diff::
>>>> > +     Produce diff against re-merge.
>>>> > +     Shortcut for '--diff-merges=remerge -p'.
>>>> ...
>>> Perhaps:
>>>
>>> Produce remerge-diff output for merge commits, in order to show how
>>> conflicts were resolved.
>>
>> I do not mind it, but then I'd prefer to see ", in order to show
>> how" also in the description of "--cc" and "-c" for consistency.
>>
>> A succinct way to say what they do may be hard to come by, but I
>> think of them showing places that did not have obvious natural
>> resolution.
>
> So, is it OK with both of you if I leave it as:
>
> "Produce remerge-diff output for merge commits."
>
> for now, and let you tweak the descriptions later on, if needed?

I do not know what Elijah would say, but in one of the iterations of
my draft response to him, I indeed suggested that "in order to" here
is not necessary if it is described for the "--diff-merges=remerge"
option, because those who know enough to skip referring to the other
entry are expected to know why it exists.  So I think I am OK with
that.

Thanks.


* Re: [OUTREACHY] Permission To Work On Tasks
From: Junio C Hamano @ 2023-10-06 23:28 UTC (permalink / raw)
  To: Naomi Ibe; +Cc: git
In-Reply-To: <CACS=G2zsJxP+NWuosZyrFGctJptHNYTrULErRo_Ns41KeMuMqA@mail.gmail.com>

Naomi Ibe <naomi.ibeh69@gmail.com> writes:

> First issue is here https://github.com/gitgitgadget/git/issues/635 ,
> involving changing the "die()" error msg outputs to all lowercase. I
> found a few files here https://github.com/git/git/tree/master/builtin
> where the "die()" error msg had some uppercase in them (add.c in lines
> 185, 203, 205, 211 and 571) (branch.c in lines 521, 525, 581, 597,
> 599, 627, 629, 643, 650, 652, 776, 926, 954 and 968). If I'm allowed
> to work on this issue, how many files should I edit? The last closed
> issues related to this issue had edited five files.

As the "general microproject information" page says, it is a good
idea to do just one quality-focused microproject per applicant.

If I were on the receiving end to review such a patch, I would
probably find it is too boring a burden if it had several unrelated
commands covered by a single patch, and would stop reading in the
middle.

If I were on the sending end to work on them for real (not as "dip
my toe in the water and say hello to the more experienced
developers" exercise), I would probably prepare a series of patches,
one for each git subcommand (e.g. "add", "branch", "log", etc.), and
for shared infrastructure files, one for each subsystem that they
are part of (this is harder to do for a new person who does not know
what subsystems exist, and which files implement which subsystem),
but for a microproject, I would say a single file under builtin/*
hierarchy would be a good size.

> Second issue is this https://github.com/gitgitgadget/git/issues/302 .
> Is it still available to be worked on? I notice it was opened in 2019

Stepping back a bit, do you agree with what the issue says?
Remember, these "issue"s are merely one person's opinion and not
endorsed by the community.

Before you ask "is it still available", do you know the current
status (not the status of the "issue")?  Have you looked at "git
commit --help" to find it out yourself to see if "now" is singled
out?  Here is what we say in our documentation:

    In addition to recognizing all date formats above, the --date
    option will also try to make sense of other, more human-centric
    date formats, such as relative dates like "yesterday" or "last
    Friday at noon".

So apparently it is still "available".  It is a different matter how
well a patch that adds "now" to the examples listed there will be
accepted, though.  During a microproject, one of the things new
contributors are expected to learn is to convince others of the cause
of their patches by writing the proposed commit log message well.

Finally, you do not need to obtain permission to work on anything
around here.  You work on what interests you, send the result (or
send a request for help, to which others may offer advice if the
problem you are solving looks interesting) to be reviewed, and will
be thanked for working on it when your patch is applied.  To avoid
duplicated work, you might want to say "I'm interested in doing
this, but is anybody already doing it?  If so I'll avoid stepping on
their toes", but otherwise, you are expected to go wild on your own
;-)

Have fun.


* [PATCH 1/1] add: Enable attr pathspec magic for git-add.
From: Joanna Wang @ 2023-10-07  0:28 UTC (permalink / raw)
  To: gitster; +Cc: git, jojwang
In-Reply-To: <xmqqttr3adgg.fsf@gitster.g>

This lets users limit added files or exclude files based on file
attributes. For example, the chromium project would like to use
this like "git add --all ':(exclude,attr:submodule)'", as submodules
are managed in a unique way that often results in submodule changes
that users do not want in their commits.

This does not change any attr magic implementation. It only adds
attr as an allowed pathspec magic in git-add, which was previously
blocked by GUARD_PATHSPEC() and a pathspec mask in parse_pathspec().

With this patch, attr is supported. It is possible that when the attr
pathspec feature was first added in
b0db704652 (pathspec: allow querying for attributes, 2017-03-13),
"PATHSPEC_ATTR" was just unintentionally left out of a few
GUARD_PATHSPEC() invocations. Later, to get a more user-friendly error
message when attr was used with git-add, PATHSPEC_ATTR was added as a
mask to git-add's invocation of parse_pathspec() in
84d938b732 (add: do not accept pathspec magic 'attr', 2018-09-18).

git-stash, which goes through the same GUARD_PATHSPEC(), currently
does not work with attr, so a PATHSPEC_ATTR mask has been added
to its parse_pathspec() and parse_pathspec_file() invocations.

Signed-off-by: Joanna Wang <jojwang@google.com>
---

> (perhaps except for the commit title on the Subject: header).
Added 'add:' and specified it is 'pathspec' magic, but let me know if I'm still missing something.

> This is rather a poor statement to make, as it hints that there are
> known breakages this change will reveal that you are not telling us,
> although I suspect that it is not the case.
Yes, sorry I did not mean to make this hint. I was trying to express that
if bugs did come up, we would not abandon this effort and expect someone
else to fix it. I have removed this line. I realized I also don't want
to say we will forever be on the hook for fixing every related bug that comes up.

> These are good things to add to the new test this patch adds.
Tests added.

>Hmph, is it a good idea in general to use ALL_MAGIC in guard?
My original reasoning was using ALL_MAGIC would prevent new magic pathspecs
from being unintentionally left out of commands and if a new magic does not
work with a command, whoever adds it should change ALL_MAGIC back to listing
them individually in the same patch. But yes, listing them individually seems
safer. Better to accidentally leave magic out that can be added with intention
later rather than make it easy to accidentally leave unsupported magic in.
I switched it back to listing individually.

>It is strongly preferable, instead of butchering this test that
>guards these two mechanisms from being broken, to find a command
>that still has some restriction on the magic it allows, and use it
>to make sure they still trigger and give "magic not supported" error
>message.
I added the test back with git-stash, which, as I discovered after more
testing, did not actually fully work with attr. More info below.

>Unless it matters that these files have recent timestamps, do not
>use "touch", merely to ensure presence of a file.  We often use a
>simple redirection.
Done

>A tangent (to the topic of testing, but relevant to the whole
>patch).  I notice 'stash' is mentioned on the topic, but I do not
>see changes to the codepath that is specific to 'stash', and changes
>to the tests do not demonstrate existing breakage.  The lack of code
>changes probably is because something shared, which is protected
>with magic guard lifted by the patch, is called from 'stash' as well
>as 'add', or something?
Yes, the previous patch enabled attr with git-stash, which was only blocked
at the shared dir.c GUARD_PATHSPEC level. I thought attr was working for git-stash,
but a few more manual tests showed that I was wrong.
So I added PATHSPEC_ATTR as a mask to stash's parse_pathspec and parse_pathspec_file.
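
For readers unfamiliar with the two layers mentioned above (the magic
mask given to parse_pathspec() and the GUARD_PATHSPEC() check), here is
a rough standalone sketch of the idea. It is not git's code: the SK_*
flag values and the sk_* function names are made up for illustration,
and git's own PATHSPEC_* constants and macros live in its source tree.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative flag bits only; hypothetical stand-ins for PATHSPEC_*. */
enum {
	SK_FROMTOP  = 1 << 0,
	SK_MAXDEPTH = 1 << 1,
	SK_LITERAL  = 1 << 2,
	SK_GLOB     = 1 << 3,
	SK_ICASE    = 1 << 4,
	SK_EXCLUDE  = 1 << 5,
	SK_ATTR     = 1 << 6,
	/* the set a guard accepted before attr was added to it */
	SK_ALLOWED_OLD = SK_FROMTOP | SK_MAXDEPTH | SK_LITERAL |
			 SK_GLOB | SK_ICASE | SK_EXCLUDE
};

/*
 * Layer 1, in the spirit of the mask passed to parse_pathspec(): any
 * magic bit present in the "unsupported" mask is rejected with a
 * user-facing message.  Returns 0 if accepted, -1 if rejected.
 */
static int sk_parse_check(unsigned magic, unsigned unsupported_mask)
{
	if (magic & unsupported_mask) {
		fprintf(stderr, "magic not supported (bits 0x%x)\n",
			magic & unsupported_mask);
		return -1;
	}
	return 0;
}

/*
 * Layer 2, in the spirit of GUARD_PATHSPEC(): the caller lists the bits
 * it can handle; anything outside that set is a programming error.
 */
static int sk_guard_ok(unsigned magic, unsigned allowed)
{
	return !(magic & ~allowed);
}

/* Aborts the program if the guard is violated. */
static void sk_guard(unsigned magic, unsigned allowed)
{
	assert(sk_guard_ok(magic, allowed));
}
```

In these terms, the patch drops the attr bit from the layer-1 mask for
git-add and adds it to the allowed set of the relevant guards, while
git-stash gains the layer-1 mask so that users get the friendlier
"magic not supported" error.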

 builtin/add.c                  |   7 ++-
 builtin/stash.c                |   4 +-
 dir.c                          |  13 ++--
 t/t6135-pathspec-with-attrs.sh | 109 +++++++++++++++++++++++++++++++--
 4 files changed, 117 insertions(+), 16 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index c27254a5cd..2de83964a3 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -424,7 +424,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	 * Check the "pathspec '%s' did not match any files" block
 	 * below before enabling new magic.
 	 */
-	parse_pathspec(&pathspec, PATHSPEC_ATTR,
+	parse_pathspec(&pathspec, 0,
 		       PATHSPEC_PREFER_FULL |
 		       PATHSPEC_SYMLINK_LEADING_PATH,
 		       prefix, argv);
@@ -433,7 +433,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 		if (pathspec.nr)
 			die(_("'%s' and pathspec arguments cannot be used together"), "--pathspec-from-file");
 
-		parse_pathspec_file(&pathspec, PATHSPEC_ATTR,
+		parse_pathspec_file(&pathspec, 0,
 				    PATHSPEC_PREFER_FULL |
 				    PATHSPEC_SYMLINK_LEADING_PATH,
 				    prefix, pathspec_from_file, pathspec_file_nul);
@@ -504,7 +504,8 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 			       PATHSPEC_LITERAL |
 			       PATHSPEC_GLOB |
 			       PATHSPEC_ICASE |
-			       PATHSPEC_EXCLUDE);
+			       PATHSPEC_EXCLUDE |
+			       PATHSPEC_ATTR);
 
 		for (i = 0; i < pathspec.nr; i++) {
 			const char *path = pathspec.items[i].match;
diff --git a/builtin/stash.c b/builtin/stash.c
index 1ad496985a..9c77d3e4e4 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1760,7 +1760,7 @@ static int push_stash(int argc, const char **argv, const char *prefix,
 		}
 	}
 
-	parse_pathspec(&ps, 0, PATHSPEC_PREFER_FULL | PATHSPEC_PREFIX_ORIGIN,
+	parse_pathspec(&ps, PATHSPEC_ATTR, PATHSPEC_PREFER_FULL | PATHSPEC_PREFIX_ORIGIN,
 		       prefix, argv);
 
 	if (pathspec_from_file) {
@@ -1773,7 +1773,7 @@ static int push_stash(int argc, const char **argv, const char *prefix,
 		if (ps.nr)
 			die(_("'%s' and pathspec arguments cannot be used together"), "--pathspec-from-file");
 
-		parse_pathspec_file(&ps, 0,
+		parse_pathspec_file(&ps, PATHSPEC_ATTR,
 				    PATHSPEC_PREFER_FULL | PATHSPEC_PREFIX_ORIGIN,
 				    prefix, pathspec_from_file, pathspec_file_nul);
 	} else if (pathspec_file_nul) {
diff --git a/dir.c b/dir.c
index 8486e4d56f..9bf9b53ca5 100644
--- a/dir.c
+++ b/dir.c
@@ -2174,12 +2174,13 @@ static int exclude_matches_pathspec(const char *path, int pathlen,
 		return 0;
 
 	GUARD_PATHSPEC(pathspec,
-		       PATHSPEC_FROMTOP |
-		       PATHSPEC_MAXDEPTH |
-		       PATHSPEC_LITERAL |
-		       PATHSPEC_GLOB |
-		       PATHSPEC_ICASE |
-		       PATHSPEC_EXCLUDE);
+                       PATHSPEC_FROMTOP |
+                       PATHSPEC_MAXDEPTH |
+                       PATHSPEC_LITERAL |
+                       PATHSPEC_GLOB |
+                       PATHSPEC_ICASE |
+                       PATHSPEC_EXCLUDE |
+                       PATHSPEC_ATTR);
 
 	for (i = 0; i < pathspec->nr; i++) {
 		const struct pathspec_item *item = &pathspec->items[i];
diff --git a/t/t6135-pathspec-with-attrs.sh b/t/t6135-pathspec-with-attrs.sh
index f70c395e75..531b4f4d5e 100755
--- a/t/t6135-pathspec-with-attrs.sh
+++ b/t/t6135-pathspec-with-attrs.sh
@@ -64,12 +64,24 @@ test_expect_success 'setup .gitattributes' '
 	fileSetLabel label
 	fileValue label=foo
 	fileWrongLabel label☺
+	newFileA* labelA
+	newFileB* labelB
 	EOF
 	echo fileSetLabel label1 >sub/.gitattributes &&
 	git add .gitattributes sub/.gitattributes &&
 	git commit -m "add attributes"
 '
 
+test_expect_success 'setup .gitignore' '
+	cat <<-\EOF >.gitignore &&
+	actual
+	expect
+	pathspec_file
+	EOF
+	git add .gitignore &&
+	git commit -m "add gitignore"
+'
+
 test_expect_success 'check specific set attr' '
 	cat <<-\EOF >expect &&
 	fileSetLabel
@@ -150,6 +162,7 @@ test_expect_success 'check specific value attr (2)' '
 test_expect_success 'check unspecified attr' '
 	cat <<-\EOF >expect &&
 	.gitattributes
+	.gitignore
 	fileA
 	fileAB
 	fileAC
@@ -175,6 +188,7 @@ test_expect_success 'check unspecified attr' '
 test_expect_success 'check unspecified attr (2)' '
 	cat <<-\EOF >expect &&
 	HEAD:.gitattributes
+	HEAD:.gitignore
 	HEAD:fileA
 	HEAD:fileAB
 	HEAD:fileAC
@@ -200,6 +214,7 @@ test_expect_success 'check unspecified attr (2)' '
 test_expect_success 'check multiple unspecified attr' '
 	cat <<-\EOF >expect &&
 	.gitattributes
+	.gitignore
 	fileC
 	fileNoLabel
 	fileWrongLabel
@@ -239,16 +254,100 @@ test_expect_success 'fail on multiple attr specifiers in one pathspec item' '
 	test_i18ngrep "Only one" actual
 '
 
-test_expect_success 'fail if attr magic is used places not implemented' '
+test_expect_success 'fail if attr magic is used in places not implemented' '
 	# The main purpose of this test is to check that we actually fail
 	# when you attempt to use attr magic in commands that do not implement
-	# attr magic. This test does not advocate git-add to stay that way,
-	# though, but git-add is convenient as it has its own internal pathspec
-	# parsing.
-	test_must_fail git add ":(attr:labelB)" 2>actual &&
+	# attr magic. This test does not advocate stash push to stay that way.
+	# When you teach the command to grok the pathspec, you need to find
+	# another command to replace it for the test.
+	test_must_fail git stash push ":(attr:labelB)" 2>actual &&
+	test_i18ngrep "magic not supported" actual
+'
+
+test_expect_success 'fail if attr magic is used in --pathspec-from-file when not implemented' '
+	# This is like the test above but for attr magic passed in via --pathspec-from-file.
+	cat <<-\EOF >pathspec_file &&
+	:(attr:labelB)
+	EOF
+	test_must_fail git stash push --pathspec-from-file=pathspec_file 2>actual &&
 	test_i18ngrep "magic not supported" actual
 '
 
+test_expect_success 'check that attr magic works for git add --all' '
+	cat <<-\EOF >expect &&
+	sub/newFileA-foo
+	EOF
+	>sub/newFileA-foo &&
+	>sub/newFileB-foo &&
+	git add --all ":(exclude,attr:labelB)" &&
+	git diff --name-only --cached >actual &&
+	git restore -W -S . &&
+	test_cmp expect actual
+'
+
+test_expect_success 'check that attr magic works for git add -u' '
+	cat <<-\EOF >expect &&
+	sub/fileA
+	EOF
+	>sub/newFileA-foo &&
+	>sub/newFileB-foo &&
+	>sub/fileA &&
+	>sub/fileB &&
+	git add -u ":(exclude,attr:labelB)" &&
+	git diff --name-only --cached  >actual &&
+	git restore -S -W . && rm sub/new* &&
+	test_cmp expect actual
+'
+
+test_expect_success 'check that attr magic works for git add <path>' '
+	cat <<-\EOF >expect &&
+	fileA
+	fileB
+	sub/fileA
+	EOF
+	>fileA &&
+	>fileB &&
+	>sub/fileA &&
+	>sub/fileB &&
+	git add ":(exclude,attr:labelB)sub/*" &&
+	git diff --name-only --cached >actual &&
+	git restore -S -W . &&
+	test_cmp expect actual
+'
+
+test_expect_success 'check that attr magic works for git add .' '
+	cat <<-\EOF >expect &&
+	sub/fileA
+	EOF
+	>fileA &&
+	>fileB &&
+	>sub/fileA &&
+	>sub/fileB &&
+	cd sub &&
+	git add . ":(exclude,attr:labelB)" &&
+	cd .. &&
+	git diff --name-only --cached >actual &&
+	git restore -S -W . &&
+	test_cmp expect actual
+'
+
+test_expect_success 'check that attr magic works for git add --pathspec-from-file' '
+	cat <<-\EOF >pathspec_file &&
+	:(exclude,attr:labelB)
+	EOF
+	cat <<-\EOF >expect &&
+	sub/newFileA-foo
+	EOF
+	>sub/newFileA-foo &&
+	>sub/newFileB-foo &&
+	git add --all --pathspec-from-file=pathspec_file &&
+	git diff --name-only --cached >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'check that attr magic works for git' '
+'
+
 test_expect_success 'abort on giving invalid label on the command line' '
 	test_must_fail git ls-files . ":(attr:☺)"
 '
-- 
2.42.0.609.gbb76f46606-goog



* Re: [PATCH 1/1] add: Enable attr pathspec magic for git-add.
From: Joanna Wang @ 2023-10-07  0:50 UTC (permalink / raw)
  To: jojwang; +Cc: git, gitster
In-Reply-To: <20231007002811.2337315-1-jojwang@google.com>

>+test_expect_success 'check that attr magic works for git' '
>+'
Sorry, I forgot to take this out. I can fix it in the next patchset after
the next round of review.




* Re: Microsoft Smart App Control - Git - git-bash.exe File Unsigned
From: brian m. carlson @ 2023-10-07  1:07 UTC (permalink / raw)
  To: Rolland Swing (Insight Global LLC); +Cc: git@vger.kernel.org, Anthony Chuang
In-Reply-To: <SJ1PR21MB369933C2C879EAD0D5EAFBD1E3CAA@SJ1PR21MB3699.namprd21.prod.outlook.com>


On 2023-10-05 at 20:41:39, Rolland Swing (Insight Global LLC) wrote:
> Hi Git Team,

Hey,

> We're part of the Microsoft team that owns Smart App Control (https://learn.microsoft.com/en-us/windows/apps/develop/smart-app-control/overview), which requires applications to sign all of their executable files (exe, dll, msi, tmp, and a few other file formats).
>  
> We found during internal testing and/or from user feedback that your app, git-bash.exe, is not correctly signed. 
> 
> Block Event:   FileName: \Device\HarddiskVolume7\Program Files\Git\git-bash.exe
>   Calling Process: \Device\HarddiskVolume7\Windows\explorer.exe
>   Sha256 Hash: 42F2E685686FB6356A195709AF912C7B9D424466BD7C6D69258AADA5E80AC3C2 

The Git project doesn't distribute any binaries at all.  We distribute
only source code.  Many distributors compile these to produce binaries.

The project you are probably thinking of is Git for Windows, which,
while related, is a separate project.  They do indeed distribute
binaries, and this looks like a binary that's theirs.  If you'd like to
contact them, you can use their issue tracker
(https://github.com/git-for-windows/git/issues) to inquire.

However, I will note that a cursory search there found
https://github.com/git-for-windows/git/issues/798, where the maintainer
points out that there are over 400 exe files and 250 dll files, which
would make signing them all excessively burdensome.  I expect the
upcoming requirements for HSM-backed keys for Windows code signing may
make that even slower and more burdensome.  That being said, perhaps
with automation, the maintainer may feel differently than they did in
2016, so it might be worth asking again.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA



* Re: [PATCH v3 1/3] diff-merges: improve --diff-merges documentation
From: Elijah Newren @ 2023-10-07  1:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sergey Organov, git
In-Reply-To: <xmqq7cnzaav0.fsf@gitster.g>

On Fri, Oct 6, 2023 at 11:01 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> >> > +--cc::
> >> > +     Produce dense combined diff output for merge commits.
> >> > +     Shortcut for '--diff-merges=dense-combined -p'.
> >>
> >> Good.
> >>
> >> > +--remerge-diff::
> >> > +     Produce diff against re-merge.
> >> > +     Shortcut for '--diff-merges=remerge -p'.
> >> ...
> > Perhaps:
> >
> > Produce remerge-diff output for merge commits, in order to show how
> > conflicts were resolved.
>
> I do not mind it, but then I'd prefer to see ", in order to show
> how" also in the description of "--cc" and "-c" for consistency.

The problem is it's really hard for me to come up with an answer to
that, in part because...

> A succinct way to say what they do may be hard to come by, but I
> think of them showing places that did not have obvious natural
> resolution.

In my opinion, --remerge-diff does this better; wouldn't we want a
rationale where these particular modes shine?  Is that a non-empty
set?  (It may well be, but to me, --cc was never worse than -c while
often being better, and likewise, --remerge-diff is never worse than
--cc while often being better, at least on anything I had thought to
use any of these for.  Maybe there are other usecases for -c and --cc
I'm just not thinking of?)


* Re: [PATCH v3 1/3] diff-merges: improve --diff-merges documentation
From: Junio C Hamano @ 2023-10-07  1:50 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Sergey Organov, git
In-Reply-To: <CABPp-BGxVnhnmoajWyqY_gMvQ42W5S6VX5EOXq3PW=GLVQwe0g@mail.gmail.com>

Elijah Newren <newren@gmail.com> writes:

> In my opinion, --remerge-diff does this better; wouldn't we want a
> rationale where these particular modes shine?  Is that a non-empty
> set?  (It may well be, but to me, --cc was never worse than -c while
> often being better, and likewise, --remerge-diff is never worse than
> --cc while often being better, at least on anything I had thought to
> use any of these for.  Maybe there are other usecases for -c and --cc
> I'm just not thinking of?)

Between -c and --cc, I do not think there is anything that makes us
favor -c over --cc.  While the algorithm to decide which hunks out
of -c's output to omit was being polished, comparison with -c served
as a good way to give a baseline, but once --cc became solid, I do
not think I've used -c myself.

I personally find that a very trivial merge resolution is far easier
to read with --cc than --remerge-diff, the latter being way too
verbose.

Also, --cc and -c should work inside a repository to which you only
have read access.  If remerge needs to write some
objects to the repository, then you'd need some hack to give a
writable object store overlay via the alternate odb mechanism, or
something, right?


$ git show --oneline --cc -U1 9fde277c338
9fde277c33 Merge branch 'cc/git-replay' into seen

diff --cc Makefile
index cf60c16deb,05a504dc28..c581c1ddba
--- a/Makefile
+++ b/Makefile
@@@ -803,4 -801,2 +803,3 @@@ TEST_BUILTINS_OBJS += test-env-helper.
  TEST_BUILTINS_OBJS += test-example-decorate.o
- TEST_BUILTINS_OBJS += test-fast-rebase.o
 +TEST_BUILTINS_OBJS += test-find-pack.o
  TEST_BUILTINS_OBJS += test-fsmonitor-client.o
diff --cc t/helper/test-tool.c
index 9010ac6de7,9ca1586de7..77b1d7c15d
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@@ -32,4 -32,2 +32,3 @@@ static struct test_cmd cmds[] = 
  	{ "example-decorate", cmd__example_decorate },
- 	{ "fast-rebase", cmd__fast_rebase },
 +	{ "find-pack", cmd__find_pack },
  	{ "fsmonitor-client", cmd__fsmonitor_client },
diff --cc t/helper/test-tool.h
index f134f96b97,a03bbfc6b2..5deeca66fe
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@@ -26,4 -26,2 +26,3 @@@ int cmd__env_helper(int argc, const cha
  int cmd__example_decorate(int argc, const char **argv);
- int cmd__fast_rebase(int argc, const char **argv);
 +int cmd__find_pack(int argc, const char **argv);
  int cmd__fsmonitor_client(int argc, const char **argv);
$ git show --oneline --remerge-diff -U1 9fde277c338
9fde277c33 Merge branch 'cc/git-replay' into seen
diff --git a/Makefile b/Makefile
remerge CONFLICT (content): Merge conflict in Makefile
index 987c8e3569..c581c1ddba 100644
--- a/Makefile
+++ b/Makefile
@@ -803,9 +803,3 @@ TEST_BUILTINS_OBJS += test-env-helper.o
 TEST_BUILTINS_OBJS += test-example-decorate.o
-<<<<<<< 0fd7a144c5 (Merge branch 'js/doc-unit-tests-with-cmake' into seen)
-TEST_BUILTINS_OBJS += test-fast-rebase.o
 TEST_BUILTINS_OBJS += test-find-pack.o
-||||||| 1fc548b2d6
-TEST_BUILTINS_OBJS += test-fast-rebase.o
-=======
->>>>>>> 0b853ad4db (replay: stop assuming replayed branches do not diverge)
 TEST_BUILTINS_OBJS += test-fsmonitor-client.o
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
remerge CONFLICT (content): Merge conflict in t/helper/test-tool.c
index 87a9794564..77b1d7c15d 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -32,9 +32,3 @@ static struct test_cmd cmds[] = {
 	{ "example-decorate", cmd__example_decorate },
-<<<<<<< 0fd7a144c5 (Merge branch 'js/doc-unit-tests-with-cmake' into seen)
-	{ "fast-rebase", cmd__fast_rebase },
 	{ "find-pack", cmd__find_pack },
-||||||| 1fc548b2d6
-	{ "fast-rebase", cmd__fast_rebase },
-=======
->>>>>>> 0b853ad4db (replay: stop assuming replayed branches do not diverge)
 	{ "fsmonitor-client", cmd__fsmonitor_client },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
remerge CONFLICT (content): Merge conflict in t/helper/test-tool.h
index e8abf4c42f..5deeca66fe 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -26,9 +26,3 @@ int cmd__env_helper(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
-<<<<<<< 0fd7a144c5 (Merge branch 'js/doc-unit-tests-with-cmake' into seen)
-int cmd__fast_rebase(int argc, const char **argv);
 int cmd__find_pack(int argc, const char **argv);
-||||||| 1fc548b2d6
-int cmd__fast_rebase(int argc, const char **argv);
-=======
->>>>>>> 0b853ad4db (replay: stop assuming replayed branches do not diverge)
 int cmd__fsmonitor_client(int argc, const char **argv);



* Re: [PATCH 6/7] bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
From: Eric Biederman @ 2023-10-07  3:07 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: Elijah Newren, Jeff King, Junio C Hamano
In-Reply-To: <cb0f79cabb7921ab7e334ad8a467ae84853bbd39.1696629697.git.me@ttaylorr.com>



On October 6, 2023 5:02:04 PM CDT, Taylor Blau <me@ttaylorr.com> wrote:
>
>Within `deflate_tree_to_pack_incore()`, the changes should be limited
>to something like:
>
>    if (the_repository->compat_hash_algo) {
>      struct strbuf converted = STRBUF_INIT;
>      if (convert_object_file(&compat_obj,
>                              the_repository->hash_algo,
>                              the_repository->compat_hash_algo, ...) < 0)
>        die(...);
>
>      format_object_header_hash(the_repository->compat_hash_algo,
>                                OBJ_TREE, size);
>
>      strbuf_release(&converted);
>    }
>
>, assuming related changes throughout the rest of the bulk-checkin
>machinery necessary to update the hash of the converted object, which
>are likewise minimal in size.

So this is close.  Just in case someone wants to go down this path,
I want to point out that the converted object needs to have the
compat hash computed over it.

Which means that the strbuf_release in your example comes a bit early.


Eric


* Re: [PATCH v3 1/3] diff-merges: improve --diff-merges documentation
From: Junio C Hamano @ 2023-10-07  6:49 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Sergey Organov, git
In-Reply-To: <xmqqjzrz5hgn.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> Elijah Newren <newren@gmail.com> writes:
>
>> In my opinion, --remerge-diff does this better; wouldn't we want a
>> ...
> I personally find that a very trivial merge resolution is far easier
> to read with --cc than --remerge-diff, the latter being way too
> verbose.
>
> Also, --cc and -c should work inside a repository to which you only
> have read access.  If remerge needs to write some
> objects to the repository, then you'd need some hack to give a
> writable object store overlay via the alternate odb mechanism, or
> something, right?

Well, the above did not come out as well as I intended, as I forgot
to prefix it with something I thought was obvious from what I said
in the recent discussion in the earlier iteration of this topic,
where I said that it would be "--remerge-diff", if I were to pick an
option that is so useful that it deserves a short and sweet single
letter.  Naturally, it came after we gained experience with "--cc",
so it would be surprising if it did worse.  Just like it is natural
to expect that "--cc" would give more useful output than "-m -p"
that predates everybody else.

In short, I would say "--remerge-diff" would give output that is the
easiest to grok among the three modern variants to show the changes
a merge introduces.

The above two cases, where I said cc does better than remerge-diff,
were meant as _exceptions_ for that general sentiment.


* Re: [OUTREACHY] Permission To Work On Tasks
From: Christian Couder @ 2023-10-07  7:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Naomi Ibe, git
In-Reply-To: <xmqqr0m75o0b.fsf@gitster.g>

On Sat, Oct 7, 2023 at 1:29 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Naomi Ibe <naomi.ibeh69@gmail.com> writes:
>
> > First issue is here https://github.com/gitgitgadget/git/issues/635 ,
> > involving changing the "die()" error msg outputs to all lowercase. I
> > found a few files here https://github.com/git/git/tree/master/builtin
> > where the "die()" error msg had some uppercase in them (add.c in lines
> > 185, 203, 205, 211 and 571) (branch.c in lines 521, 525, 581, 597,
> > 599, 627, 629, 643, 650, 652, 776, 926, 954 and 968). If I'm allowed
> > to work on this issue, how many files should I edit? The last closed
> > issues related to this issue had edited five files.
>
> As the "general microproject information" page says, it is a good
> idea to do just one quality-focused microproject per applicant.
>
> If I were on the receiving end to review such a patch, I would
> probably find it is too boring a burden if it had several unrelated
> commands covered by a single patch, and would stop reading in the
> middle.
>
> If I were on the sending end to work on them for real (not as "dip
> my toe in the water and say hello to the more experienced
> developers" exercise), I would probably prepare a series of patches,
> one for each git subcommand (e.g. "add", "branch", "log", etc.), and
> for shared infrastructure files, one for each subsystem that they
> are part of (this is harder to do for a new person who does not know
> what subsystems exist, and which files implement which subsystem),
> but for a microproject, I would say a single file under builtin/*
> hierarchy would be a good size.

Yeah, I agree. In my opinion, a single patch focused on a single file
like builtin/add.c or builtin/branch.c is the best.

> > Second issue is this https://github.com/gitgitgadget/git/issues/302 .
> > Is it still available to be worked on? I notice it was opened in 2019
>
> Stepping back a bit, do you agree with what the issue says?
> Remember, these "issue"s are merely one person's opinion and not
> endorsed by the community.
>
> Before you ask "is it still available", do you know the current
> status (not the status of the "issue")?  Have you looked at "git
> commit --help" to find it out yourself to see if "now" is singled
> out?  Here is what we say in our documentation:
>
>     In addition to recognizing all date formats above, the --date
>     option will also try to make sense of other, more human-centric
>     date formats, such as relative dates like "yesterday" or "last
>     Friday at noon".
>
> So apparently it is still "available".  It is a different matter how
> well a patch that adds "now" to the examples listed there will be
> accepted, though.  During a microproject, one of the things new
> contributors are expected to learn is to argue the cause of their
> patches well in the proposed commit log message.

Yeah, I think this issue, if it is indeed an issue, is not something
easy for a newcomer to "fix", as it requires being familiar with our
documentation and perhaps our code too, or researching them enough to
understand what a good improvement would be. So you could perhaps do
it, but it would likely require more work.

> Finally, you do not need to obtain permission to work on anything
> around here.  You work on what interests you, send the result (or
> send a request for help, to which others may offer advice if the
> problem you are solving looks interesting) to be reviewed, and will
> be thanked for working on it when your patch is applied.  To avoid
> duplicated work, you might want to say "I'm interested in doing
> this, but is anybody already doing it?  If so I'll avoid stepping on
> their toes", but otherwise, you are expected to go wild on your own
> ;-)

I think it's a good idea to ask if the task would be too difficult,
too time consuming, too big or otherwise not appropriate for a
microproject. But true, we don't want applicants to ask for some kind
of permission. We prefer if they just ask for advice.


* What's cooking in git.git (Oct 2023, #03; Fri, 6)
From: Junio C Hamano @ 2023-10-07  8:20 UTC (permalink / raw)
  To: git

Here are the topics that have been cooking in my tree.  Commits
prefixed with '+' are in 'next' (being in 'next' is a sign that a
topic is stable enough to be used and is a candidate to be in a
future release).  Commits prefixed with '-' are only in 'seen', and
aren't considered "accepted" at all and may be annotated with a URL
to a message that raises issues, but such annotations are by no means
exhaustive.  A topic without enough support may be discarded after a
long period of no activity (of course it can be resubmitted when new
interest arises).

Copies of the source code to Git live in many repositories, and the
following is a list of the ones I push into or their mirrors.  Some
repositories have only a subset of branches.

With maint, master, next, seen, todo:

	git://git.kernel.org/pub/scm/git/git.git/
	git://repo.or.cz/alt-git.git/
	https://kernel.googlesource.com/pub/scm/git/git/
	https://github.com/git/git/
	https://gitlab.com/git-vcs/git/

With all the integration branches and topics broken out:

	https://github.com/gitster/git/

Even though the preformatted documentation in HTML and man format
are not sources, they are published in these repositories for
convenience (replace "htmldocs" with "manpages" for the manual
pages):

	git://git.kernel.org/pub/scm/git/git-htmldocs.git/
	https://github.com/gitster/git-htmldocs.git/

Release tarballs are available at:

	https://www.kernel.org/pub/software/scm/git/

--------------------------------------------------
[New Topics]

* jc/merge-ort-attr-index-fix (2023-10-05) 1 commit
 - merge-ort: initialize repo in index state

 Fix "git merge-tree" to stop segfaulting when the --attr-source
 option is used.

 Waiting for review response.
 source: <pull.1583.git.git.1696519349407.gitgitgadget@gmail.com>


* jk/decoration-and-other-leak-fixes (2023-10-05) 3 commits
  (merged to 'next' on 2023-10-06 at 5fc05c94dc)
 + daemon: free listen_addr before returning
 + revision: clear decoration structs during release_revisions()
 + decorate: add clear_decoration() function

 Leakfix.

 Will merge to 'master'.
 source: <20231005212802.GA982892@coredump.intra.peff.net>


* sn/typo-grammo-phraso-fixes (2023-10-05) 5 commits
 - t/README: fix multi-prerequisite example
 - doc/gitk: s/sticked/stuck/
 - git-jump: admit to passing merge mode args to ls-files
 - doc/diff-options: improve wording of the log.diffMerges mention
 - doc: fix some typos, grammar and wording issues

 Many typos, ungrammatical sentences and wrong phrasing have been
 fixed.

 Needs review.
 source: <20231003082107.3002173-1-stepnem@smrk.net>


* so/diff-merges-dd (2023-10-05) 3 commits
 - completion: complete '--dd'
 - diff-merges: introduce '--dd' option
 - diff-merges: improve --diff-merges documentation

 "git log" and friends learned "--dd" that is a short-hand for
 "--diff-merges=first-parent -p".

 Expecting a reroll.
 cf. <871qe7r3rk.fsf@osv.gnss.ru>
 source: <20231004214558.210339-1-sorganov@gmail.com>


* vd/loose-ref-iteration-optimization (2023-10-06) 4 commits
 - files-backend.c: avoid stat in 'loose_fill_ref_dir'
 - dir.[ch]: add 'follow_symlink' arg to 'get_dtype'
 - dir.[ch]: expose 'get_dtype'
 - ref-cache.c: fix prefix matching in ref iteration

 The code to iterate over loose references has been optimized to
 reduce the number of lstat() system calls.
 source: <pull.1594.git.1696615769.gitgitgadget@gmail.com>


* jc/update-list-references-to-lore (2023-10-06) 1 commit
 - doc: update list archive reference to use lore.kernel.org

 source: <xmqq7cnz741s.fsf@gitster.g>

--------------------------------------------------
[Stalled]

* cc/git-replay (2023-09-07) 15 commits
 - replay: stop assuming replayed branches do not diverge
 - replay: add --contained to rebase contained branches
 - replay: add --advance or 'cherry-pick' mode
 - replay: disallow revision specific options and pathspecs
 - replay: use standard revision ranges
 - replay: make it a minimal server side command
 - replay: remove HEAD related sanity check
 - replay: remove progress and info output
 - replay: add an important FIXME comment about gpg signing
 - replay: don't simplify history
 - replay: introduce pick_regular_commit()
 - replay: die() instead of failing assert()
 - replay: start using parse_options API
 - replay: introduce new builtin
 - t6429: remove switching aspects of fast-rebase

 Waiting for review response.
 cf. <52277471-4ddd-b2e0-62ca-c2a5b59ae418@gmx.de>
 cf. <58daa706-7efb-51dd-9061-202ef650b96a@gmx.de>
 cf. <f0e75d47-c277-9fbb-7bcd-53e4e5686f3c@gmx.de>
 source: <20230907092521.733746-1-christian.couder@gmail.com>


* pw/rebase-sigint (2023-09-07) 1 commit
 - rebase -i: ignore signals when forking subprocesses

 If the commit log editor or other external programs (spawned via
 the "exec" insn in the todo list) received an interactive signal
 during "git rebase -i", it killed not just the spawned program but
 also the "git" process that spawned it, which is often not what the
 end user intended.  "git" learned to ignore SIGINT and SIGQUIT while
 waiting for these subprocesses.

 Expecting a reroll.
 cf. <12c956ea-330d-4441-937f-7885ab519e26@gmail.com>
 source: <pull.1581.git.1694080982621.gitgitgadget@gmail.com>


* tk/cherry-pick-sequence-requires-clean-worktree (2023-06-01) 1 commit
 - cherry-pick: refuse cherry-pick sequence if index is dirty

 "git cherry-pick A" that replays a single commit stopped before
 clobbering local modification, but "git cherry-pick A..B" did not,
 which has been corrected.

 Expecting a reroll.
 cf. <999f12b2-38d6-f446-e763-4985116ad37d@gmail.com>
 source: <pull.1535.v2.git.1685264889088.gitgitgadget@gmail.com>


* jc/diff-cached-fsmonitor-fix (2023-09-15) 3 commits
 - diff-lib: fix check_removed() when fsmonitor is active
 - Merge branch 'jc/fake-lstat' into jc/diff-cached-fsmonitor-fix
 - Merge branch 'js/diff-cached-fsmonitor-fix' into jc/diff-cached-fsmonitor-fix
 (this branch uses jc/fake-lstat.)

 The optimization based on fsmonitor in the "diff --cached"
 codepath is resurrected with the "fake-lstat" introduced earlier.

 It is unknown if the optimization is worth resurrecting, but in case...
 source: <xmqqr0n0h0tw.fsf@gitster.g>

--------------------------------------------------
[Cooking]

* jk/commit-graph-leak-fixes (2023-10-03) 10 commits
  (merged to 'next' on 2023-10-06 at 5d202ef8b9)
 + commit-graph: clear oidset after finishing write
 + commit-graph: free write-context base_graph_name during cleanup
 + commit-graph: free write-context entries before overwriting
 + commit-graph: free graph struct that was not added to chain
 + commit-graph: delay base_graph assignment in add_graph_to_chain()
 + commit-graph: free all elements of graph chain
 + commit-graph: move slab-clearing to close_commit_graph()
 + merge: free result of repo_get_merge_bases()
 + commit-reach: free temporary list in get_octopus_merge_bases()
 + t6700: mark test as leak-free

 Leakfix.

 Will merge to 'master'.
 source: <20231003202504.GA7697@coredump.intra.peff.net>


* jm/git-status-submodule-states-docfix (2023-10-04) 1 commit
  (merged to 'next' on 2023-10-04 at 520b7711a4)
 + git-status.txt: fix minor asciidoc format issue

 Docfix.

 Will merge to 'master'.
 source: <pull.1591.v3.git.1696386165616.gitgitgadget@gmail.com>


* rs/parse-opt-ctx-cleanup (2023-10-03) 1 commit
  (merged to 'next' on 2023-10-04 at d5d0a2ce3b)
 + parse-options: drop unused parse_opt_ctx_t member

 Code clean-up.

 Will merge to 'master'.
 source: <ebcaa9e1-d306-4c93-adec-3f35d7040531@web.de>


* tb/repack-max-cruft-size (2023-10-05) 4 commits
  (merged to 'next' on 2023-10-06 at b3ca6df3b9)
 + builtin/repack.c: avoid making cruft packs preferred
 + builtin/repack.c: implement support for `--max-cruft-size`
 + builtin/repack.c: parse `--max-pack-size` with OPT_MAGNITUDE
 + t7700: split cruft-related tests to t7704

 "git repack" learned "--max-cruft-size" to prevent cruft packs from
 growing without bounds.

 Will merge to 'master'.
 source: <cover.1696293862.git.me@ttaylorr.com>
 source: <035393935108d02aaf8927189b05102f4f74f340.1696370003.git.me@ttaylorr.com>


* ak/color-decorate-symbols (2023-10-03) 1 commit
 - decorate: add color.decorate.symbols config option

 A new config for coloring.

 Needs review.
 source: <20231003205442.22963-1-andy.koppe@gmail.com>


* jc/attr-tree-config (2023-10-04) 2 commits
 - attr: add attr.allowInvalidSource config to allow invalid revision
 - attr: add attr.tree for setting the treeish to read attributes from

 The attribute subsystem learned to honor `attr.tree` configuration
 that specifies which tree to read the .gitattributes files from.

 Needs review.
 source: <pull.1577.v2.git.git.1696443502.gitgitgadget@gmail.com>


* js/submodule-fix-misuse-of-path-and-name (2023-10-03) 6 commits
  (merged to 'next' on 2023-10-06 at 1054b6e752)
 + t7420: test that we correctly handle renamed submodules
 + t7419: test that we correctly handle renamed submodules
 + t7419, t7420: use test_cmp_config instead of grepping .gitmodules
 + t7419: actually test the branch switching
 + submodule--helper: return error from set-url when modifying failed
 + submodule--helper: use submodule_from_path in set-{url,branch}

 In .gitmodules files, submodules are keyed by their names, and the
 path to the submodule whose name is $name is specified by the
 submodule.$name.path variable.  There were a few codepaths that
 mixed the name and path up when consulting the submodule database,
 which have been corrected.  It took long for these bugs to be found
 as the name of a submodule initially is the same as its path, and
 the problem does not surface until it is moved to a different path,
 which apparently happens very rarely.

 Will merge to 'master'.
 source: <0a0a157f88321d25fdb0be771a454b3410a449f3.camel@archlinux.org>


* cw/prelim-cleanup (2023-09-29) 4 commits
  (merged to 'next' on 2023-10-03 at 5985929612)
 + parse: separate out parsing functions from config.h
 + config: correct bad boolean env value error message
 + wrapper: reduce scope of remove_or_warn()
 + hex-ll: separate out non-hash-algo functions

 Shuffle some bits across headers and sources to prepare for
 libification effort.

 Will merge to 'master'.
 source: <cover.1696021277.git.jonathantanmy@google.com>


* ds/init-diffstat-width (2023-09-29) 1 commit
  (merged to 'next' on 2023-10-03 at 18383ac895)
 + diff --stat: set the width defaults in a helper function

 Code clean-up.

 Will merge to 'master'.
 source: <d45d1dac1a20699e370905b88b6fd0ec296751e7.1695441501.git.dsimic@manjaro.org>


* ar/diff-index-merge-base-fix (2023-10-02) 1 commit
  (merged to 'next' on 2023-10-06 at 0ff4dfc0e1)
 + diff: fix --merge-base with annotated tags

 "git diff --merge-base X other args..." insisted that X must be a
 commit and errored out when given an annotated tag that peels to a
 commit, but we only need it to be a committish.  This has been
 corrected.

 Will merge to 'master'.
 source: <20231001151845.3621551-1-hi@alyssa.is>


* eb/limit-bulk-checkin-to-blobs (2023-09-26) 1 commit
  (merged to 'next' on 2023-10-02 at 89c9c95966)
 + bulk-checkin: only support blobs in index_bulk_checkin

 The "streaming" interface used for bulk-checkin codepath has been
 narrowed to take only blob objects for now, with no real loss of
 functionality.

 Will merge to 'master'.
 source: <87msx99b9o.fsf_-_@gmail.froward.int.ebiederm.org>


* js/update-urls-in-doc-and-comment (2023-09-26) 4 commits
 - doc: refer to internet archive
 - doc: update links for andre-simon.de
 - doc: update links to current pages
 - doc: switch links to https

 Stale URLs have been updated to their current counterparts (or
 archive.org) and HTTP links are replaced with working HTTPS links.

 Needs review.
 source: <pull.1589.v2.git.1695553041.gitgitgadget@gmail.com>


* la/trailer-cleanups (2023-09-26) 4 commits
 - trailer: only use trailer_block_* variables if trailers were found
 - trailer: use offsets for trailer_start/trailer_end
 - trailer: find the end of the log message
 - commit: ignore_non_trailer computes number of bytes to ignore

 Code clean-up.

 Needs review.
 source: <pull.1563.v4.git.1695709372.gitgitgadget@gmail.com>


* eb/hash-transition (2023-10-02) 30 commits
 - t1016-compatObjectFormat: add tests to verify the conversion between objects
 - t1006: test oid compatibility with cat-file
 - t1006: rename sha1 to oid
 - test-lib: compute the compatibility hash so tests may use it
 - builtin/ls-tree: let the oid determine the output algorithm
 - object-file: handle compat objects in check_object_signature
 - tree-walk: init_tree_desc take an oid to get the hash algorithm
 - builtin/cat-file: let the oid determine the output algorithm
 - rev-parse: add an --output-object-format parameter
 - repository: implement extensions.compatObjectFormat
 - object-file: update object_info_extended to reencode objects
 - object-file-convert: convert commits that embed signed tags
 - object-file-convert: convert commit objects when writing
 - object-file-convert: don't leak when converting tag objects
 - object-file-convert: convert tag objects when writing
 - object-file-convert: add a function to convert trees between algorithms
 - object: factor out parse_mode out of fast-import and tree-walk into in object.h
 - cache: add a function to read an OID of a specific algorithm
 - tag: sign both hashes
 - commit: export add_header_signature to support handling signatures on tags
 - commit: convert mergetag before computing the signature of a commit
 - commit: write commits for both hashes
 - object-file: add a compat_oid_in parameter to write_object_file_flags
 - object-file: update the loose object map when writing loose objects
 - loose: compatibilty short name support
 - loose: add a mapping between SHA-1 and SHA-256 for loose objects
 - repository: add a compatibility hash algorithm
 - object-names: support input of oids in any supported hash
 - oid-array: teach oid-array to handle multiple kinds of oids
 - object-file-convert: stubs for converting from one object format to another

 Teach a repository to work with both SHA-1 and SHA-256 hash algorithms.

 Needs review.
 source: <878r8l929e.fsf@gmail.froward.int.ebiederm.org>


* jx/remote-archive-over-smart-http (2023-10-04) 4 commits
 - archive: support remote archive from stateless transport
 - transport-helper: call do_take_over() in connect_helper
 - transport-helper: call do_take_over() in process_connect
 - transport-helper: no connection restriction in connect_helper

 "git archive --remote=<remote>" learned to talk over the smart
 http (aka stateless) transport.

 Needs review.
 source: <cover.1696432593.git.zhiyou.jx@alibaba-inc.com>


* jx/sideband-chomp-newline-fix (2023-10-04) 3 commits
 - pkt-line: do not chomp newlines for sideband messages
 - pkt-line: memorize sideband fragment in reader
 - test-pkt-line: add option parser for unpack-sideband

 Sideband demultiplexer fixes.

 Needs review.
 source: <cover.1696425168.git.zhiyou.jx@alibaba-inc.com>


* ty/merge-tree-strategy-options (2023-09-25) 1 commit
  (merged to 'next' on 2023-09-29 at aa65b54416)
 + merge-tree: add -X strategy option

 "git merge-tree" learned to take strategy backend specific options
 via the "-X" option, like "git merge" does.

 Will merge to 'master'.
 source: <pull.1565.v6.git.1695522222723.gitgitgadget@gmail.com>


* js/ci-coverity (2023-10-05) 6 commits
  (merged to 'next' on 2023-10-05 at 253788f0d1)
 + coverity: detect and report when the token or project is incorrect
 + coverity: allow running on macOS
 + coverity: support building on Windows
 + coverity: allow overriding the Coverity project
 + coverity: cache the Coverity Build Tool
 + ci: add a GitHub workflow to submit Coverity scans

 GitHub CI workflow has learned to trigger Coverity check.

 Will merge to 'master'.
 source: <pull.1588.v2.git.1695642662.gitgitgadget@gmail.com>


* js/config-parse (2023-09-21) 5 commits
 - config-parse: split library out of config.[c|h]
 - config.c: accept config_parse_options in git_config_from_stdin
 - config: report config parse errors using cb
 - config: split do_event() into start and flush operations
 - config: split out config_parse_options

 The parsing routines for the configuration files have been split
 into a separate file.

 Needs review.
 source: <cover.1695330852.git.steadmon@google.com>


* jc/fake-lstat (2023-09-15) 1 commit
 - cache: add fake_lstat()
 (this branch is used by jc/diff-cached-fsmonitor-fix.)

 A new helper to let us pretend that we called lstat() when we know
 our cache_entry is up-to-date via fsmonitor.

 Needs review.
 source: <xmqqcyykig1l.fsf@gitster.g>


* kn/rev-list-ignore-missing-links (2023-09-20) 1 commit
 - revision: add `--ignore-missing-links` user option

 Surface the .ignore_missing_links bit, which keeps the revision
 traversal from dying when it encounters a missing object, as a new
 command line option of "git rev-list", so that the objects that are
 required but are missing can be enumerated.

 What's the status of this thing?
 source: <20230920104507.21664-1-karthik.188@gmail.com>


* rs/parse-options-value-int (2023-09-18) 2 commits
 - parse-options: use and require int pointer for OPT_CMDMODE
 - parse-options: add int value pointer to struct option

 A bit of type safety for the "value" pointer used in the
 parse-options API.

 What's the status of this thing?
 source: <e6d8a291-03de-cfd3-3813-747fc2cad145@web.de>


* cc/repack-sift-filtered-objects-to-separate-pack (2023-10-02) 9 commits
  (merged to 'next' on 2023-10-03 at e5a4824609)
 + gc: add `gc.repackFilterTo` config option
 + repack: implement `--filter-to` for storing filtered out objects
 + gc: add `gc.repackFilter` config option
 + repack: add `--filter=<filter-spec>` option
 + pack-bitmap-write: rebuild using new bitmap when remapping
 + repack: refactor finding pack prefix
 + repack: refactor finishing pack-objects command
 + t/helper: add 'find-pack' test-tool
 + pack-objects: allow `--filter` without `--stdout`

 "git repack" machinery learns to pay attention to the "--filter="
 option.

 Will merge to 'master'.
 cf. <ZRsknb4NxNHTR21E@nand.local>
 source: <20231002165504.1325153-1-christian.couder@gmail.com>


* la/trailer-test-and-doc-updates (2023-09-07) 13 commits
  (merged to 'next' on 2023-10-06 at 69fef35819)
 + trailer doc: <token> is a <key> or <keyAlias>, not both
 + trailer doc: separator within key suppresses default separator
 + trailer doc: emphasize the effect of configuration variables
 + trailer --unfold help: prefer "reformat" over "join"
 + trailer --parse docs: add explanation for its usefulness
 + trailer --only-input: prefer "configuration variables" over "rules"
 + trailer --parse help: expose aliased options
 + trailer --no-divider help: describe usual "---" meaning
 + trailer: trailer location is a place, not an action
 + trailer doc: narrow down scope of --where and related flags
 + trailer: add tests to check defaulting behavior with --no-* flags
 + trailer test description: this tests --where=after, not --where=before
 + trailer tests: make test cases self-contained

 Test coverage for trailers has been improved.

 Will merge to 'master'.
 source: <pull.1564.v3.git.1694125209.gitgitgadget@gmail.com>


* js/doc-unit-tests (2023-08-17) 3 commits
 - ci: run unit tests in CI
 - unit tests: add TAP unit test framework
 - unit tests: Add a project plan document
 (this branch is used by js/doc-unit-tests-with-cmake.)

 Process to add some form of low-level unit tests has started.

 Expecting a reroll.
 cf. <0b6de919-8dbf-454f-807b-5abb64388cb7@gmail.com>
 source: <cover.1692297001.git.steadmon@google.com>


* js/doc-unit-tests-with-cmake (2023-09-25) 7 commits
 - cmake: handle also unit tests
 - cmake: use test names instead of full paths
 - cmake: fix typo in variable name
 - artifacts-tar: when including `.dll` files, don't forget the unit-tests
 - unit-tests: do show relative file paths
 - unit-tests: do not mistake `.pdb` files for being executable
 - cmake: also build unit tests
 (this branch uses js/doc-unit-tests.)

 Update the base topic to work with CMake builds.

 Expecting a reroll.
 cf. <ZSCRFNkzXZb3fBaU@google.com>
 source: <pull.1579.v3.git.1695640836.gitgitgadget@gmail.com>


* tb/path-filter-fix (2023-08-30) 15 commits
 - bloom: introduce `deinit_bloom_filters()`
 - commit-graph: reuse existing Bloom filters where possible
 - object.h: fix mis-aligned flag bits table
 - commit-graph: drop unnecessary `graph_read_bloom_data_context`
 - commit-graph.c: unconditionally load Bloom filters
 - t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
 - bloom: prepare to discard incompatible Bloom filters
 - bloom: annotate filters with hash version
 - commit-graph: new filter ver. that fixes murmur3
 - repo-settings: introduce commitgraph.changedPathsVersion
 - t4216: test changed path filters with high bit paths
 - t/helper/test-read-graph: implement `bloom-filters` mode
 - bloom.h: make `load_bloom_filter_from_graph()` public
 - t/helper/test-read-graph.c: extract `dump_graph_info()`
 - gitformat-commit-graph: describe version 2 of BDAT

 The Bloom filter used for path limited history traversal was broken
 on systems whose "char" is unsigned; update the implementation and
 bump the format version to 2.
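
 As a rough illustration of why the signedness of "char" matters here
 (a minimal sketch, not the code in bloom.c; both helper names are
 hypothetical), widening a byte with its high bit set gives a different
 32-bit value depending on whether plain "char" is signed, so a hash
 that mixes in the widened bytes stops agreeing across platforms:

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not taken from git: widen one input byte to
 * 32 bits the way a hash loop reading through plain "char" would.
 */

/* Platform where plain "char" is signed: bytes >= 0x80 sign-extend. */
static uint32_t widen_as_signed_char(unsigned char byte)
{
	int value = byte >= 0x80 ? (int)byte - 256 : (int)byte;
	return (uint32_t)value;
}

/* Platform where plain "char" is unsigned: bytes zero-extend. */
static uint32_t widen_as_unsigned_char(unsigned char byte)
{
	return (uint32_t)byte;
}
```

 For a path byte such as 0xc3, the first helper yields 0xffffffc3 and
 the second yields 0xc3, while pure-ASCII bytes widen identically, which
 is why only paths with high-bit bytes are affected.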

 What's the status of this thing?
 cf. <20230830200218.GA5147@szeder.dev>
 cf. <20230901205616.3572722-1-jonathantanmy@google.com>
 cf. <20230924195900.GA1156862@szeder.dev>
 source: <cover.1693413637.git.jonathantanmy@google.com>


* jc/rerere-cleanup (2023-08-25) 4 commits
 - rerere: modernize use of empty strbuf
 - rerere: try_merge() should use LL_MERGE_ERROR when it means an error
 - rerere: fix comment on handle_file() helper
 - rerere: simplify check_one_conflict() helper function

 Code clean-up.

 Not ready to be reviewed yet.
 source: <20230824205456.1231371-1-gitster@pobox.com>


* rj/status-bisect-while-rebase (2023-08-01) 1 commit
 - status: fix branch shown when not only bisecting

 "git status" is taught to show both the branch being bisected and
 being rebased when both are in effect at the same time.

 Needs review.
 cf. <xmqqtttia3vn.fsf@gitster.g>
 source: <48745298-f12b-8efb-4e48-90d2c22a8349@gmail.com>

--------------------------------------------------
[Discarded]

* tb/ci-coverity (2023-09-21) 1 commit
 . .github/workflows: add coverity action

 GitHub CI workflow has learned to trigger Coverity check.

 Superseded by the js/ci-coverity topic.
 source: <b23951c569660e1891a7fb3ad2c2ea1952897bd7.1695332105.git.me@ttaylorr.com>


* cw/git-std-lib (2023-09-11) 7 commits
 . SQUASH???
 . git-std-lib: add test file to call git-std-lib.a functions
 . git-std-lib: introduce git standard library
 . parse: create new library for parsing strings and env values
 . config: correct bad boolean env value error message
 . wrapper: remove dependency to Git-specific internal file
 . hex-ll: split out functionality from hex

 Another libification effort.

 Superseded by the cw/prelim-cleanup topic.
 cf. <xmqqy1hfrk6p.fsf@gitster.g>
 cf. <20230915183927.1597414-1-jonathantanmy@google.com>
 source: <20230908174134.1026823-1-calvinwan@google.com>


* so/diff-merges-d (2023-09-11) 2 commits
 - diff-merges: introduce '-d' option
 - diff-merges: improve --diff-merges documentation

 Superseded by the so/diff-merges-dd topic.
 source: <20230909125446.142715-1-sorganov@gmail.com>


* [Outreachy] Good first issue/micro project
From: Isoken Ibizugbe @ 2023-10-07  9:04 UTC (permalink / raw)
  To: git

Good day,
I am interested in working on this issue,
https://github.com/gitgitgadget/git/issues/1555, as a micro project. Is
it worth doing, and is it appropriate for a micro project?
