Git development


* [PATCH 2/3] t6022: fix 'even though' typo in comment
From: Christian Couder @ 2024-02-01 11:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, John Cai, Christian Couder,
	Christian Couder
In-Reply-To: <20240201115809.1177064-1-christian.couder@gmail.com>

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t6022-rev-list-missing.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh
index 211672759a..527aa94f07 100755
--- a/t/t6022-rev-list-missing.sh
+++ b/t/t6022-rev-list-missing.sh
@@ -46,7 +46,7 @@ do
 			git rev-list --objects --no-object-names \
 				HEAD ^$obj >expect.raw &&
 
-			# Blobs are shared by all commits, so evethough a commit/tree
+			# Blobs are shared by all commits, so even though a commit/tree
 			# might be skipped, its blob must be accounted for.
 			if [ $obj != "HEAD:1.t" ]; then
 				echo $(git rev-parse HEAD:1.t) >>expect.raw &&
-- 
2.43.0.496.gd667eb0d7d.dirty



* [PATCH 3/3] rev-list: add --allow-missing-tips to be used with --missing=...
From: Christian Couder @ 2024-02-01 11:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, John Cai, Christian Couder,
	Christian Couder
In-Reply-To: <20240201115809.1177064-1-christian.couder@gmail.com>

In 9830926c7d (rev-list: add commit object support in `--missing`
option, 2023-10-27) we fixed the `--missing` option in `git rev-list`
so that it now works with commits too.

Unfortunately, such a command would still fail with a "fatal: bad
object <oid>" if it is passed a missing commit, blob or tree as an
argument.

When such a command is used to find the dependencies of some objects,
for example the dependencies of quarantined objects, it would be
better if the command would instead consider such missing objects,
especially commits, in the same way as other missing objects.

If, for example, `--missing=print` is used, it would be nice for some
use cases if the missing tips passed as arguments were reported in
the same way as other missing objects instead of the command just
failing.

Let's introduce a new `--allow-missing-tips` option to make it work
like this.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/rev-list.c          | 24 ++++++++++++++++-
 revision.c                  |  9 ++++---
 revision.h                  |  8 ++++++
 t/t6022-rev-list-missing.sh | 51 +++++++++++++++++++++++++++++++++++++
 4 files changed, 88 insertions(+), 4 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index b3f4783858..ae7bb15478 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -562,6 +562,16 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 				break;
 		}
 	}
+	for (i = 1; i < argc; i++) {
+		const char *arg = argv[i];
+		if (!strcmp(arg, "--allow-missing-tips")) {
+			if (arg_missing_action == MA_ERROR)
+				die(_("option '%s' only makes sense with '%s' set to '%s' or '%s'"),
+				      "--allow-missing-tips", "--missing=", "allow-*", "print");
+			revs.do_not_die_on_missing_tips = 1;
+			break;
+		}
+	}
 
 	if (arg_missing_action)
 		revs.do_not_die_on_missing_objects = 1;
@@ -627,6 +637,8 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			continue; /* already handled above */
 		if (skip_prefix(arg, "--missing=", &arg))
 			continue; /* already handled above */
+		if (!strcmp(arg, "--allow-missing-tips"))
+			continue; /* already handled above */
 
 		if (!strcmp(arg, ("--no-object-names"))) {
 			arg_show_object_names = 0;
@@ -753,9 +765,19 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 
 	if (arg_print_omitted)
 		oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
-	if (arg_missing_action == MA_PRINT)
+	if (arg_missing_action == MA_PRINT) {
+		struct oidset_iter iter;
+		struct object_id *oid;
+
 		oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
 
+		/* Already add missing commits */
+		oidset_iter_init(&revs.missing_commits, &iter);
+		while ((oid = oidset_iter_next(&iter)))
+			oidset_insert(&missing_objects, oid);
+		oidset_clear(&revs.missing_commits);
+	}
+
 	traverse_commit_list_filtered(
 		&revs, show_commit, show_object, &info,
 		(arg_print_omitted ? &omitted_objects : NULL));
diff --git a/revision.c b/revision.c
index 4c5cd7c3ce..9f25faa249 100644
--- a/revision.c
+++ b/revision.c
@@ -388,6 +388,10 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
 			return NULL;
 		if (revs->exclude_promisor_objects && is_promisor_object(oid))
 			return NULL;
+		if (revs->do_not_die_on_missing_tips) {
+			oidset_insert(&revs->missing_commits, oid);
+			return NULL;
+		}
 		die("bad object %s", name);
 	}
 	object->flags |= flags;
@@ -1947,6 +1951,7 @@ void repo_init_revisions(struct repository *r,
 	init_display_notes(&revs->notes_opt);
 	list_objects_filter_init(&revs->filter);
 	init_ref_exclusions(&revs->ref_excludes);
+	oidset_init(&revs->missing_commits, 0);
 }
 
 static void add_pending_commit_list(struct rev_info *revs,
@@ -2184,7 +2189,7 @@ static int handle_revision_arg_1(const char *arg_, struct rev_info *revs, int fl
 		verify_non_filename(revs->prefix, arg);
 	object = get_reference(revs, arg, &oid, flags ^ local_flags);
 	if (!object)
-		return revs->ignore_missing ? 0 : -1;
+		return (revs->ignore_missing || revs->do_not_die_on_missing_tips) ? 0 : -1;
 	add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags);
 	add_pending_object_with_path(revs, object, arg, oc.mode, oc.path);
 	free(oc.path);
@@ -3830,8 +3835,6 @@ int prepare_revision_walk(struct rev_info *revs)
 				       FOR_EACH_OBJECT_PROMISOR_ONLY);
 	}
 
-	oidset_init(&revs->missing_commits, 0);
-
 	if (!revs->reflog_info)
 		prepare_to_use_bloom_filter(revs);
 	if (!revs->unsorted_input)
diff --git a/revision.h b/revision.h
index 94c43138bc..67435a5d8a 100644
--- a/revision.h
+++ b/revision.h
@@ -227,6 +227,14 @@ struct rev_info {
 			 */
 			do_not_die_on_missing_objects:1,
 
+			/*
+			 * When the do_not_die_on_missing_objects flag above is set,
+			 * a rev walk could still die with "fatal: bad object <oid>"
+			 * if one of the tips it is passed is missing. With this flag
+			 * such a tip will be reported as missing too.
+			 */
+			 do_not_die_on_missing_tips:1,
+
 			/* for internal use only */
 			exclude_promisor_objects:1;
 
diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh
index 527aa94f07..283e8fc2c2 100755
--- a/t/t6022-rev-list-missing.sh
+++ b/t/t6022-rev-list-missing.sh
@@ -77,4 +77,55 @@ do
 	done
 done
 
+for obj in "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
+do
+	for tip in "" "HEAD"
+	do
+		for action in "allow-any" "print"
+		do
+			test_expect_success "--missing=$action --allow-missing-tips with tip '$obj' missing and tip '$tip'" '
+				oid="$(git rev-parse $obj)" &&
+				path=".git/objects/$(test_oid_to_path $oid)" &&
+
+				# Before the object is made missing, we use rev-list to
+				# get the expected oids.
+				if [ "$tip" = "HEAD" ]; then
+					git rev-list --objects --no-object-names \
+						HEAD ^$obj >expect.raw
+				else
+					>expect.raw
+				fi &&
+
+				# Blobs are shared by all commits, so even though a commit/tree
+				# might be skipped, its blob must be accounted for.
+				if [ "$tip" = "HEAD" ] && [ $obj != "HEAD:1.t" ]; then
+					echo $(git rev-parse HEAD:1.t) >>expect.raw &&
+					echo $(git rev-parse HEAD:2.t) >>expect.raw
+				fi &&
+
+				mv "$path" "$path.hidden" &&
+				test_when_finished "mv $path.hidden $path" &&
+
+				git rev-list --missing=$action --allow-missing-tips \
+				     --objects --no-object-names $oid $tip >actual.raw &&
+
+				# When the action is to print, we should also add the missing
+				# oid to the expect list.
+				case $action in
+				allow-any)
+					;;
+				print)
+					grep ?$oid actual.raw &&
+					echo ?$oid >>expect.raw
+					;;
+				esac &&
+
+				sort actual.raw >actual &&
+				sort expect.raw >expect &&
+				test_cmp expect actual
+			'
+		done
+	done
+done
+
 test_done
-- 
2.43.0.496.gd667eb0d7d.dirty



* Migrate away from vger to GitHub or (on-premise) GitLab?
From: Hans Meiser @ 2024-02-01 12:10 UTC (permalink / raw)
  To: git@vger.kernel.org
In-Reply-To: <AS2P195MB21350F44B079009C05A1EAF1E2432@AS2P195MB2135.EURP195.PROD.OUTLOOK.COM>

Hi,

is there any current discussion about moving Git development away from using a mailing list to some modern form of collaboration?

I'd like to be able to follow a structured discussion in issues and to contribute to the Git documentation, but the mailing list currently just bloats my personal inbox with loads of uninteresting e-mails in an unstructured waterfall of messy discussion that I am not able to follow professionally.

Are you considering migrating?

Regards,
Axel Dahmen


* Re: Migrate away from vger to GitHub or (on-premise) GitLab?
From: Kristoffer Haugsbakk @ 2024-02-01 12:21 UTC (permalink / raw)
  To: Hans Meiser; +Cc: git@vger.kernel.org
In-Reply-To: <AS2P195MB2135D91EE464FF30EE84E77EE2432@AS2P195MB2135.EURP195.PROD.OUTLOOK.COM>

Hi

On Thu, Feb 1, 2024, at 13:10, Hans Meiser wrote:
> Hi,
>
> Regards,
> Axel Dahmen

A relevant discussion seems to be “Improving new contrib onboarding”[1]

There’s GitGitGadget for people who want to use GitHub as a bridge[2]

There’s an unofficial issue tracker for project ideas (not for bugs)[3]

That’s what I know.

🔗 1: https://lore.kernel.org/git/ZRrgMDacYpj41DcO@nand.local/
🔗 2: https://gitgitgadget.github.io/
🔗 3: https://github.com/gitgitgadget/git/issues

-- 
Kristoffer Haugsbakk


* Re: Migrate away from vger to GitHub or (on-premise) GitLab?
From: Antonin Delpeuch @ 2024-02-01 12:20 UTC (permalink / raw)
  To: Hans Meiser, git@vger.kernel.org
In-Reply-To: <AS2P195MB2135D91EE464FF30EE84E77EE2432@AS2P195MB2135.EURP195.PROD.OUTLOOK.COM>

Hi Hans,

As a new contributor I have also been wondering about that and I found
the notes of the 2023 contributor summit very interesting in this regard:

https://docs.google.com/document/d/1GKoYtVhpdr_N2BAonYsxVTpPToP1CgCS9um0K7Gx9gQ/edit#heading=h.bdw77tvsksnr

There is a section on "Project management practices" which touches on
this topic, with the idea of using a bug tracker being raised for
instance. So you are not the only one thinking about it at least.

For what it's worth, I have written up a small report about my
contribution experience (which covers project management practices):

https://antonin.delpeuch.eu/posts/contribution-experience-report-git/

Best,

Antonin

On 01/02/2024 13:10, Hans Meiser wrote:
> Hi,
>
> is there any current discussion about moving Git development away from using a mailing list to some modern form of collaboration?
>
> I'd like to be able to follow a structured discussion in issues and to contribute to the Git documentation, but the mailing list currently just bloats my personal inbox with loads of uninteresting e-mails in an unstructured waterfall of messy discussion that I am not able to follow professionally.
>
> Are you considering migrating?
>
> Regards,
> Axel Dahmen


* Re: Migrate away from vger to GitHub or (on-premise) GitLab?
From: Dragan Simic @ 2024-02-01 12:56 UTC (permalink / raw)
  To: Hans Meiser; +Cc: git
In-Reply-To: <AS2P195MB2135D91EE464FF30EE84E77EE2432@AS2P195MB2135.EURP195.PROD.OUTLOOK.COM>

Hello,

On 2024-02-01 13:10, Hans Meiser wrote:
> is there any current discussion about moving Git development away from
> using a mailing list to some modern form of collaboration?
> 
> I'd like to be able to follow a structured discussion in issues and to
> contribute to the Git documentation, but the mailing list currently
> just bloats my personal inbox with loads of uninteresting e-mails in
> an unstructured waterfall of messy discussion that I am not able to
> follow professionally.
> 
> Are you considering migrating?

Perhaps it would be good to also know that many people simply don't
live in a web browser, so to speak, and live in the CLI instead.  For
such people, having to use a web browser for development is simply,
well, awkward and inefficient.

In other words, as much as not using some more modern, web-based
tools may drive some people away, there's also exactly the opposite
reaction that should also be considered.

There was recently some similar discussion for another open-source
project, but I simply can't find it now. :/


* Re: [PATCH 2/4] docs: Clean up `--empty` formatting in `git-rebase` and `git-am`
From: Phillip Wood @ 2024-02-01 14:02 UTC (permalink / raw)
  To: Brian Lyles, phillip.wood; +Cc: git, me, newren
In-Reply-To: <CAHPHrSdOVoBPR9vJou_Bxmq=4QW_z6nhnzxfmZ1Am0i-GJuz4g@mail.gmail.com>

Hi Brian

On 27/01/2024 21:22, Brian Lyles wrote:
> On Tue, Jan 23, 2024 at 8:24 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>> On 19/01/2024 05:59, brianmlyles@gmail.com wrote:
>>> From: Brian Lyles <brianmlyles@gmail.com>
>>>
>>> Both of these pages document very similar `--empty` options, but with
>>> different styles. This commit aims to make them more consistent.
>>
>> I think that's reasonable, though the options are worded as doing
>> different things. For "am" it talks about the patch being empty - i.e. a
>> patch of an empty commit whereas for "rebase" the option applies to
>> non-empty commits that become empty. What does "am" do if you try to
>> apply a patch whose changes are already present?
> 
> Hm -- as you mention, this does appear to have a different meaning for
> git-am(1) than it does for git-rebase(1). Regardless of the `--empty`
> value passed to git-am(1), a non-empty patch that is already present
> appears to error and stop.
> 
> That is an unfortunate difference. I think that my updated version of
> the git-am(1) docs is still easier to read, and preserves the original
> meaning. So I'm inclined to say that it's still an improvement worth
> making, and perhaps my commit message should just clarify that.
> Thoughts?

Yes I agree the change is worthwhile but I think it would benefit from 
an updated commit message.

>> If you're aiming for consistency then it would be worth listing the
>> possible values in the same order for each command.
> 
> That makes sense. I had initially maintained the existing order in which
> these were documented, keeping the default option first. I think that
> the updated layout makes the order less relevant by making it easier to
> read and identify the default anyway.
> 
> I could see alphabetical being better, though with the changes later in
> this series we'd end up with the deprecated `ask` being first or
> out-of-order at the end. What are your thoughts on the ideal order for
> these?

Alphabetical sounds reasonable; we could sort on the non-deprecated
names, with stop and ask grouped together:

drop;;
     ...
keep;;
     ...
stop;;
ask;;
     ...
     `ask` is a deprecated synonym of `stop`

>>> +`keep`;;
>>> +     The empty commit will be kept.
>>> +`ask`;;
>>> +     The rebase will halt when the empty commit is applied, allowing you to
>>> +     choose whether to drop it, edit files more, or just commit the empty
>>> +     changes. This option is implied when `--interactive` is specified.
>>>        Other options, like `--exec`, will use the default of drop unless
>>>        `-i`/`--interactive` is explicitly specified.
>>
>> Thanks for adding a bit more detail about the default, however it looks
>> to me like we keep commits that become empty when --exec is specified
>>
>>          if (options.empty == EMPTY_UNSPECIFIED) {
>>                  if (options.flags & REBASE_INTERACTIVE_EXPLICIT)
>>                          options.empty = EMPTY_STOP;
>>                  else if (options.exec.nr > 0)
>>                          options.empty = EMPTY_KEEP;
>>                  else
>>                          options.empty = EMPTY_DROP;
>>          }
>>
>> Off the top of my head I'm not sure why or if that is a good idea.
> 
> The two lines indicating this behavior are actually pre-existing -- I
> did not change them in this patch and thus didn't even think to fact
> check them.
> 
> Upon testing this, I've confirmed that you are correct about the actual
> behavior. I will address this in a separate commit in v2.

Thanks, I'd missed that those were context lines in the diff.

Best Wishes

Phillip


* Re: [PATCH 3/4] rebase: Update `--empty=ask` to `--empty=drop`
From: Phillip Wood @ 2024-02-01 14:02 UTC (permalink / raw)
  To: Brian Lyles, phillip.wood; +Cc: git, me, newren
In-Reply-To: <CAHPHrSefHb7KddWNS4NS2bAFG9DFfKZ=Ue499+EqDT3myS_tEA@mail.gmail.com>

Hi Brian

On 27/01/2024 21:49, Brian Lyles wrote:
> On Tue, Jan 23, 2024 at 8:24 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>> On 19/01/2024 05:59, brianmlyles@gmail.com wrote:
>>> From: Brian Lyles <brianmlyles@gmail.com>
>>>
>>> When `git-am` got its own `--empty` option in 7c096b8d61 (am: support
>>> --empty=<option> to handle empty patches, 2021-12-09), `stop` was used
>>> instead of `ask`. `stop` is a more accurate term for describing what
>>> really happens,
>>
>> I can see your reasoning but I think of stopping as git's way of asking
>> what to do so I'm not sure if "stop" is better than "ask". I don't know
>> how we ended up with two different terms - the prior art is "ask" so
>> maybe we should change "am --empty" instead. Let's see what others think.
> 
> The suggestion to use 'stop' instead of 'ask' for rebase was initially
> Elijah's[1], which I agreed with. I am certainly open to others'
> opinions here though, and am content with whatever is decided. I am
> mostly aiming for consistency between git-rebase(1), git-am(1), and
> ultimately git-cherry-pick(1).
> 
> [1]: https://lore.kernel.org/git/CABPp-BGJfvBhO_zEX8nLoa8WNsjmwvtZ2qOjmYm9iPoZg4SwPw@mail.gmail.com/

Thanks for the link, that is useful context.

>> It would be helpful to mention the tests in the commit message - we end
>> up with a mixture of "--empty=ask" and "--empty=stop" I assume that is
>> by design
> 
> You are correct -- the intent being to ensure that `--empty=ask` continues
> working for as long as it is supported. I'll add this to the message in
> v2.

That makes sense.

Thanks

Phillip


* Re: [PATCH 1/3] revision: clarify a 'return NULL' in get_reference()
From: Eric Sunshine @ 2024-02-01 14:53 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Junio C Hamano, Patrick Steinhardt, John Cai,
	Christian Couder
In-Reply-To: <20240201115809.1177064-2-christian.couder@gmail.com>

On Thu, Feb 1, 2024 at 6:58 AM Christian Couder
<christian.couder@gmail.com> wrote:
> In general when we know a pointer variable is NULL, it's clearer to
> explicitely return NULL than to return that variable.

s/explicitely/explicitly/

> In get_reference() when 'object' is NULL, we already return NULL
> when 'revs->exclude_promisor_objects && is_promisor_object(oid)' is
> true, but we return 'object' when 'revs->ignore_missing' is true.
>
> Let's make the code clearer and more uniform by also explicitely
> returning NULL when 'revs->ignore_missing' is true.

s/explicitely/explicitly/

> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>


* Re: [PATCH 1/7] reftable/record: introduce function to compare records by key
From: Eric Sunshine @ 2024-02-01 15:00 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <fadabec696f75ae0fa25bcbf87012fcc4768acaa.1706782841.git.ps@pks.im>

On Thu, Feb 1, 2024 at 8:31 AM Patrick Steinhardt <ps@pks.im> wrote:
> In some places we need to sort reftable records by their keys to
> determine their ordering. This is done by first formatting the keys into
> a `struct strbuf` and then using `strbuf_cmp()` to compare them. This
> logic is needlessly roundabout and can end up costing quite a bit fo CPU
> cycles, both due to the allocation and formatting logic.

s/fo/of/

> Introduce a new `reftable_record_cmp()` function that knows to compare
> two records with each other without requiring allocations.

perhaps: s/knows to compare/knows how to compare/

> Signed-off-by: Patrick Steinhardt <ps@pks.im>


* Re: [PATCH 1/2] refs: introduce reftable backend
From: Karthik Nayak @ 2024-02-01 15:17 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: Han-Wen Nienhuys
In-Reply-To: <5598cd13074beef092a61235a505476d0cbceb90.1706601199.git.ps@pks.im>

Hello,

Patrick Steinhardt <ps@pks.im> writes:
>   - It becomes possible to do truly atomic writes where either all refs
>     are committed to disk or none are. This was not possible with the
>     "files" backend because ref updates were split across multiple loose
>     files.
>
>   - The disk space required to store many refs is reduced, both compared
>     to loose refs and packed-refs. This is enabled both by the reftable
>     format being a binary format, which is more compact, and by prefix
>     compression.
>
>   - We can ignore filesystem-specific behaviour as ref names are not
>     encoded via paths anymore. This means there is no need to handle
>     case sensitivity on Windows systems or Unicode precomposition on
>     macOS.
>
>   - There is no need to rewrite the complete refdb anymore every time a
>     ref is being deleted like it was the case for packed-refs. This
>     means that ref deletions are now constant time instead of scaling
>     linearly with the number of refs.
>
>   - We can ignore file/directory conflicts so that it becomes possible
>     to store both "refs/heads/foo" and "refs/heads/foo/bar".
>
>   - Due to this property we can retain reflogs for deleted refs. We have
>     previously been deleting reflogs together with their refs to avoid
>     file/directory conflicts, which is not necessary anymore.
>

Nit: Maybe also add a point about how the current files backend doesn't
have a good way to differentiate between regular files and refs. While
regular refs are in the 'refs/' folder, pseudorefs have no namespace.
With the reftable implementation, by contrast, we know that everything
within the reftable is a ref and there are no other refs we need to consider.


> Performance-wise things very much depend on the actual workload. The
> following benchmarks compare the "files" and "reftable" backends in the
> current version:

Pretty nice to have these numbers here.

>
>   - Creating N refs in separate transactions shows that the "files"
>     backend is ~50% faster. This is not surprising given that creating a
>     ref only requires us to create a single loose ref. The "reftable"
>     backend will also perform auto compaction on updates. In real-world
>     workloads we would likely also want to pack loose refs,
>     which would likely change the picture.
>
>         Benchmark 1: update-ref: create refs sequentially (refformat = files)
>           Time (mean ± σ):       2.1 ms ±   0.3 ms    [User: 0.6 ms, System: 1.7 ms]
>           Range (min … max):     1.8 ms …   4.3 ms    133 runs
>
>         Benchmark 2: update-ref: create refs sequentially (refformat = reftable)
>           Time (mean ± σ):       2.7 ms ±   0.1 ms    [User: 0.6 ms, System: 2.2 ms]
>           Range (min … max):     2.4 ms …   2.9 ms    132 runs
>
>         Benchmark 3: update-ref: create refs sequentially (refformat = files)
>           Time (mean ± σ):      1.975 s ±  0.006 s    [User: 0.437 s, System: 1.535 s]
>           Range (min … max):    1.969 s …  1.980 s    3 runs
>
>         Benchmark 4: update-ref: create refs sequentially (refformat = reftable)
>           Time (mean ± σ):      2.611 s ±  0.013 s    [User: 0.782 s, System: 1.825 s]
>           Range (min … max):    2.597 s …  2.622 s    3 runs
>
>         Benchmark 5: update-ref: create refs sequentially (refformat = files)
>           Time (mean ± σ):     198.442 s ±  0.241 s    [User: 43.051 s, System: 155.250 s]
>           Range (min … max):   198.189 s … 198.670 s    3 runs
>
>         Benchmark 6: update-ref: create refs sequentially (refformat = reftable)
>           Time (mean ± σ):     294.509 s ±  4.269 s    [User: 104.046 s, System: 190.326 s]
>           Range (min … max):   290.223 s … 298.761 s    3 runs
>

Nit: The number of refs created (N) is missing in these benchmarks.

> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> new file mode 100644
> index 0000000000..895de0b273
> --- /dev/null
> +++ b/refs/reftable-backend.c
> @@ -0,0 +1,2286 @@
> +#include "../git-compat-util.h"
> +#include "../abspath.h"
> +#include "../chdir-notify.h"
> +#include "../config.h"

I believe this header can be dropped.

I'm not sure about removing headers which aren't needed, though. The
coding guidelines mention that we should include the headers that declare
the functions and the types used. Going by this rule, I found some headers
here which are not required.

> +#include "../environment.h"
> +#include "../gettext.h"
> +#include "../hash.h"
> +#include "../hex.h"
> +#include "../iterator.h"
> +#include "../ident.h"
> +#include "../lockfile.h"
> +#include "../object.h"
> +#include "../path.h"
> +#include "../refs.h"
> +#include "../reftable/reftable-stack.h"
> +#include "../reftable/reftable-record.h"
> +#include "../reftable/reftable-error.h"
> +#include "../reftable/reftable-blocksource.h"

This header can be dropped too.

> +#include "../reftable/reftable-reader.h"

This also.

> +#include "../reftable/reftable-iterator.h"
> +#include "../reftable/reftable-merged.h"
> +#include "../reftable/reftable-generic.h"

This one as well.

> +#include "../setup.h"
> +#include "../strmap.h"
> +#include "../worktree.h"

This one too.

> +struct reftable_ref_store {
> +	struct ref_store base;
> +
> +	struct reftable_stack *main_stack;
> +	struct reftable_stack *worktree_stack;
> +	struct strmap worktree_stacks;
> +	struct reftable_write_options write_options;
> +
> +	unsigned int store_flags;
> +	int err;
> +};

So, I'm assuming that `main_stack` here would be the primary reference
db, even when inside the worktree. I'm wondering why we can't just have

    struct reftable_stack *current_stack;
    struct strmap worktree_stacks;

Reading further on, it becomes necessary to have both, because the user
could
    1. Request main-worktree/{ref}
    2. Request worktrees/{worktree}/{ref}
So it's important that we have access to the main worktree, the
current worktree, and any other worktree.

Maybe we could just drop the `struct reftable_stack *worktree_stack` and
rely on the map entirely, but I guess `worktree_stack` acts as a caching
layer and avoids using the map unless necessary.

> +static struct ref_store *reftable_be_init(struct repository *repo,
> +					  const char *gitdir,
> +					  unsigned int store_flags)
> +{
> +	struct reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
> +	struct strbuf path = STRBUF_INIT;
> +	int is_worktree;
> +	mode_t mask;
> +
> +	mask = umask(0);
> +	umask(mask);

Took me more than a glance to understand this. For other readers who
didn't know: there is no API that only reads the current umask. But
umask(2) sets the umask and returns the previous one. So we set an
arbitrary value (here 0) to obtain the previous umask and then restore it.
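
For illustration, a minimal standalone sketch of that idiom. The helper
names `get_current_umask()` and `default_file_mode()` are hypothetical and
not part of the patch; only umask(2) itself is assumed:

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not part of the patch: read the process umask
 * without changing it. umask(2) can only be queried by setting a new
 * value, so set a throwaway mask and immediately restore the old one.
 */
static mode_t get_current_umask(void)
{
	mode_t mask = umask(0);	/* returns the previous mask */
	umask(mask);		/* put it back */
	return mask;
}

/* Permission bits a new file would get when created with mode 0666. */
static mode_t default_file_mode(void)
{
	return 0666 & ~get_current_umask();
}
```

Note that the process briefly runs with a umask of 0 between the two
calls, so this idiom is not safe if another thread creates files at the
same time.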

> +
> +	base_ref_store_init(&refs->base, repo, gitdir, &refs_be_reftable);
> +	strmap_init(&refs->worktree_stacks);
> +	refs->store_flags = store_flags;
> +	refs->write_options.block_size = 4096;
>

Perhaps use `DEFAULT_BLOCK_SIZE` from "reftable/constants.h" here.

> +	refs->write_options.hash_id = repo->hash_algo->format_id;
> +	refs->write_options.default_permissions = calc_shared_perm(0666 & ~mask);
> +
> +	/*
> +	 * Set up the main reftable stack that is hosted in GIT_COMMON_DIR.
> +	 * This stack contains both the shared and the main worktree refs.
> +	 */
> +	is_worktree = get_common_dir_noenv(&path, gitdir);
> +	if (!is_worktree) {
> +		strbuf_reset(&path);
> +		strbuf_realpath(&path, gitdir, 0);
> +	}

Nit: would be nice to have a comment that for worktrees, `gitdir` would
already be an absolute path.

> +	strbuf_addstr(&path, "/reftable");
> +	refs->err = reftable_new_stack(&refs->main_stack, path.buf,
> +				       refs->write_options);
> +	if (refs->err)
> +		goto done;
> +
> +	/*
> +	 * If we're in a worktree we also need to set up the worktree reftable
> +	 * stack that is contained in the per-worktree GIT_DIR.
> +	 */
> +	if (is_worktree) {
> +		strbuf_reset(&path);
> +		strbuf_addf(&path, "%s/reftable", gitdir);
> +
> +		refs->err = reftable_new_stack(&refs->worktree_stack, path.buf,
> +					       refs->write_options);
> +		if (refs->err)
> +			goto done;
> +	}

Wondering if we should also add this to the `refs->worktree_stacks`.

> +struct reftable_ref_iterator {
> +	struct ref_iterator base;
> +	struct reftable_ref_store *refs;
> +	struct reftable_iterator iter;
> +	struct reftable_ref_record ref;
> +	struct object_id oid;
> +
> +	const char *prefix;
> +	unsigned int flags;
> +	int err;
> +};

So the `flags` in this structure are for the iterator itself but the
`flags` in `base` are flags related to the current ref stored in `base`.

> +static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
> +{
> +	struct reftable_ref_iterator *iter =
> +		(struct reftable_ref_iterator *)ref_iterator;
> +	struct reftable_ref_store *refs = iter->refs;
> +
> +	while (!iter->err) {
> +		int flags = 0;
> +

Nit: I think the structure of `reftable_reflog_iterator_advance` is similar
but nicer to read: using `while (1)` and returning from within the loop
is better than breaking here and returning outside. Either way, I think
it'd be nicer to make both of them consistent.

> +		iter->err = reftable_iterator_next_ref(&iter->iter, &iter->ref);
> +		if (iter->err)
> +			break;
> +
> +		/*
> +		 * The files backend only lists references contained in
> +		 * "refs/". We emulate the same behaviour here and thus skip
> +		 * all references that don't start with this prefix.
> +		 */
> +		if (!starts_with(iter->ref.refname, "refs/"))
> +			continue;
> +

Since my patch series [1] to print all refs is now merged to next, maybe
you could add this in?

    diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
    index 895de0b273..3f4f905292 100644
    --- a/refs/reftable-backend.c
    +++ b/refs/reftable-backend.c
    @@ -348,11 +348,10 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
     			break;

     		/*
    -		 * The files backend only lists references contained in
    -		 * "refs/". We emulate the same behaviour here and thus skip
    -		 * all references that don't start with this prefix.
    +		 * Unless the `DO_FOR_EACH_INCLUDE_ALL_REFS` flag is used, we only
    +		 * list references contained in "refs/" to mimic the files backend.
     		 */
    -		if (!starts_with(iter->ref.refname, "refs/"))
    +		if (!(iter->flags & DO_FOR_EACH_INCLUDE_ALL_REFS) && !starts_with(iter->ref.refname, "refs/"))
     			continue;

     		if (iter->prefix &&
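
The skip condition suggested above can be sketched in isolation as a small
predicate. This is only a sketch: `SKETCH_INCLUDE_ALL_REFS` is a
hypothetical stand-in for the real flag (its value is an assumption), and
`should_skip_ref()` does not exist in the patch:

```c
#include <string.h>

/* Hypothetical stand-in for the real flag; the value is an assumption. */
#define SKETCH_INCLUDE_ALL_REFS (1u << 0)

/*
 * Returns 1 if the iterator should skip this refname, 0 otherwise.
 * Without the flag, only names below "refs/" are listed; with the
 * flag, every name is listed.
 */
static int should_skip_ref(const char *refname, unsigned int flags)
{
	if (flags & SKETCH_INCLUDE_ALL_REFS)
		return 0;
	return strncmp(refname, "refs/", 5) != 0;
}
```

With this predicate, "HEAD" is skipped by default but listed once the
flag is set, while "refs/heads/main" is listed in both cases.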

> +static enum iterator_selection iterator_select(struct ref_iterator *iter_worktree,
> +					       struct ref_iterator *iter_common,
> +					       void *cb_data UNUSED)
> +{
> +	if (iter_worktree && !iter_common) {
> +		/*
> +		 * Return the worktree ref if there are no more common refs.
> +		 */
> +		return ITER_SELECT_0;
> +	} else if (iter_common) {
> +		/*
> +		 * In case we have pending worktree and common refs we need to
> +		 * yield them based on their lexicographical order. Worktree
> +		 * refs that have the same name as common refs shadow the
> +		 * latter.
> +		 */
> +		if (iter_worktree) {
> +			int cmp = strcmp(iter_worktree->refname,
> +					 iter_common->refname);
> +			if (cmp < 0)
> +				return ITER_SELECT_0;
> +			else if (!cmp)
> +				return ITER_SELECT_0_SKIP_1;
> +		}
> +
> +		 /*
> +		  * Otherwise, if we either have no worktree refs anymore or if
> +		  * the common ref sorts before the next worktree ref, we need
> +		  * to figure out whether the next common ref belongs to the
> +		  * main worktree. In that case, it should be ignored.
> +		  */
> +		if (parse_worktree_ref(iter_common->refname, NULL, NULL,
> +				       NULL) == REF_WORKTREE_SHARED)
> +			return ITER_SELECT_1;
> +

I'm not sure I understand this. When would this situation occur?

> +		return ITER_SKIP_1;
> +	} else {
> +		return ITER_DONE;
> +	}
> +}
> +
> +
> +static int reftable_be_transaction_prepare(struct ref_store *ref_store,
> +					   struct ref_transaction *transaction,
> +					   struct strbuf *err)
> +{
> +	struct reftable_ref_store *refs =
> +		reftable_be_downcast(ref_store, REF_STORE_WRITE|REF_STORE_MAIN, "ref_transaction_prepare");
> +	struct strbuf referent = STRBUF_INIT, head_referent = STRBUF_INIT;
> +	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
> +	struct reftable_transaction_data *tx_data = NULL;
> +	struct object_id head_oid;
> +	unsigned int head_type = 0;
> +	size_t i;
> +	int ret;
> +
> +	ret = refs->err;
> +	if (ret < 0)
> +		goto done;
> +
> +	tx_data = xcalloc(1, sizeof(*tx_data));
> +
> +	/*
> +	 * Preprocess all updates. For one we check that there are no duplicate
> +	 * reference updates in this transaction. Second, we lock all stacks
> +	 * that will be modified during the transaction.
> +	 */
> +	for (i = 0; i < transaction->nr; i++) {
> +		ret = prepare_transaction_update(NULL, refs, tx_data,
> +						 transaction->updates[i], err);
> +		if (ret)
> +			goto done;
> +
> +		string_list_append(&affected_refnames,
> +				   transaction->updates[i]->refname);
> +	}
> +
> +	/*
> +	 * Now that we have counted updates per stack we can preallocate their
> +	 * arrays. This avoids having to reallocate many times.
> +	 */
> +	for (i = 0; i < tx_data->args_nr; i++) {
> +		CALLOC_ARRAY(tx_data->args[i].updates, tx_data->args[i].updates_expected);
> +		tx_data->args[i].updates_alloc = tx_data->args[i].updates_expected;
> +	}
> +
> +	/*
> +	 * Fail if a refname appears more than once in the transaction.
> +	 * This code is taken from the files backend and is a good candidate to
> +	 * be moved into the generic layer.
> +	 */
> +	string_list_sort(&affected_refnames);
> +	if (ref_update_reject_duplicates(&affected_refnames, err)) {
> +		ret = TRANSACTION_GENERIC_ERROR;
> +		goto done;
> +	}
> +
> +	ret = read_ref_without_reload(stack_for(refs, "HEAD", NULL), "HEAD", &head_oid,
> +				      &head_referent, &head_type);
> +	if (ret < 0)
> +		goto done;
> +
> +	for (i = 0; i < transaction->nr; i++) {
> +		struct ref_update *u = transaction->updates[i];
> +		struct object_id current_oid = {0};
> +		struct reftable_stack *stack;
> +		const char *rewritten_ref;
> +
> +		stack = stack_for(refs, u->refname, &rewritten_ref);
>

Wondering why we didn't just iterate over `tx_data.args` and
`args[i].updates` since there we already have the required stack.

Update: we don't save rewritten_ref in tx_data, so that's one blocker
for this.

> +
> +		if (u->type & REF_ISSYMREF) {
> +			const char *resolved = refs_resolve_ref_unsafe(&refs->base, u->refname, 0,
> +								       &current_oid, NULL);
> +
> +			if (u->flags & REF_NO_DEREF) {
> +				/*
> +				 * The reftable stack is locked at this point
> +				 * already, so it should be safe to call
> +				 * `refs_resolve_ref_unsafe()` here.
> +				 */

Shouldn't this comment be a few lines before?

> +
> +static int write_transaction_table(struct reftable_writer *writer, void *cb_data)
> +{
> +	struct write_transaction_table_arg *arg = cb_data;
> +	struct reftable_merged_table *mt =
> +		reftable_stack_merged_table(arg->stack);
> +	uint64_t ts = reftable_stack_next_update_index(arg->stack);
> +	struct reftable_log_record *logs = NULL;
> +	size_t logs_nr = 0, logs_alloc = 0, i;
> +	int ret = 0;
> +
> +	QSORT(arg->updates, arg->updates_nr, transaction_update_cmp);
> +
> +	reftable_writer_set_limits(writer, ts, ts);
> +
> +	for (i = 0; i < arg->updates_nr; i++) {
> +		struct reftable_transaction_update *tx_update = &arg->updates[i];
> +		struct ref_update *u = tx_update->update;
> +
> +		/*
> +		 * Write a reflog entry when updating a ref to point to
> +		 * something new in either of the following cases:
> +		 *
> +		 * - The reference is about to be deleted. We always want to
> +		 *   delete the reflog in that case.
> +		 * - REF_FORCE_CREATE_REFLOG is set, asking us to always create
> +		 *   the reflog entry.
> +		 * - `core.logAllRefUpdates` tells us to create the reflog for
> +		 *   the given ref.
> +		 */
> +		if (u->flags & REF_HAVE_NEW && !(u->type & REF_ISSYMREF) && is_null_oid(&u->new_oid)) {
> +			struct reftable_log_record log = {0};
> +			struct reftable_iterator it = {0};
> +
> +			/*
> +			 * When deleting refs we also delete all reflog entries
> +			 * with them. While it is not strictly required to
> +			 * delete reflogs together with their refs, this
> +			 * matches the behaviour of the files backend.
> +			 *
> +			 * Unfortunately, we have no better way than to delete
> +			 * all reflog entries one by one.
> +			 */
> +			ret = reftable_merged_table_seek_log(mt, &it, u->refname);
> +			while (ret == 0) {
> +				struct reftable_log_record *tombstone;
> +
> +				ret = reftable_iterator_next_log(&it, &log);
> +				if (ret < 0)
> +					break;
> +				if (ret > 0 || strcmp(log.refname, u->refname)) {
> +					ret = 0;
> +					break;
> +				}
> +
> +				ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
> +				tombstone = &logs[logs_nr++];
> +				tombstone->refname = xstrdup(u->refname);
> +				tombstone->value_type = REFTABLE_LOG_DELETION,

Why is this a comma? There is one other such instance in this file.
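
For reference, a minimal standalone sketch of why the comma form compiles and
behaves like the semicolon form: the comma operator evaluates its left operand,
then its right, so the two assignments become a single expression statement.
The struct and function names below are made up for illustration and are not
taken from the reftable code.

```c
#include <assert.h>

/* Hypothetical stand-in for a log record; not the real reftable struct. */
struct demo_record {
	int value_type;
	unsigned long update_index;
};

/* Two assignments joined by the comma operator: one expression statement. */
static void fill_with_comma(struct demo_record *rec)
{
	rec->value_type = 1,
	rec->update_index = 42;
}

/* The conventional form: two separate statements. */
static void fill_with_semicolon(struct demo_record *rec)
{
	rec->value_type = 1;
	rec->update_index = 42;
}

/* Returns 1 if both variants leave the record in the same state. */
static int variants_agree(void)
{
	struct demo_record a = {0}, b = {0};

	fill_with_comma(&a);
	fill_with_semicolon(&b);
	return a.value_type == b.value_type &&
	       a.update_index == b.update_index;
}
```

So the comma is harmless in this position, but a semicolon states the intent
more plainly and avoids surprises if the lines are later reordered.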

> +static int write_copy_table(struct reftable_writer *writer, void *cb_data)
> +{
> +	struct write_copy_arg *arg = cb_data;
> +	uint64_t deletion_ts, creation_ts;
> +	struct reftable_merged_table *mt = reftable_stack_merged_table(arg->stack);
> +	struct reftable_ref_record old_ref = {0}, refs[2] = {0};
> +	struct reftable_log_record old_log = {0}, *logs = NULL;
> +	struct reftable_iterator it = {0};
> +	struct string_list skip = STRING_LIST_INIT_NODUP;
> +	struct strbuf errbuf = STRBUF_INIT;
> +	size_t logs_nr = 0, logs_alloc = 0, i;
> +	int ret;
> +
> +	if (reftable_stack_read_ref(arg->stack, arg->oldname, &old_ref)) {
> +		ret = error(_("refname %s not found"), arg->oldname);
> +		goto done;
> +	}
> +	if (old_ref.value_type == REFTABLE_REF_SYMREF) {
> +		ret = error(_("refname %s is a symbolic ref, copying it is not supported"),
> +			    arg->oldname);
> +		goto done;
> +	}
> +
> +	/*
> +	 * There's nothing to do in case the old and new name are the same, so
> +	 * we exit early in that case.
> +	 */
> +	if (!strcmp(arg->oldname, arg->newname)) {
> +		ret = 0;
> +		goto done;
> +	}
> +
> +	/*
> +	 * Verify that the new refname is available.
> +	 */
> +	string_list_insert(&skip, arg->oldname);
> +	ret = refs_verify_refname_available(&arg->refs->base, arg->newname,
> +					    NULL, &skip, &errbuf);
> +	if (ret < 0) {
> +		error("%s", errbuf.buf);
> +		goto done;
> +	}
> +
> +	/*
> +	 * When deleting the old reference we have to use two update indices:
> +	 * one to delete the old ref and its reflog, and once to create the new
>

s/once/one

> +	 * ref and its reflog. They need to be staged with two separate indices
> +	 * because the new reflog needs to encode both the deletion of the old
> +	 * branch and the creation of the new branch, and we cannot do two
> +	 * changes to a reflog in a single update.
> +	 */
> +	deletion_ts = creation_ts = reftable_stack_next_update_index(arg->stack);
> +	if (arg->delete_old)
> +		creation_ts++;
> +	reftable_writer_set_limits(writer, deletion_ts, creation_ts);
> +
> +	/*
> +	 * Add the new reference. If this is a rename then we also delete the
> +	 * old reference.
> +	 */
> +	refs[0] = old_ref;
> +	refs[0].refname = (char *)arg->newname;
> +	refs[0].update_index = creation_ts;
> +	if (arg->delete_old) {
> +		refs[1].refname = (char *)arg->oldname;
> +		refs[1].value_type = REFTABLE_REF_DELETION;
> +		refs[1].update_index = deletion_ts;
> +	}
> +	ret = reftable_writer_add_refs(writer, refs, arg->delete_old ? 2 : 1);
> +	if (ret < 0)
> +		goto done;
> +
> +	/*
> +	 * When deleting the old branch we need to create a reflog entry on the
> +	 * new branch name that indicates that the old branch has been deleted
> +	 * and then recreated. This is a tad weird, but matches what the files
> +	 * backend does.
> +	 */
> +	if (arg->delete_old) {
> +		struct strbuf head_referent = STRBUF_INIT;
> +		struct object_id head_oid;
> +		int append_head_reflog;
> +		unsigned head_type = 0;
> +
> +		ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
> +		memset(&logs[logs_nr], 0, sizeof(logs[logs_nr]));
> +		fill_reftable_log_record(&logs[logs_nr]);
> +		logs[logs_nr].refname = (char *)arg->newname;
> +		logs[logs_nr].update_index = deletion_ts;
> +		logs[logs_nr].value.update.message =
> +			xstrndup(arg->logmsg, arg->refs->write_options.block_size / 2);

Question: here and other places in this file, shouldn't we free memory
allocated by `xstrndup`?

Also, why is it `block_size / 2` ?

> +static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
> +								  struct reftable_stack *stack)
> +{
> +	struct reftable_reflog_iterator *iter;
> +	struct reftable_merged_table *mt;
> +	int ret;
> +
> +	iter = xcalloc(1, sizeof(*iter));
> +	base_ref_iterator_init(&iter->base, &reftable_reflog_iterator_vtable, 1);
> +	iter->refs = refs;
> +	iter->base.oid = &iter->oid;
> +
> +	ret = reftable_stack_reload(refs->main_stack);
> +	if (ret < 0)
> +		goto done;
> +
> +	mt = reftable_stack_merged_table(stack);
> +	ret = reftable_merged_table_seek_log(mt, &iter->iter, "");
> +	if (ret < 0)
> +		goto done;
> +

Keeping it similar to `ref_iterator_for_stack`, perhaps:

	ret = refs->err;
	if (ret)
		goto done;

	ret = reftable_stack_reload(stack);
	if (ret)
		goto done;

	merged_table = reftable_stack_merged_table(stack);

	ret = reftable_merged_table_seek_ref(merged_table, &iter->iter, prefix);
	if (ret)
		goto done;

> +static int reftable_be_for_each_reflog_ent_reverse(struct ref_store *ref_store,
> +						   const char *refname,
> +						   each_reflog_ent_fn fn,
> +						   void *cb_data)
> +{
> +	struct reftable_ref_store *refs =
> +		reftable_be_downcast(ref_store, REF_STORE_READ, "for_each_reflog_ent_reverse");
> +	struct reftable_stack *stack = stack_for(refs, refname, &refname);
> +	struct reftable_merged_table *mt = NULL;
> +	struct reftable_log_record log = {0};
> +	struct reftable_iterator it = {0};
> +	int ret;
> +
> +	if (refs->err < 0)
> +		return refs->err;

Nit: it seems like this function and the one below,
`reftable_be_for_each_reflog_ent`, share the same code apart from the
iteration direction. I wonder if it'd be nicer to extract the common
code from them.
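
One possible shape for such a refactoring, as a generic sketch only: a shared
helper takes a direction flag and the two public entry points become thin
wrappers. Everything here (the `int` entries, `each_entry_fn`,
`for_each_entry_directed` and the recording callback) is hypothetical and only
illustrates the pattern; it does not use the reftable iterator API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type; a non-zero return stops the iteration. */
typedef int (*each_entry_fn)(int entry, void *cb_data);

/* Hypothetical callback state: records the entries in the order visited. */
struct seen_entries {
	int values[8];
	size_t nr;
};

/* Example callback: remember each entry; stop with -1 if the buffer is full. */
static int record_entry(int entry, void *cb_data)
{
	struct seen_entries *seen = cb_data;

	if (seen->nr >= sizeof(seen->values) / sizeof(seen->values[0]))
		return -1;
	seen->values[seen->nr++] = entry;
	return 0;
}

/*
 * Shared body for both directions: only the index computation differs,
 * so the forward and reverse entry points can be thin wrappers.
 */
static int for_each_entry_directed(const int *entries, size_t nr, int reverse,
				   each_entry_fn fn, void *cb_data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		size_t idx = reverse ? nr - 1 - i : i;
		int ret = fn(entries[idx], cb_data);

		if (ret)
			return ret;
	}
	return 0;
}

/* Forward wrapper. */
static int for_each_entry(const int *entries, size_t nr,
			  each_entry_fn fn, void *cb_data)
{
	return for_each_entry_directed(entries, nr, 0, fn, cb_data);
}

/* Reverse wrapper. */
static int for_each_entry_reverse(const int *entries, size_t nr,
				  each_entry_fn fn, void *cb_data)
{
	return for_each_entry_directed(entries, nr, 1, fn, cb_data);
}
```

Whether this pays off in the real code depends on how much setup and teardown
the two functions actually have in common.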

> +static int reftable_be_reflog_exists(struct ref_store *ref_store,
> +				     const char *refname)
> +{
> +	struct reftable_ref_store *refs =
> +		reftable_be_downcast(ref_store, REF_STORE_READ, "reflog_exists");
> +	struct reftable_stack *stack = stack_for(refs, refname, &refname);
> +	struct reftable_merged_table *mt = reftable_stack_merged_table(stack);
> +	struct reftable_log_record log = {0};
> +	struct reftable_iterator it = {0};
> +	int ret;
> +
> +	ret = refs->err;
> +	if (ret < 0)
> +		goto done;
> +
> +	ret = reftable_stack_reload(stack);
> +	if (ret)
> +		goto done;
> +
> +	ret = reftable_merged_table_seek_log(mt, &it, refname);
> +	if (ret)
> +		goto done;
> +
> +	/*
> +	 * Seek the reflog to see whether it contains any reflog entries which
> +	 * aren't marked for deletion.
> +	 */

Shouldn't we be checking `log.value_type` in that case?

This email only reviews this file; I'll review the others in follow-up
email[s].

[1]: https://git.kernel.org/pub/scm/git/git.git/commit/?h=next&id=e7a9234a8b39bd69de12b105731413ab8e62942e

^ permalink raw reply

* Re: Migrate away from vger to GitHub or (on-premise) GitLab?
From: Konstantin Ryabitsev @ 2024-02-01 15:39 UTC (permalink / raw)
  To: Hans Meiser; +Cc: git@vger.kernel.org
In-Reply-To: <AS2P195MB2135D91EE464FF30EE84E77EE2432@AS2P195MB2135.EURP195.PROD.OUTLOOK.COM>

On Thu, Feb 01, 2024 at 12:10:11PM +0000, Hans Meiser wrote:
> is there any current discussion about moving Git development away from using
> a mailing list to some modern form of collaboration?
> 
> I'd like to be able to follow a structured discussion in issues and to
> contribute to the Git documentation, but the mailing list currently just
> bloats my personal inbox with loads of uninteresting e-mails in an
> unstructured waterfall of messy discussion that I am not able to follow
> professionally.

Here's a perspective from the world of Linux kernel, where this discussion is
continuously raging. Funny enough, the main objection a lot of kernel
maintainers have to forges is that it makes it really hard to find relevant
discussions once the volume goes above a certain threshold. These folks have
become *extremely* efficient at querying and filtering the mailing list
traffic, to the point where all they ever see are just those discussions
relevant to their work. They love the fact that it all arrives into the same
place (their inbox) without having to go and click on various websites, each
with their own login information, UI, and preferred workflow.

The kernel maintainers are able to review tens of thousands of patches monthly
with only about a hundred or so top maintainers. To them, this system is
working great, especially now that some tools allow easy ways to query,
retrieve, verify, and apply patches (shameless plug for lore, lei, and b4
here).

The obvious problem, of course, is that these folks are FOSS's "marathon
runners" who got really good at their workflow, but the situation is different
for anyone else who is just starting out. Any new kernel maintainer stepping
up obviously finds this overwhelming, because they aren't yet so good at
filtering the huge volume of the mailing list traffic and to them it's just a
torrent of mostly irrelevant patches.

> Are you consideration for migrating?

Yes, of course, this is constantly under consideration. There isn't some sort
of anti-forge cabal that is preventing things from going forward, but there
are some serious hurdles and considerations to consider:

- How to avoid a vendor lock-in? Those of us who have been around for a while
  have seen forges bloom, and then shrink into irrelevance (e.g. bitkeeper)
  or slowly ensh*ttify to the point of unusability (sourceforge). GitHub is a
  proprietary service owned by a single company who are currently
  FOSS-friendly, but have certainly been extremely FOSS-hostile in the past.
  GitLab is open-core, and the current record for open-core projects isn't
  very encouraging (Puppet open-cored themselves into irrelevance, Terraform
  has gone full-proprietary, among most recent examples). Full-FOSS
  alternatives exist, but people aren't really that enthused about using
  less-popular solutions like Forgejo, because they hate unfamiliar UIs almost
  as much, or even more than they hate unfiltered mailing lists.

- How to avoid centralization and single points of failure? If Linux or Git
  move to a self-hosted forge, how do we ensure that an adversary can't stop
  all development on a project by knocking it offline for weeks? This has
  literally just happened to Sourcehut and Codeberg -- and as far as anyone
  can tell, the attacker was just bored and knocked them out just because they
  could. Yes, you can knock out vger, but this will only impact the mailing
  list -- people can still send around patches and hold discussions by
  temporarily moving to alternative hosts. With the distributed nature of the
  mailing list archives, this can even be largely transparent to anyone using
  lei-queries.

- How to avoid alienating these hundreds of key maintainers who are now
  extremely proficient at their query-based workflows? We're talking about an
  extremely finely-tuned engine that is performing remarkably well -- we don't
  want to disrupt development for months just to try things out with a forge
  and find that it isn't working out.

Finally, there's also the consideration of current trends. One upside of "AI"
(LLM, really) technologies is that they are extremely good at taking in a huge
source of data and finding relevant information based on natural language
queries. I can very easily see a mechanism spring up in the next year or less
where you can issue a query like "send me any threads about reftables or
promisor remotes if they contain follow-ups from Junio" and reasonably expect
this to work and work great -- all while keeping things decentralized in
addition to distributed.

Above all, this isn't a "forges are terrible and shouldn't be used" response
-- they are clearly useful, especially when it comes to CI integrations. A
large part of my work is bridging forges with mailing lists and vice-versa,
which I hope I'll be able to do in the near future (GitGitGadget already does
it with GitHub, but my goal is to have a pluggable multi-forge solution). I
just wanted to highlight the aspects that aren't necessarily obvious or
visible from the outside.

Best regards,
-K

^ permalink raw reply

* Re: [PATCH v2 3/9] reftable/stack: fix parameter validation when compacting range
From: Toon Claes @ 2024-02-01 16:15 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Eric Sunshine, Junio C Hamano
In-Reply-To: <f134702dc5f656942baafbd80af46ad928ee1449.1706772591.git.ps@pks.im>


Patrick Steinhardt <ps@pks.im> writes:

> diff --git a/reftable/stack.c b/reftable/stack.c
> index d084823a92..b6b24c90bf 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -1146,12 +1146,14 @@ static int stack_compact_range(struct reftable_stack *st, int first, int last,
>  done:
>  	free_names(delete_on_success);
>
> -	listp = subtable_locks;
> -	while (*listp) {
> -		unlink(*listp);
> -		listp++;
> +	if (subtable_locks) {
> +		listp = subtable_locks;
> +		while (*listp) {
> +			unlink(*listp);
> +			listp++;
> +		}
> +		free_names(subtable_locks);
>  	}
> -	free_names(subtable_locks);
>  	if (lock_file_fd >= 0) {
>  		close(lock_file_fd);
>  		lock_file_fd = -1;

Technically, this change is not needed, because `free_names()` deals
with NULL pointers already.
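
To illustrate, here is a minimal sketch of a NULL-tolerant helper for a
NULL-terminated string array. It is written from scratch for this example
(`free_names_sketch`, `count_names`, `dup_str` and `make_two_names` are
hypothetical names) and is not a copy of `free_names()`. Note that only the
freeing call is safe without a guard; a loop that dereferences the list, like
the `unlink()` loop in the hunk above, still needs a non-NULL pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts entries of a NULL-terminated array; 0 for NULL. */
static size_t count_names(char **names)
{
	size_t n = 0;

	if (!names)
		return 0;
	while (names[n])
		n++;
	return n;
}

/* Hypothetical NULL-tolerant free: returns how many strings were freed. */
static size_t free_names_sketch(char **names)
{
	size_t n = 0;

	if (!names)
		return 0;	/* nothing to do, like free(NULL) */
	for (; names[n]; n++)
		free(names[n]);
	free(names);
	return n;
}

/* Hypothetical helper: heap copy of a string, NULL on allocation failure. */
static char *dup_str(const char *s)
{
	char *copy = malloc(strlen(s) + 1);

	if (copy)
		strcpy(copy, s);
	return copy;
}

/* Hypothetical helper: builds a two-entry, NULL-terminated list on the heap. */
static char **make_two_names(void)
{
	char **names = calloc(3, sizeof(*names));

	if (!names)
		return NULL;
	names[0] = dup_str("a.lock");
	names[1] = dup_str("b.lock");
	return names;
}
```

With a helper shaped like this, the free call could stay outside the
`if (subtable_locks)` block while the loop remains guarded.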

--
Toon

^ permalink raw reply

* Re: [PATCH 1/9] reftable: introduce macros to grow arrays
From: Junio C Hamano @ 2024-02-01 16:30 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <ZbtIQ3Hk6Mgvkv4j@tanuki>

Patrick Steinhardt <ps@pks.im> writes:

> Very good point indeed. I don't think peak memory usage is really all
> that helpful either because the problem is not that we are allocating
> arrays that we keep around all the time, but many small arrays which are
> short lived. So what is telling is the total number of bytes we end up
> allocating:

True.  sum of requested sizes to all the calls to malloc() and
friends to allocate, ignoring what gets free()d in between, is what
would show how much we try to consume.

> Allocating 21 times as many bytes with our default growth factor should
> be a much more compelling argument why we don't actually want to use it
> compared to the 2% speedup.

;-).
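
As a rough illustration of counting total requested bytes rather than peak
usage, the sketch below simulates growing an array one element at a time and
sums the size passed to every (re)allocation. The two growth rules are
assumptions made for this example: `(alloc + 16) * 3 / 2` stands in for the
default growth policy and `2 * alloc + 1` for a doubling policy. It does not
reproduce the benchmark discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed stand-in for the default growth rule. */
static size_t grow_default(size_t alloc)
{
	return (alloc + 16) * 3 / 2;
}

/* Assumed stand-in for a doubling growth rule. */
static size_t grow_double(size_t alloc)
{
	return 2 * alloc + 1;
}

/*
 * Simulate appending `nr` elements of `elem_size` bytes one at a time and
 * return the sum of all sizes requested from the allocator, ignoring frees.
 */
static size_t total_requested(size_t nr, size_t elem_size,
			      size_t (*grow)(size_t))
{
	size_t alloc = 0, total = 0, i;

	for (i = 1; i <= nr; i++) {
		if (i > alloc) {
			alloc = grow(alloc);
			if (alloc < i)
				alloc = i;
			total += alloc * elem_size;
		}
	}
	return total;
}
```

For a single 8-byte element, `total_requested(1, 8, grow_default)` is 192
while `total_requested(1, 8, grow_double)` is 8, which hints at how many
small, short-lived arrays may inflate the total under the first rule.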


^ permalink raw reply

* Re: [PATCH v3 03/10] trailer: unify trailer formatting machinery
From: Junio C Hamano @ 2024-02-01 16:41 UTC (permalink / raw)
  To: Linus Arver, Christian Couder
  Cc: Linus Arver via GitGitGadget, git, Emily Shaffer, Josh Steadmon,
	Randall S. Becker
In-Reply-To: <xmqqfryd2drm.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> ..., if that gives us a readable set of bite-sized changes that
> prepare a solid foundation to rebuild things on top.  I am having a
> feeling that not even a single person has reviewed them on list even
> though we are already at the third iteration, which is quite
> frustrating (and I would imagine that it would be frustrating for
> you, too), ...

I guess "not even a single person" was a bit unfair, given that we
did see some minor comments from me and also Josh during the first
round.

But neither of two rounds saw any in-depth reviews that question "is
the code doing the right thing?", which would take a real reading of
and a comparison between the code before and after the patches, with
some understanding of how things have been working, how things were
envisioned to evolve, and how the patch author would want to change
the course of the evolution of the code in the longer term.  

Christian, you've been touching the code in this area the longest.
Can we have some of your time reviewing these?

Thanks.

^ permalink raw reply

* Re: [PATCH v3 0/2] index-pack: fsck honor checks
From: Junio C Hamano @ 2024-02-01 16:44 UTC (permalink / raw)
  To: John Cai; +Cc: Jonathan Tan, John Cai via GitGitGadget, git
In-Reply-To: <222CEC85-73B0-49CC-BB81-D6E6F36018B3@gmail.com>

John Cai <johncai86@gmail.com> writes:

>>> Thanks for clarifying! Would you mind providing a patch to revise the wording
>>> here to make it clearer? I would try but I feel like I might get the wording
>>> wrong.
>>
>> I think the wording there is already mostly correct, except maybe make
>> everything plural (a tree -> trees, a .gitmodules blob -> .gitmodules
>> blobs, hash of that blob -> hashes of those blobs). We might also need
>> to modify a test to show that the current code indeed handles the plural
>> situation correctly. I don't have time right now to get to this, so
>> hopefully someone could pick this up.
>
> Thanks! It sounds like we may want to tackle this as part of another patch.

Yeah, the existing documentation has been with our users for some
time, and it is not ultra urgent to fix it in that sense.  I'd say
that it can even wait until JTan gets bored with what he's doing and
needs some distraction himself ;-) 

As long as our collective mind remembers it as #leftoverbits it
would be sufficient.

Thanks, both.

^ permalink raw reply

* Re: [PATCH 1/3] revision: clarify a 'return NULL' in get_reference()
From: Christian Couder @ 2024-02-01 16:49 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: git, Junio C Hamano, Patrick Steinhardt, John Cai,
	Christian Couder
In-Reply-To: <CAPig+cSNM0VJZ5SpvazY5T6rFvXuoTdfgD5J5f36PW4iW7xLVA@mail.gmail.com>

On Thu, Feb 1, 2024 at 3:53 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> On Thu, Feb 1, 2024 at 6:58 AM Christian Couder
> <christian.couder@gmail.com> wrote:
> > In general when we know a pointer variable is NULL, it's clearer to
> > explicitely return NULL than to return that variable.
>
> s/explicitely/explicitly/

[...]

> > Let's make the code clearer and more uniform by also explicitely
> > returning NULL when 'revs->ignore_missing' is true.
>
> s/explicitely/explicitly/

Thanks, it's fixed in my current version. Not sure I have to resend
just to fix this though.

^ permalink raw reply

* Re: Migrate away from vger to GitHub or (on-premise) GitLab?
From: Dragan Simic @ 2024-02-01 16:54 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Hans Meiser, git
In-Reply-To: <20240201-primitive-aardwark-of-contentment-aaabb9@lemur>

Hello Konstantin,

On 2024-02-01 16:39, Konstantin Ryabitsev wrote:
> On Thu, Feb 01, 2024 at 12:10:11PM +0000, Hans Meiser wrote:
>> is there any current discussion about moving Git development away from 
>> using
>> a mailing list to some modern form of collaboration?
>> 
>> I'd like to be able to follow a structured discussion in issues and to
>> contribute to the Git documentation, but the mailing list currently 
>> just
>> bloats my personal inbox with loads of uninteresting e-mails in an
>> unstructured waterfall of messy discussion that I am not able to 
>> follow
>> professionally.
> 
> Here's a perspective from the world of Linux kernel, where this 
> discussion is
> continuously raging. Funny enough, the main objection a lot of kernel
> maintainers have to forges is that it makes it really hard to find 
> relevant
> discussions once the volume goes above a certain threshold. These folks 
> have
> become *extremely* efficient at querying and filtering the mailing list
> traffic, to the point where all they ever see are just those 
> discussions
> relevant to their work. They love the fact that it all arrives into the 
> same
> place (their inbox) without having to go and click on various websites, 
> each
> with their own login information, UI, and preferred workflow.
> 
> The kernel maintainers are able to review tens of thousands of patches 
> monthly
> with only about a hundred or so top maintainers. To them, this system 
> is
> working great, especially now that some tools allow easy ways to query,
> retrieve, verify, and apply patches (shameless plug for lore, lei, and 
> b4
> here).
> 
> The obvious problem, of course, is that these folks are FOSS's 
> "marathon
> runners" who got really good at their workflow, but the situation is 
> different
> for anyone else who is just starting out. Any new kernel maintainer 
> stepping
> up obviously finds this overwhelming, because they aren't yet so good 
> at
> filtering the huge volume of the mailing list traffic and to them it's 
> just a
> torrent of mostly irrelevant patches.
> 
>> Are you consideration for migrating?
> 
> Yes, of course, this is constantly under consideration. There isn't 
> some sort
> of anti-forge cabal that is preventing things from going forward, but 
> there
> are some serious hurdles and considerations to consider:
> 
> - How to avoid a vendor lock-in? Those of us who have been around for a 
> while
>   have seen forges bloom, and then shrink into irrelevance (e.g. 
> bitkeeper)
>   or slowly ensh*ttify to the point of unusability (sourceforge). 
> GitHub is a
>   proprietary service owned by a single company who are currently
>   FOSS-friendly, but have certainly been extremely FOSS-hostile in the 
> past.
>   GitLab is open-core, and the current record for open-core projects 
> isn't
>   very encouraging (Puppet open-cored themselves into irrelevance, 
> Terraform
>   has gone full-proprietary, among most recent examples). Full-FOSS
>   alternatives exist, but people aren't really that enthused about 
> using
>   less-popular solutions like Forgejo, because they hate unfamiliar UIs 
> almost
>   as much, or even more than they hate unfiltered mailing lists.
> 
> - How to avoid centralization and single points of failure? If Linux or 
> Git
>   move to a self-hosted forge, how do we ensure that an adversary can't 
> stop
>   all development on a project by knocking it offline for weeks? This 
> has
>   literally just happened to Sourcehut and Codeberg -- and as far as 
> anyone
>   can tell, the attacker was just bored and knocked them out just 
> because they
>   could. Yes, you can knock out vger, but this will only impact the 
> mailing
>   list -- people can still send around patches and hold discussions by
>   temporarily moving to alternative hosts. With the distributed nature 
> of the
>   mailing list archives, this can even be largely transparent to anyone 
> using
>   lei-queries.
> 
> - How to avoid alienating these hundreds of key maintainers who are now
>   extremely proficient at their query-based workflows? We're talking 
> about an
>   extremely finely-tuned engine that is performing remarkably well -- 
> we don't
>   want to disrupt development for months just to try things out with a 
> forge
>   and find that it isn't working out.
> 
> Finally, there's also the consideration of current trends. One upside 
> of "AI"
> (LLM, really) technologies is that they are extremely good at taking in 
> a huge
> source of data and finding relevant information based on natural 
> language
> queries. I can very easily see a mechanism spring up in the next year 
> or less
> where you can issue a query like "send me any threads about reftables 
> or
> promisor remotes if they contain follow-ups from Junio" and reasonably 
> expect
> this to work and work great -- all while keeping things decentralized 
> in
> addition to distributed.
> 
> Above all, this isn't a "forges are terrible and shouldn't be used" 
> response
> -- they are clearly useful, especially when it comes to CI 
> integrations. A
> large part of my work is bridging forges with mailing lists and 
> vice-versa,
> which I hope I'll be able to do in the near future (GitGitGadget 
> already does
> it with GitHub, but my goal is to have a pluggable multi-forge 
> solution). I
> just wanted to highlight the aspects that aren't necessarily obvious or
> visible from the outside.

Thank you very much for taking your time to write this down!
Much appreciated.

^ permalink raw reply

* Re: Migrate away from vger to GitHub or (on-premise) GitLab?
From: Dragan Simic @ 2024-02-01 17:00 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Hans Meiser, git
In-Reply-To: <7e395301c5ff46a69d8aca71eb0bb766@manjaro.org>

On 2024-02-01 17:54, Dragan Simic wrote:
> Hello Konstantin,
> 
> On 2024-02-01 16:39, Konstantin Ryabitsev wrote:
>> On Thu, Feb 01, 2024 at 12:10:11PM +0000, Hans Meiser wrote:
>>> is there any current discussion about moving Git development away 
>>> from using
>>> a mailing list to some modern form of collaboration?
>>> 
>>> I'd like to be able to follow a structured discussion in issues and 
>>> to
>>> contribute to the Git documentation, but the mailing list currently 
>>> just
>>> bloats my personal inbox with loads of uninteresting e-mails in an
>>> unstructured waterfall of messy discussion that I am not able to 
>>> follow
>>> professionally.
>> 
>> Here's a perspective from the world of Linux kernel, where this 
>> discussion is
>> continuously raging. Funny enough, the main objection a lot of kernel
>> maintainers have to forges is that it makes it really hard to find 
>> relevant
>> discussions once the volume goes above a certain threshold. These 
>> folks have
>> become *extremely* efficient at querying and filtering the mailing 
>> list
>> traffic, to the point where all they ever see are just those 
>> discussions
>> relevant to their work. They love the fact that it all arrives into 
>> the same
>> place (their inbox) without having to go and click on various 
>> websites, each
>> with their own login information, UI, and preferred workflow.
>> 
>> The kernel maintainers are able to review tens of thousands of patches 
>> monthly
>> with only about a hundred or so top maintainers. To them, this system 
>> is
>> working great, especially now that some tools allow easy ways to 
>> query,
>> retrieve, verify, and apply patches (shameless plug for lore, lei, and 
>> b4
>> here).
>> 
>> The obvious problem, of course, is that these folks are FOSS's 
>> "marathon
>> runners" who got really good at their workflow, but the situation is 
>> different
>> for anyone else who is just starting out. Any new kernel maintainer 
>> stepping
>> up obviously finds this overwhelming, because they aren't yet so good 
>> at
>> filtering the huge volume of the mailing list traffic and to them it's 
>> just a
>> torrent of mostly irrelevant patches.
>> 
>>> Are you consideration for migrating?
>> 
>> Yes, of course, this is constantly under consideration. There isn't 
>> some sort
>> of anti-forge cabal that is preventing things from going forward, but 
>> there
>> are some serious hurdles and considerations to consider:
>> 
>> - How to avoid a vendor lock-in? Those of us who have been around for a while
>>   have seen forges bloom, and then shrink into irrelevance (e.g. bitkeeper)
>>   or slowly ensh*ttify to the point of unusability (sourceforge). GitHub is a
>>   proprietary service owned by a single company who are currently
>>   FOSS-friendly, but have certainly been extremely FOSS-hostile in the past.
>>   GitLab is open-core, and the current record for open-core projects isn't
>>   very encouraging (Puppet open-cored themselves into irrelevance, Terraform
>>   has gone full-proprietary, among most recent examples). Full-FOSS
>>   alternatives exist, but people aren't really that enthused about using
>>   less-popular solutions like Forgejo, because they hate unfamiliar UIs almost
>>   as much as, or even more than, they hate unfiltered mailing lists.
>> 
>> - How to avoid centralization and single points of failure? If Linux or Git
>>   move to a self-hosted forge, how do we ensure that an adversary can't stop
>>   all development on a project by knocking it offline for weeks? This has
>>   literally just happened to Sourcehut and Codeberg -- and as far as anyone
>>   can tell, the attacker was just bored and knocked them out just because they
>>   could. Yes, you can knock out vger, but this will only impact the mailing
>>   list -- people can still send around patches and hold discussions by
>>   temporarily moving to alternative hosts. With the distributed nature of the
>>   mailing list archives, this can even be largely transparent to anyone using
>>   lei-queries.
>> 
>> - How to avoid alienating these hundreds of key maintainers who are now
>>   extremely proficient at their query-based workflows? We're talking about an
>>   extremely finely-tuned engine that is performing remarkably well -- we don't
>>   want to disrupt development for months just to try things out with a forge
>>   and find that it isn't working out.
>> 
>> Finally, there's also the consideration of current trends. One upside of "AI"
>> (LLM, really) technologies is that they are extremely good at taking in a huge
>> source of data and finding relevant information based on natural language
>> queries. I can very easily see a mechanism spring up in the next year or less
>> where you can issue a query like "send me any threads about reftables or
>> promisor remotes if they contain follow-ups from Junio" and reasonably expect
>> this to work and work great -- all while keeping things decentralized in
>> addition to distributed.
>> 
>> Above all, this isn't a "forges are terrible and shouldn't be used" response
>> -- they are clearly useful, especially when it comes to CI integrations. A
>> large part of my work is bridging forges with mailing lists and vice-versa,
>> which I hope I'll be able to do in the near future (GitGitGadget already does
>> it with GitHub, but my goal is to have a pluggable multi-forge solution). I
>> just wanted to highlight the aspects that aren't necessarily obvious or
>> visible from the outside.
> 
> Thank you very much for taking your time to write this down!
> Much appreciated.

s/your time/the time/

Sorry for the noise.

^ permalink raw reply

* Re: Migrate away from vger to GitHub or (on-premise) GitLab?
From: Hans Meiser @ 2024-02-01 17:28 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: git@vger.kernel.org
In-Reply-To: <20240201-primitive-aardwark-of-contentment-aaabb9@lemur>

Thank you for enlightening me and elaborating on all of these very important facts!

Just to make sure: So "git" is considered part of the kernel? And the "git documentation" is considered part of the kernel, too?

Shouldn't these topics be separated then into separate repositories, particularly the git documentation?

For people like me, who contribute to dozens of documentation projects on GitHub (and GitLab) … We don't focus on the kernel alone. We receive dozens of important technical, business, and financial e-mails from different sources every day. So, people like me need some modern, common channels/tools for contributing. (If contribution is considered helpful and valuable by the kernel team at all.)

With today's platforms, issues can be created by e-mail, and an e-mail is sent out with each issue update. It's even possible to upload patches via REST services. No web browser required. So mailing list users could keep their accustomed habits.

Setting up a local (on-premise) GitLab or Azure DevOps server for long-term use should not be impossible. I'm running each of these myself. Once installed on-premise, the installation wouldn't be bound to any continuous support. All it needs is a provider for keeping the server machine running.

Cheers,
AxelD

^ permalink raw reply

* Re: [PATCH 3/7] reftable/merged: skip comparison for records of the same subiter
From: Eric Sunshine @ 2024-02-01 17:29 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <0ca86eba710895f0e22fc15fe5221f5487031f64.1706782841.git.ps@pks.im>

On Thu, Feb 1, 2024 at 8:17 AM Patrick Steinhardt <ps@pks.im> wrote:
> When retrieving the next entry of a merged iterator we need to drop all
> records of other sub-iterators that would be shadowed by the record that
> we are about to return. We do this by comparing record keys, dropping
> all keys that are smaller or equal to the key of the record we are about
> to return.
>
> There is an edge case here where we can skip that comparison: when the
> record in the priority queue comes from the same subiterator than the

s/than/as/

> record we are about to return then we know that its key must be larger
> than the key of the record we are about to return. This property is
> guaranteed by the sub-iterators, and if it didn't hold then the whole
> merged iterator would return records in the wrong order, too.
>
> While this may seem like a very specific edge case it's in fact quite
> likely to happen. For most repositories out there you can assume that we
> will end up with one large table and several smaller ones on top of it.
> Thus, it is very likely that the next entry will sort towards the top of
> the priority queue.
>
> Special case this and break out of the loop in that case. The following
> benchmark uses git-show-ref(1) to print a single ref matching a pattern
> out of 1 million refs:
>
>   Benchmark 1: show-ref: single matching ref (revision = HEAD~)
>     Time (mean ± σ):     162.6 ms ±   4.5 ms    [User: 159.0 ms, System: 3.5 ms]
>     Range (min … max):   156.6 ms … 188.5 ms    1000 runs
>
>   Benchmark 2: show-ref: single matching ref (revision = HEAD)
>     Time (mean ± σ):     156.8 ms ±   4.7 ms    [User: 153.0 ms, System: 3.6 ms]
>     Range (min … max):   151.4 ms … 188.4 ms    1000 runs
>
>   Summary
>     show-ref: single matching ref (revision = HEAD) ran
>       1.04 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
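
For readers following along, the shortcut described in the quoted commit
message can be sketched roughly as below. This is only an illustration in
Python, not the reftable C code: the function name merge_tables, the
(key, value) record shape, and the rule that a higher table index means a
newer table are all assumptions made for the example.

```python
import heapq


def merge_tables(tables):
    """Merge sorted (key, value) lists; newer tables shadow older ones.

    Assumption for this sketch: tables[i + 1] is newer than tables[i],
    and keys within one table are strictly increasing.
    """
    iters = [iter(t) for t in tables]
    heap = []  # holds at most one head record per sub-iterator

    def advance(idx):
        # Push the next record of sub-iterator idx, if it has one.
        rec = next(iters[idx], None)
        if rec is not None:
            key, value = rec
            # Order by key; for equal keys the newer table (higher index)
            # must come out first, hence the negated index.
            heapq.heappush(heap, (key, -idx, value))

    for idx in range(len(iters)):
        advance(idx)

    result = []
    while heap:
        key, neg_idx, value = heapq.heappop(heap)
        idx = -neg_idx
        advance(idx)

        # Drop records from other sub-iterators that are shadowed.
        while heap:
            top_key, top_neg_idx, _ = heap[0]
            if -top_neg_idx == idx:
                # Same sub-iterator as the record being returned: its key
                # is guaranteed to be larger, so skip the key comparison.
                break
            if top_key > key:
                break
            heapq.heappop(heap)
            advance(-top_neg_idx)

        result.append((key, value))
    return result


old = [("refs/heads/a", 1), ("refs/heads/b", 1), ("refs/heads/c", 1)]
new = [("refs/heads/b", 2)]
print(merge_tables([old, new]))
```

With one large table and a few small ones on top, the head of the queue
usually comes from the large table again, so the early break is taken most
of the time. Note that heapq still compares keys internally when pushing;
the sketch only shows where the explicit comparison is skipped.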

^ permalink raw reply

* Re: Migrate away from vger to GitHub or (on-premise) GitLab?
From: Hans Meiser @ 2024-02-01 17:39 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: git@vger.kernel.org
In-Reply-To: <ada5564d-d810-4707-83b8-c00a7b5aa79f@app.fastmail.com>

Hi Kristoffer,

thanks for sharing these very helpful links to GitGitGadget! Love it!

Best regards,
AxelD

--
From: Kristoffer Haugsbakk <code@khaugsbakk.name>
Sent: Thursday, February 1, 2024 13:21
To: Hans Meiser <brille1@hotmail.com>
Cc: git@vger.kernel.org <git@vger.kernel.org>
Subject: Re: Migrate away from vger to GitHub or (on-premise) GitLab?

Hi

On Thu, Feb 1, 2024, at 13:10, Hans Meiser wrote:
> Hi,
>
> Regards,
> Axel Dahmen

A relevant discussion seems to be “Improving new contrib onboarding”[1]

There’s GitGitGadget for people who want to use GitHub as a bridge[2]

There’s an unofficial issue tracker for project ideas (not for bugs)[3]

That’s what I know.

🔗 1: https://lore.kernel.org/git/ZRrgMDacYpj41DcO@nand.local/
🔗 2: https://gitgitgadget.github.io/
🔗 3: https://github.com/gitgitgadget/git/issues

--
Kristoffer Haugsbakk

^ permalink raw reply

* Re: Migrate away from vger to GitHub or (on-premise) GitLab?
From: Kristoffer Haugsbakk @ 2024-02-01 17:39 UTC (permalink / raw)
  To: Hans Meiser; +Cc: git@vger.kernel.org, Dragan Simic, Konstantin Ryabitsev
In-Reply-To: <AS2P195MB2135D91EE464FF30EE84E77EE2432@AS2P195MB2135.EURP195.PROD.OUTLOOK.COM>

(Disclaimer that I’m relatively inexperienced with this project
workflow)

My impression is that the email workflow is very flexible and
tool-agnostic.[1] On the other hand it’s hard to get set up in a way
that makes contributing to a project as easy as contributing to a
project that is hosted on GitHub.[2]

† 1: Konstantin’s reply here seems to confirm this. And thanks by the
    way for all your emails on this workflow subject, which I always
    enjoy reading. And for your work on tooling that of course other
    email-based projects than Linux can use.
† 2: With the assumption that you already have an account there

What would really “sell” the email workflow would be to have some sort
of program which can set everything up for you so that you can track
your contributions as easily as a PR on GitHub. Of course people use all
kinds of different platforms, but let’s say that it only was for the
latest Mac OS (this is all hypothetical anyway). All you would need to
do was to give your email credentials and whatever other technical email
things that are required. Just install one program and track all your
patches as well as the replies on them. More concretely: maybe it would
have an email client which would make sure that all your outgoing emails
are done correctly. Including things like not mangling patches in your
reply because of hard-wrapping or something. (I created a support ticket
for that on Fastmail yesterday.) Or: let you immediately inline a
“scissor lines” patch into your current message based on a commit or
just your current working tree.[3]

Also: never having to copy–paste message ids manually. :)

(Again, all hypothetical for the sake of the argument)

This program could be very opinionated and dictate a very rigid
workflow; the point would be that there *is* a way to have a setup which
is as easy as GitHub (modulo email credentials/technical
things). Because then if you want to customize your workflow you are
still totally free to put together your own tools just like what
apparently many people do right now.

If this was even just hypothetically possible—I dunno—then that would be
a strong argument in favor of this kind of project workflow.

I think that would be the best of both worlds.

† 3: That also sounds more convenient than pushing to a GitHub repo. in
    order to make a PR

-- 
Kristoffer Haugsbakk

^ permalink raw reply

* Re: Migrate away from vger to GitHub or (on-premise) GitLab?
From: Nico Williams @ 2024-02-01 17:46 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Hans Meiser, git@vger.kernel.org
In-Reply-To: <20240201-primitive-aardwark-of-contentment-aaabb9@lemur>

On Thu, Feb 01, 2024 at 10:39:04AM -0500, Konstantin Ryabitsev wrote:
> [excellent discussion of e-mail workflows elided]

It would surely help if the e-mail interfaces of forges were not
terrible.  But they really have to be as good as the mailing list
approach.

I envision that the "issues" and "PRs" could be webmail-ish thread
trackers that auto-close on prolonged silence.  One could open issues/
PRs by e-mail, close them by e-mail, etc., all e-mails going to the same
[forge-run?] list address, but still have a forge-style view of a PR's
commits, still have a forge-style code review web UI (with all comments
going to e-mail too, and with e-mail being first-class, not an
afterthought), still have a CI checks UI, and still have a big
rebase-and-merge button for maintainers.

I.e., forge e-mail UI as first-class equivalent of forge web UI.

The forges tend to be run by people who prioritize users who are not
heavy e-mail workflow devs.  It makes economic sense, given how few
users demand e-mail as a first-class forge UI.  Still, it would be quite
awesome if some forge did this.

> - How to avoid a vendor lock-in? [...]

Assuming some forge exists with an e-mail UI on the same footing as its
web UI, and also good enough for kernel/git/... devs, you could maintain
mirrors on all the other forges, naturally, and always fall back on
e-mail only if the primary forge disappears or becomes too expensive.

> - How to avoid centralization and single points of failure? [...]

It's all forks, all the time.  It'd be good if the kernel maintainers
maintained non-forge git servers as mirror/staging/primary repos.

> - How to avoid alienating these hundreds of key maintainers who are now
>   extremely proficient at their query-based workflows? [...]

The only answer is to stick to the current workflow until some forge
provides an equivalently first-class e-mail interface.  New participants
just have to get used to it.  IMO.

Nico
-- 

^ permalink raw reply

* Re: [PATCH v3 03/10] trailer: unify trailer formatting machinery
From: Junio C Hamano @ 2024-02-01 17:48 UTC (permalink / raw)
  To: Linus Arver
  Cc: Josh Steadmon, Linus Arver via GitGitGadget, git,
	Christian Couder, Emily Shaffer, Randall S. Becker
In-Reply-To: <owly1q9x2io6.fsf@fine.c.googlers.com>

Linus Arver <linusa@google.com> writes:

> Josh Steadmon <steadmon@google.com> writes:
>
>> On 2024.01.31 01:22, Linus Arver via GitGitGadget wrote:
>>> This unification will allow us to delete the format_trailer_info() and
>>> print_tok_val() functions in the next patch. They are not deleted here
>>> in order to keep the diff small.
>>
>> Needs to be removed after squashing v2 patch 4 :)
>
> Oops. Will update in next reroll, thanks.

FWIW, by the way, having them in the same patch made it a lot easier
to compare what the original did (with these removed functions) and
what the updated code would do.  When a change is supposed to be a
clean-up of existing code without changing the behaviour, it helps
to make the before and after versions visible in the patch.

Thanks.

^ permalink raw reply
