Git development

* Re: [PATCH v3 0/8] environment: move core config globals into repo_config_values
From: Junio C Hamano @ 2026-06-01 22:24 UTC
  To: Bello Olamide
  Cc: git, phillip.wood123, christian.couder, usmanakinyemi202,
	Tian Yuchen, kaartic.sivaraam, me
In-Reply-To: <xmqq8q8y3pjl.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> Your mention of "the next revision" was made on Apr 26th and it has
> been a month without any updates since then.  Tian Yuchen seems to
> have made a few review comments, so perhaps it is a good time to
> update the series to stir the pot, hopefully reigniting interest in
> the topic?

Ah, I see you now have an updated version.  Let's see what people
say on these patches.  Thanks.

* Re: [PATCH] docs: fix typos and grammar
From: Junio C Hamano @ 2026-06-01 22:23 UTC
  To: Weijie Yuan; +Cc: Andrew Kreimer, git
In-Reply-To: <7b502e20e9495cd4720496bd6738a1fbeb453410.1780041658.git.wy@wyuan.org>

Weijie Yuan <wy@wyuan.org> writes:

> Fix several spelling mistakes, subject-verb agreement issues, and
> duplicated words.
>
> Signed-off-by: Weijie Yuan <wy@wyuan.org>
> ---

Sorry, I lost track.

How does this patch relate to the large patch from Andrew that you
reviewed earlier?  Is this meant to replace it, or is it an
independent effort that may or may not overlap what is fixed by the
other patch?  Something else?

Thanks.  All the changes in _this_ patch looked sensible to me (and
to my agent as well ;-).

* Re: [PATCH 2/2] builtin/init-db: deprecate alias for git-init(1)
From: Junio C Hamano @ 2026-06-01 22:22 UTC
  To: Kristoffer Haugsbakk; +Cc: Patrick Steinhardt, Phillip Wood, git
In-Reply-To: <2e266786-4ccd-4300-9b53-6f13fbaa2933@app.fastmail.com>

"Kristoffer Haugsbakk" <kristofferhaugsbakk@fastmail.com> writes:

>> I found it to be a bit heavy-handed as it's so trivial to replace with
>> git-init(1), but on the other hand it's a trivial thing to do.
>
> I imagine that most potential git-init-db(1) uses will be buried in some
> scripts that haven’t been touched in years. Then the Git init might
> fail, you get errors about git-commit(1) or something not being a thing
> you can run without a repository, and it ends up being a headscratcher
> since the original failure gets lost.
>
> All to say I think a simple warning would be nice. ;)

Or just leave it without deprecation.  It does not cost much to keep
"init-db", and because we expanded what "git database" means in
later versions of Git since its invention, the name still makes
sense.  Thank Linus for not naming it "init-odb"---that might have
been a valid excuse to rename it because it does not cover the ref
database and config database and others.

* Re: [PATCH 2/2] builtin/init-db: deprecate alias for git-init(1)
From: Junio C Hamano @ 2026-06-01 22:18 UTC
  To: Patrick Steinhardt; +Cc: phillip.wood, Kristoffer Haugsbakk, git
In-Reply-To: <ah2VL-ftCQelNoOc@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> On Mon, Jun 01, 2026 at 02:48:05PM +0100, Phillip Wood wrote:
>> 
>> 
>> On 01/06/2026 13:10, Patrick Steinhardt wrote:
>> > On Mon, Jun 01, 2026 at 11:31:46AM +0200, Kristoffer Haugsbakk wrote:
>> > > On Mon, Jun 1, 2026, at 09:56, Patrick Steinhardt wrote:
>> > > > diff --git a/git.c b/git.c
>> > > > index a72394b599..6bf6a60360 100644
>> > > > --- a/git.c
>> > > > +++ b/git.c
>> > > > @@ -591,7 +591,9 @@ static struct cmd_struct commands[] = {
>> > > >   	{ "hook", cmd_hook, RUN_SETUP_GENTLY },
>> > > >   	{ "index-pack", cmd_index_pack, RUN_SETUP_GENTLY | NO_PARSEOPT },
>> > > >   	{ "init", cmd_init },
>> > > > +#ifndef WITH_BREAKING_CHANGES
>> > > >   	{ "init-db", cmd_init },
>> > > 
>> > > This can be marked as deprecated.
>> > > 
>> > > 	{ "init-db", cmd_init, DEPRECATED },
>> > 
>> > Ah, indeed! Added locally now, thanks.
>> 
>> Deprecating this command seems very sensible to me. As well as marking it
>> deprecated, do we want to print a warning when it is run? I imagine anyone
>> who has this command in their muscle memory is unlikely to be reading the
>> man page on a regular basis so won't see the warning there.
>
> I was wondering whether we want to call `you_still_use_that()` here. I
> found it to be a bit heavy-handed as it's so trivial to replace with
> git-init(1), but on the other hand it's a trivial thing to do.

I personally think you_still_use_that() was a mistake.  Perhaps the
log family of commands was used often enough to warrant it, but not
"init-db", whose replacement "init" takes exactly the same arguments
and is shorter.  And you_still_use_that() would not help scripted
use all that much.

* Re: [PATCH 00/18] odb: make loose object source a proper `struct odb_source`
From: Junio C Hamano @ 2026-06-01 22:14 UTC
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <xmqqh5nm3q09.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> Patrick Steinhardt <ps@pks.im> writes:
>
>> Hi,
>>
>> this patch series converts the loose object source into a proper `struct
>> odb_source` so that it can be used via our generic interfaces.
>>
>> The patch series is relatively straightforward, as the source basically
>> already exists as such and the interfaces already match. So for the most
>> part we are just moving some code around and converting functions
>> that were previously called directly into callbacks.
>>
>> I guess the only part that needs some attention is that there is some
>> confusion at first with the `struct odb_source_loose::source` parent
>> pointer that initially points at the owning `struct odb_source_files`.
>> This relationship doesn't make much sense, as a loose source can totally
>> exist standalone without the files source.
>
> No significant comments came in the past week or so on these
> patches.  Should we declare victory, and mark it for 'next'?  I can
> locally amend a typo in [3/18] (<xmqqh5o0zrsr.fsf@gitster.g>).

Ah, I see your reroll.  Perfect.  Let me mark the topic for 'next'
then.

* Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split
From: Junio C Hamano @ 2026-06-01 22:13 UTC
  To: Colin Stagner
  Cc: Ian Jackson, git, Christian Heusel, george, Christian Hesse,
	Phillip Wood
In-Reply-To: <a1a07433-224e-4477-ae8a-3875fa98faf8@howdoi.land>

Colin Stagner <ask+git@howdoi.land> writes:

> On 4/16/26 08:25, Ian Jackson wrote:
>
>> FTR Debian supports multiple options for /bin/sh.  The shell in
>> question, with the limit that's troubling us, is dash.
>
> Correct, I experience this behavior in dash.
>
>> Why not run the script under bash in non-POSIX mode instead?  I think
>> that would sidestep the problem. 
>
> Our coding guidelines favor POSIX constructs over non-POSIX constructs, 
> including for shell scripts [1]. POSIX helps us stay portable.
>
> I'm not convinced that adding more shell interpreters to the mix would 
> be a net win in terms of stability or consistency. This patch series 
> addresses issues that arise from different implementations of sh. Adding 
> bash vs sh to the mix will probably just make more bugs.
>
>
>> If it had been me I would probably have used Rust and libgit2.
>
> git-subtree has been around since 2009, so you would have first needed 
> to invent Rust. :-) That said, a native Rust version of 
> git-subtree-split would be much faster and easier to read.
>
>
> Thanks for looking at this,
>
> Colin
>
> [1]: https://git-scm.com/docs/CodingGuidelines

So after this message the thread went dark (except for a side
discussion about rewriting subtree in Rust, which I do think is a
good direction to go in the longer term).  Are we still interested in
polishing the original patch further?

While I do agree that avoiding bash-isms in the main part of Git and
sticking to vanilla POSIX has merit, this particular one seems more
like an artificial limit imposed by dash than a matter of sticking to
POSIX as the common denominator, at least to me.

I am tempted to mark the topic as stalled, to be discarded for
inaction, but thought I should ask first before doing so.

Thanks.

* Re: [GSoC][PATCH 0/4] teach git repo info to handle path keys
From: Lucas Seiki Oshiro @ 2026-06-01 22:04 UTC
  To: K Jayatheerth
  Cc: git, jltobler, gitster, phillip.wood, sandals, kumarayushjha123,
	a3205153416
In-Reply-To: <20260601151950.30686-1-jayatheerthkulkarni2005@gmail.com>

Nitpick: use [GSoC PATCH] instead of [GSoC][PATCH] as the prefix.
Use --subject-prefix='GSoC PATCH' in git-send-email or
git-format-patch or set the configuration variable
`format.subjectPrefix` to that until you finish your GSoC:

$ git config --local format.subjectPrefix 'GSoC PATCH'

* Re: [GSoC][PATCH 4/4] repo: add path.commondir with absolute and relative suffix formatting
From: Lucas Seiki Oshiro @ 2026-06-01 21:58 UTC
  To: K Jayatheerth
  Cc: git, jltobler, gitster, phillip.wood, sandals, kumarayushjha123,
	a3205153416
In-Reply-To: <20260601151950.30686-5-jayatheerthkulkarni2005@gmail.com>


> diff --git a/t/t1900-repo-info.sh b/t/t1900-repo-info.sh
> index 7c7dfbb052..dd2706e1f7 100755
> --- a/t/t1900-repo-info.sh
> +++ b/t/t1900-repo-info.sh
> @@ -184,6 +184,7 @@ test_expect_success 'setup test repository layout for path fields' '
> mkdir -p test-repo/sub
> '
> 
> +test_repo_info_path 'commondir' '../.git'
> test_repo_info_path 'gitdir' '../.git'

I was thinking that maybe you should take a look at
git-rev-parse's tests and check what the corner cases are.

For example, the `git rev-parse --git-common-dir` documentation
says:

    --git-common-dir:
        Show $GIT_COMMON_DIR if defined, else $GIT_DIR

So you should look at how the git-rev-parse tests exercise
those two cases (GIT_COMMON_DIR and GIT_DIR) and do
something similar here.

* Re: [PATCH v3 0/3] line-log: integrate -L with the standard log output pipeline
From: Junio C Hamano @ 2026-06-01 21:53 UTC
  To: Ben Knoble; +Cc: Michael Montalbo via GitGitGadget, git, Michael Montalbo
In-Reply-To: <B59BA5B1-184D-48A8-8BAD-11EB6F8EB50C@gmail.com>

Ben Knoble <ben.knoble@gmail.com> writes:

>> Changes since v2:
>> 
>> * Switch "! test_grep" to "test_grep !" in tests.
>
> Thanks! I did not read the tests carefully for semantic value,
> but the rationale and overall code looks good to me as discussed
> previously.
>
> The range-diff here looks good, too. 

Thanks, both.  Let's mark it for 'next' then.

* [PATCH v9] revision.c: implement --max-count-oldest
From: Junio C Hamano @ 2026-06-01 21:53 UTC
  To: Mirko Faina
  Cc: git, Jeff King, Jean-Noël Avila, Patrick Steinhardt,
	Tian Yuchen, Ben Knoble, Johannes Sixt, Chris Torek
In-Reply-To: <ag3kJ_xKY6584De4@exploit>

Mirko Faina <mroik@delayed.space> writes:

> On Wed, May 20, 2026 at 03:02:34PM +0900, Junio C Hamano wrote:
>> 
>> This breaks CI
>> 
>>   https://github.com/git/git/actions/runs/26138986677/job/76880268854#step:4:2072
>> 
>> Squash something like this to fix.
>>  ...
>
> Sorry about that. And thank you for the fix.

It has been a while, and we saw no further comments by other
reviewers.

Perhaps we should declare a victory and mark the topic for 'next'.

------ >8 ------
From: Mirko Faina <mroik@delayed.space>
Date: Tue, 19 May 2026 02:55:22 +0200

"--max-count" is a commit limiting option and sets a maximum number
of commits to be shown. If a user wants to see only the first N
commits of the history (the oldest commits), they'd have to do
something like

    git log $(git rev-list HEAD | tail -n N | head -n 1)

This is not very user-friendly.

Teach get_revision() the --max-count-oldest option.

Signed-off-by: Mirko Faina <mroik@delayed.space>
[jc: fixed up t4202 <xmqq7boy4o05.fsf@gitster.g>]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/rev-list-options.adoc |   5 +-
 revision.c                          | 111 +++++++++++++++++++++++++++-
 revision.h                          |   2 +
 t/t4202-log.sh                      |  40 ++++++++++
 4 files changed, 154 insertions(+), 4 deletions(-)

diff --git a/Documentation/rev-list-options.adoc b/Documentation/rev-list-options.adoc
index 2d195a1474..e8c88d0f1c 100644
--- a/Documentation/rev-list-options.adoc
+++ b/Documentation/rev-list-options.adoc
@@ -16,7 +16,10 @@ ordering and formatting options, such as `--reverse`.
 `-<number>`::
 `-n <number>`::
 `--max-count=<number>`::
-	Limit the output to _<number>_ commits.
+	Limit the output to the first _<number>_ commits that would be shown.
+
+`--max-count-oldest=<number>`::
+	Limit the output to the last _<number>_ commits that would be shown.
 
 `--skip=<number>`::
 	Skip _<number>_ commits before starting to show the commit output.
diff --git a/revision.c b/revision.c
index 599b3a66c3..5d53db3152 100644
--- a/revision.c
+++ b/revision.c
@@ -2339,10 +2339,28 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	}
 
 	if ((argcount = parse_long_opt("max-count", argv, &optarg))) {
+		if (revs->max_count_type == 1)
+			die_for_incompatible_opt2(1, "--max-count", 1,
+						  "--max-count-oldest");
 		revs->max_count = parse_count(optarg);
 		revs->no_walk = 0;
+		revs->max_count_type = 0;
 		return argcount;
+	} else if ((argcount = parse_long_opt("max-count-oldest", argv, &optarg))) {
+		if (revs->max_count_type == 0 && revs->max_count != -1)
+			die_for_incompatible_opt2(1, "--max-count", 1,
+						  "--max-count-oldest");
+		if (revs->skip_count > 0)
+			die_for_incompatible_opt2(1, "--skip", 1,
+						  "--max-count-oldest");
+		revs->max_count = parse_count(optarg);
+		revs->no_walk = 0;
+		revs->max_count_type = 1;
+		revs->max_count_stage = 0;
 	} else if ((argcount = parse_long_opt("skip", argv, &optarg))) {
+		if (revs->max_count_type == 1)
+			die_for_incompatible_opt2(1, "--skip", 1,
+						  "--max-count-oldest");
 		revs->skip_count = parse_count(optarg);
 		return argcount;
 	} else if ((*arg == '-') && isdigit(arg[1])) {
@@ -4521,15 +4539,91 @@ static struct commit *get_revision_internal(struct rev_info *revs)
 	return c;
 }
 
+static void retrieve_oldest_commits(struct rev_info *revs,
+				    struct commit_list **queue)
+{
+	struct commit *c;
+	int max_count = revs->max_count;
+	int queuei_count = 0;
+	int queueo_count = 0;
+	struct commit_list *queueo = NULL;
+	struct commit_list *queuei = NULL;
+	struct commit_list *reversed_queue = NULL;
+	struct commit_list *p;
+
+	revs->max_count = -1;
+	while ((c = get_revision_internal(revs))) {
+		/*
+		 * We need to reset the SHOWN status, otherwise --graph breaks.
+		 * It is fine to do so: get_revision_internal() doesn't consider
+		 * child commits, as they have already been processed and the
+		 * traversal happens only child to parent.
+		 *
+		 * We do this because the --graph machinery relies on the status
+		 * of the parents to decide how the printing will happen.
+		 *
+		 * We can't simply replace this instruction with a
+		 * graph_update() as it doesn't do the actual printing; we'd
+		 * have to remove any commit that goes over the
+		 * --max-count-oldest limit from revs->graph.
+		 */
+		c->object.flags &= ~(SHOWN | CHILD_SHOWN);
+		commit_list_insert(c, &queuei);
+		if (!(c->object.flags & BOUNDARY))
+			queuei_count++;
+		while (queuei_count + queueo_count > max_count) {
+			if (!queueo_count) {
+				while ((c = pop_commit(&queuei))) {
+					commit_list_insert(c, &queueo);
+					queueo_count++;
+				}
+				queuei_count = 0;
+			}
+			c = pop_commit(&queueo);
+			queueo_count--;
+			/* We need to do this otherwise we'll discard the
+			 * commits that go over the --max-count-oldest limit but
+			 * not their respective boundaries. This matters only if
+			 * we're discarding the commit right before the boundary.
+			 */
+			for (p = c->parents; p; p = p->next)
+				p->item->object.flags &= ~CHILD_SHOWN;
+		}
+	}
+
+	while ((c = pop_commit(&queueo)))
+		commit_list_insert(c, &reversed_queue);
+	while ((c = pop_commit(&queuei)))
+		commit_list_insert(c, &queueo);
+	while ((c = pop_commit(&queueo)))
+		commit_list_insert(c, &reversed_queue);
+
+	while ((c = pop_commit(&reversed_queue)))
+		commit_list_insert(c, queue);
+}
+
 struct commit *get_revision(struct rev_info *revs)
 {
 	struct commit *c;
 	struct commit_list *reversed;
+	struct commit_list *queue = NULL;
+	struct commit_list *p;
+
+	if (revs->max_count_type == 1 && !revs->max_count_stage) {
+		retrieve_oldest_commits(revs, &queue);
+		commit_list_free(revs->commits);
+		revs->commits = queue;
+		revs->max_count_stage = 1;
+	}
 
 	if (revs->reverse) {
 		reversed = NULL;
-		while ((c = get_revision_internal(revs)))
-			commit_list_insert(c, &reversed);
+		if (revs->max_count_type == 1)
+			while ((c = pop_commit(&revs->commits)))
+				commit_list_insert(c, &reversed);
+		else
+			while ((c = get_revision_internal(revs)))
+				commit_list_insert(c, &reversed);
 		commit_list_free(revs->commits);
 		revs->commits = reversed;
 		revs->reverse = 0;
@@ -4543,7 +4637,18 @@ struct commit *get_revision(struct rev_info *revs)
 		return c;
 	}
 
-	c = get_revision_internal(revs);
+	if (revs->max_count_stage) {
+		c = pop_commit(&revs->commits);
+		if (c) {
+			c->object.flags |= SHOWN;
+			if (!(c->object.flags & BOUNDARY))
+				for (p = c->parents; p; p = p->next)
+					p->item->object.flags |= CHILD_SHOWN;
+		}
+	} else {
+		c = get_revision_internal(revs);
+	}
+
 	if (c && revs->graph)
 		graph_update(revs->graph, c);
 	if (!c) {
diff --git a/revision.h b/revision.h
index 584f1338b5..e157463cb1 100644
--- a/revision.h
+++ b/revision.h
@@ -309,6 +309,8 @@ struct rev_info {
 	/* special limits */
 	int skip_count;
 	int max_count;
+	unsigned int max_count_type:1;
+	unsigned int max_count_stage:1;
 	timestamp_t max_age;
 	timestamp_t max_age_as_filter;
 	timestamp_t min_age;
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index 05cee9e41b..75edb0eb38 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -1882,6 +1882,46 @@ test_expect_success 'log --graph with --name-status' '
 	test_cmp_graph --name-status tangle..reach
 '
 
+test_expect_success 'log --max-count-oldest=3 --oneline' '
+	test_when_finished rm expect &&
+	git log --oneline | tail -n3 >expect &&
+	git log --oneline --max-count-oldest=3 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'log --max-count-oldest=3 --reverse --oneline' '
+	test_when_finished rm expect &&
+	git log --oneline --reverse | head -n3 >expect &&
+	git log --oneline --max-count-oldest=3 --reverse >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'log --max-count-oldest with --max-count' '
+	test_when_finished rm stderr &&
+	test_must_fail git log --max-count-oldest=3 --max-count=3 2>stderr &&
+	test_grep "cannot be used together" stderr
+'
+
+test_expect_success 'log --max-count-oldest with --skip' '
+	test_when_finished rm stderr &&
+	test_must_fail git log --max-count-oldest=3 --skip=1 2>stderr &&
+	test_grep "cannot be used together" stderr
+'
+
+test_expect_success 'log --max-count-oldest=1000 --graph --boundary' '
+	test_when_finished rm expect actual &&
+	git log --graph --boundary >expect &&
+	git log --max-count-oldest=1000 --graph --boundary >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'log --oneline --graph --boundary --max-count-oldest=1' '
+	test_when_finished rm -f actual &&
+	git log --oneline --graph --boundary --max-count-oldest=1 \
+		HEAD~1..HEAD >actual &&
+	test_line_count = 2 actual
+'
+
 cat >expect <<-\EOF
 * reach
 |
-- 
2.54.0-514-g9d901a57fc
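
The bookkeeping in `retrieve_oldest_commits()` above amounts to a bounded FIFO built from two stacks: each commit from the newest-first walk is pushed onto `queuei`, and whenever the count exceeds the limit, the earliest-pushed (that is, newest) commit is popped from `queueo`, which is refilled by reversing `queuei` when it runs empty. What survives is the last N commits the walk produces, i.e. the oldest N. The sketch below isolates only that technique on plain integers; `struct two_stack_queue`, `push()`, `pop_front()` and `keep_last_n()` are illustrative names, not Git APIs, and the `BOUNDARY`, `SHOWN` and `CHILD_SHOWN` flag handling from the patch is deliberately left out.

```c
#include <assert.h>

#define MAX_ITEMS 64

/* A FIFO built from two LIFO stacks (illustrative, not a Git API). */
struct two_stack_queue {
	int in[MAX_ITEMS], in_len;	/* newly produced items are pushed here */
	int out[MAX_ITEMS], out_len;	/* the earliest item sits on top here */
};

static void push(struct two_stack_queue *q, int v)
{
	q->in[q->in_len++] = v;
}

/* Remove and return the earliest-pushed item; the queue must not be empty. */
static int pop_front(struct two_stack_queue *q)
{
	if (!q->out_len)
		/* Reverse the inbox into the outbox so the earliest is on top. */
		while (q->in_len)
			q->out[q->out_len++] = q->in[--q->in_len];
	return q->out[--q->out_len];
}

/*
 * Keep only the last 'n' items of 'stream' (like the oldest 'n' commits
 * of a newest-first walk), copying them to 'kept' in stream order.
 * Returns how many were kept.
 */
static int keep_last_n(const int *stream, int len, int n, int *kept)
{
	struct two_stack_queue q = { .in_len = 0, .out_len = 0 };
	int i, k = 0;

	assert(n >= 0 && n < MAX_ITEMS);
	for (i = 0; i < len; i++) {
		push(&q, stream[i]);
		if (q.in_len + q.out_len > n)
			pop_front(&q);	/* drop the earliest-produced item */
	}
	/* Drain in stream order: outbox top-down, then inbox bottom-up. */
	while (q.out_len)
		kept[k++] = q.out[--q.out_len];
	for (i = 0; i < q.in_len; i++)
		kept[k++] = q.in[i];
	return k;
}
```

For a walk that yields 10, 9, ..., 1 (newest first), `keep_last_n(walk, 10, 3, kept)` returns 3 and leaves 3, 2, 1 in `kept`, the same newest-first order in which `git log --max-count-oldest=3` would print them. Each item moves between the stacks at most once, so the work stays linear in the length of the walk while memory stays proportional to N.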

* Re: Missing Git Features for Modern Multi-Repository, Dependency-Driven Development
From: Skybuck Flying @ 2026-06-01 21:48 UTC
  To: Git
In-Reply-To: <AM0PR02MB445082932A5ED69B5F6EA782B3152@AM0PR02MB4450.eurprd02.prod.outlook.com>

Point ⭐ 3. **Built-In Provenance Tracking**  
Sub-point: Fork lineage

This sub-point also needs some further clarification, so here it is:

---

# ⭐ **SPECIFICATION: Built‑In Provenance Tracking for Git (Enhanced Edition)**

Modern software development depends on understanding where code comes from, how it evolves, and how repositories relate to each other. Git tracks commits, but it does **not** track repositories.  
This leaves a massive blind spot in provenance, security, and multi‑repo tooling.

This document proposes a minimal, backward‑compatible metadata system that gives Git true repository‑level lineage and multi‑generation fork divergence tracking.

Sections:

111. WHAT GIT SHOULD HAVE BEEN  
222. RFC / PROPOSAL  
333. IMPLEMENTATION DETAILS  
444. EXAMPLES  

---

# ⭐ **111. WHAT GIT SHOULD HAVE BEEN — Provenance Tracking**

Git is excellent at tracking *content*, but it is completely blind to *where repositories come from*.  
This leads to long‑standing problems:

- forks lose their origin  
- mirrors cannot be detected  
- renames erase history  
- hosting migrations break lineage  
- dependency tools cannot trace ancestry  
- security tools cannot identify upstream  
- multi‑repo systems cannot reason about relationships  

Git treats every clone as an isolated universe.

That is fundamentally wrong.

Git should have tracked:

- ✔ Original upstream  
- ✔ Fork lineage  
- ✔ Migration history  
- ✔ Renames  
- ✔ Moves  
- ✔ Repository identity  

---

## ⭐ **Basic Idea (3–5 lines)**  
Every repository has a UUID.  
Every fork stores the UUID of the repo it was forked from.  
That parent does **not** need to be the true origin — it can be a fork of a fork.  
Git walks these UUID links to reconstruct the entire ancestry chain.  
This gives Git real provenance for the first time.

---

Git should have been able to answer:

- “Where did this repo come from?”  
- “What is its upstream?”  
- “How many generations deep is this fork?”  
- “How far behind upstream is it, at every level?”  
- “What is the full lineage tree?”  

Today Git cannot answer any of these.

---

# ⭐ **222. RFC: Provenance Metadata for Git**

## **1. Introduction**

Git repositories lack built‑in provenance metadata.  
This prevents Git from understanding:

- fork relationships  
- upstream lineage  
- migration history  
- renames and moves  
- mirrors  
- divergence depth  

This RFC proposes a minimal, optional metadata file that records a repository’s **parent identity** and **last upstream sync**.

---

## **2. Problem Statement**

Git currently cannot:

- detect forks  
- detect mirrors  
- detect renames  
- detect hosting migrations  
- compute fork depth  
- compute divergence from upstream  
- reconstruct ancestry  
- track provenance across platforms  

This causes:

- lost history  
- broken tooling  
- ambiguous security metadata  
- dependency confusion  
- inability to reason about multi‑repo graphs  

---

## **3. Proposed Feature: `.gitorigin`**

Introduce a file:

```
.gitorigin
```

Containing:

```
origin_uuid = "<parent-repo-uuid>"
last_synced = "<commit-hash>"
```
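
Because `.gitorigin` is only proposed here, no parser exists for it yet. As a hedged sketch of how a tool might read one line of the file, assuming the `key = "value"` layout shown above with optional spaces around `=` and a double-quoted value (both assumptions, as is the hypothetical helper name `parse_origin_line()`):

```c
#include <assert.h>
#include <string.h>

/*
 * Parse one line of the proposed .gitorigin file, for example
 *     origin_uuid = "<parent-repo-uuid>"
 * Copies the key and the unquoted value and returns 1; returns 0 if the
 * line does not match or a buffer is too small. (Hypothetical helper.)
 */
static int parse_origin_line(const char *line, char *key, size_t keysz,
			     char *val, size_t valsz)
{
	const char *eq, *lq, *rq;
	size_t klen, vlen;

	assert(line && key && val);
	eq = strchr(line, '=');
	if (!eq)
		return 0;
	/* The key is everything before '=', minus trailing spaces. */
	klen = (size_t)(eq - line);
	while (klen && line[klen - 1] == ' ')
		klen--;
	if (!klen || klen >= keysz)
		return 0;
	/* The value must be wrapped in double quotes after the '='. */
	lq = strchr(eq, '"');
	rq = lq ? strchr(lq + 1, '"') : NULL;
	if (!rq)
		return 0;
	vlen = (size_t)(rq - lq - 1);
	if (vlen >= valsz)
		return 0;
	memcpy(key, line, klen);
	key[klen] = '\0';
	memcpy(val, lq + 1, vlen);
	val[vlen] = '\0';
	return 1;
}
```

Rejecting unquoted values keeps the sketch strict; a real format would also need rules for comments, escaping and duplicate keys, none of which the proposal specifies.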

### **Properties**

- stored in the repo  
- points to the repo this one was forked from  
- parent does NOT need to be the true origin  
- supports multi‑generation fork chains  
- supports migration history  
- supports renames and moves  
- supports mirrors  

### **Benefits**

- Git can reconstruct full lineage  
- Git can compute divergence  
- Git can detect mirrors  
- Git can detect lost forks  
- Git can track upstream sync  
- Git can show ancestry trees  
- Git can support provenance‑aware tooling  

---

## **4. Relationship to `.git/identity`**

Each repo has:

```
.git/identity
uuid = "<repo-uuid>"
```

Each fork has:

```
.gitorigin
origin_uuid = "<parent-uuid>"
```

Together, these form a **repository‑level DAG**.
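
Walking this graph is a simple pointer chase, since each repository records exactly one parent. A minimal sketch, assuming the `uuid` and `origin_uuid` values of every known repository have already been loaded into memory; `struct repo_record`, `find_repo()` and `lineage()` are hypothetical names, and the hop limit is an added guard so that a corrupted or malicious cycle cannot make the walk loop forever:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory view of one repository's provenance metadata. */
struct repo_record {
	const char *uuid;		/* from the proposed .git/identity */
	const char *origin_uuid;	/* from the proposed .gitorigin, or NULL */
};

static const struct repo_record *find_repo(const struct repo_record *repos,
					   size_t nr, const char *uuid)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(repos[i].uuid, uuid))
			return &repos[i];
	return NULL;
}

/*
 * Follow origin_uuid links from 'start', writing "C -> B -> A" into 'buf'.
 * Stops at a repository without a recorded parent, after a parent whose
 * own record is not known locally, or after 'nr' hops (a cycle guard).
 * Returns the number of hops taken.
 */
static size_t lineage(const struct repo_record *repos, size_t nr,
		      const char *start, char *buf, size_t bufsz)
{
	const struct repo_record *r;
	size_t hops = 0;

	assert(repos && start && buf && bufsz > 0);
	r = find_repo(repos, nr, start);
	snprintf(buf, bufsz, "%s", start);
	while (r && r->origin_uuid && hops < nr) {
		size_t len = strlen(buf);

		snprintf(buf + len, bufsz - len, " -> %s", r->origin_uuid);
		hops++;
		r = find_repo(repos, nr, r->origin_uuid);
	}
	return hops;
}
```

With records O, A (origin O), B (origin A) and C (origin B), `lineage(repos, 4, "C", buf, sizeof(buf))` returns 3 and writes `C -> B -> A -> O`.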

---

# ⭐ **333. IMPLEMENTATION DETAILS (Enhanced)**

This section describes how Git and hosting providers could implement fork lineage tracking, including per-fork divergence:

> **Git can compute “commits behind” for every fork in the chain.**

---

## **3.1. Fork Creation**

When a user forks a repo:

1. The new repo generates its own UUID  
2. The new repo writes:

```
.gitorigin
origin_uuid = "<uuid-of-parent>"
last_synced = "<current-upstream-commit>"
```

This works even if:

- the parent is itself a fork  
- the parent is a mirror  
- the parent has moved hosts  
- the parent has been renamed  

---

## ⭐ **3.2. Multi‑Generation Fork Chains WITH Divergence Tracking**

Let’s illustrate this clearly.

### **Repository chain:**

```
Origin (O)
  ↓ forked by
Fork A (A)
  ↓ forked by
Fork B (B)
  ↓ forked by
Fork C (C)
```

### **Commit history:**

```
Origin: 100 commits
Fork A:  +5 commits (105 total)
Fork B:  +3 commits (108 total)
Fork C:  +7 commits (115 total)
```

### **Upstream sync points:**

Fork A synced at commit 100  
Fork B synced at commit 105  
Fork C synced at commit 108  

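### **Git can compute:**

#### **Fork A**
- Ahead of Origin: +5  
- Behind Origin: 0  

#### **Fork B**
- Ahead of A: +3  
- Behind A: 0  
- Ahead of Origin: +8  
- Behind Origin: 0  

#### **Fork C**
- Ahead of B: +7  
- Behind B: 0  
- Ahead of A: +10  
- Behind A: 0  
- Ahead of Origin: +15  
- Behind Origin: 0  

Every "behind" count is 0 here because each fork synced at its parent's tip and no upstream has moved since. A fork only falls behind once an upstream gains commits after the fork's last sync.

Git can now show:

```
C is 7 commits ahead of B
C is 10 commits ahead of A
C is 15 commits ahead of Origin
```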
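
Ahead/behind numbers like these are plain set differences over commit reachability: "ahead" counts commits reachable from the fork's tip but not from the other tip, and "behind" the reverse. This is the same quantity `git rev-list --left-right --count X...Y` reports today once both histories are fetched into one repository. A minimal sketch under a simplifying assumption: a toy history where commits are small integer ids with at most one parent each (`parent[]`, `mark_reachable()`, `ahead_behind()` and `build_linear_history()` are illustrative names, not Git APIs):

```c
#include <assert.h>
#include <string.h>

#define MAX_COMMITS 256

/*
 * Toy history (an assumption for illustration): commit ids run from 1 to
 * MAX_COMMITS - 1 and parent[id] is the single parent, or 0 for a root.
 */
static int parent[MAX_COMMITS];

/*
 * Set 'bit' on every commit reachable from 'tip'. With one parent per
 * commit, hitting an already-marked commit means the rest of the chain
 * is marked too, so the walk can stop there.
 */
static void mark_reachable(unsigned char *flags, int tip, unsigned char bit)
{
	for (int c = tip; c && !(flags[c] & bit); c = parent[c])
		flags[c] |= bit;
}

/* Only reachable from 'mine' counts as ahead; only from 'theirs', behind. */
static void ahead_behind(int mine, int theirs, int *ahead, int *behind)
{
	unsigned char flags[MAX_COMMITS];

	assert(mine > 0 && mine < MAX_COMMITS);
	assert(theirs > 0 && theirs < MAX_COMMITS);
	memset(flags, 0, sizeof(flags));
	mark_reachable(flags, mine, 1);
	mark_reachable(flags, theirs, 2);
	*ahead = *behind = 0;
	for (int c = 1; c < MAX_COMMITS; c++) {
		if (flags[c] == 1)
			(*ahead)++;
		else if (flags[c] == 2)
			(*behind)++;
	}
}

/* Linear history like the example chain: commit i has parent i - 1. */
static void build_linear_history(int tip)
{
	assert(tip > 0 && tip < MAX_COMMITS);
	for (int c = 1; c <= tip; c++)
		parent[c] = c - 1;
}
```

With `build_linear_history(115)` and tips Origin = 100, A = 105, B = 108, C = 115, `ahead_behind(115, 100, ...)` gives 15 ahead and 0 behind. Setting `parent[116] = 100` (a new Origin commit made after A synced) makes `ahead_behind(105, 116, ...)` give 5 ahead and 1 behind.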

---

## **3.3. Migration History**

If a repo moves:

- GitHub → GitLab  
- user → organization  
- mirror → new host  

The UUID stays the same.  
The `.gitorigin` stays the same.

Lineage is preserved.

---

## **3.4. Mirror Detection**

If two repos share the same UUID:

```
uuid = X
uuid = X
```

They are mirrors.

Git can warn:

> “These repositories are identical mirrors.”

---

## **3.5. Lost Fork Recovery**

If someone clones a fork and pushes it elsewhere:

Git can still detect:

> “This repo is a descendant of O.”

Because the `.gitorigin` chain is intact.

---

## **3.6. Backward Compatibility**

- Repos without `.gitorigin` behave normally  
- Tools ignoring provenance behave normally  
- No protocol changes  
- No breaking changes  

This is purely additive.

---

# ⭐ **444. EXAMPLES — Fork Lineage in Practice (Enhanced)**

## **Example 1: Fork → Fork → Fork with Divergence**

```
Origin → Fork A → Fork B → Fork C
```

Git reconstructs:

```
C → B → A → Origin
```

Git computes:

```
C is 7 ahead of B
C is 10 ahead of A
C is 15 ahead of Origin
```

This is impossible today.

---

## **Example 2: Fork Becomes the New Upstream**

Origin is abandoned.  
Fork B becomes the new mainline.

Fork C updates:

```
origin_uuid = B
```

Git still knows:

```
C → B → A → Origin
```

This is migration history.

---

## **Example 3: Mirror Detection**

Two repos have the same UUID:

```
uuid = "abc"
uuid = "abc"
```

Git knows:

> “These are mirrors.”

---

## **Example 4: Hosting Migration**

Repo moves:

```
github.com/user/yaml → gitlab.com/sky/yaml
```

UUID stays the same.  
`.gitorigin` stays the same.  
Lineage stays intact.

---

## **Example 5: Lost Fork Recovery**

Someone clones Fork B and pushes it to a new host.

Git reads:

```
origin_uuid = A
```

Git reconstructs:

```
NewRepo → B → A → Origin
```

Lineage recovered.

---

Bye for now,  
  Skybuck Flying / Harald Houppermans ! ;) =D XD

* Re: [PATCH v3 0/8] environment: move core config globals into repo_config_values
From: Junio C Hamano @ 2026-06-01 21:43 UTC
  To: Bello Olamide
  Cc: git, phillip.wood123, christian.couder, usmanakinyemi202,
	Tian Yuchen, kaartic.sivaraam, me
In-Reply-To: <xmqqlddqu013.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> Bello Olamide <belkid98@gmail.com> writes:
>
>> There isn’t any semantic difference intended between
>> the "environment:" and "env:" prefixes.
>>
>> I shortened some of them to stay within the recommended subject length,
>> but on second thought I agree that consistency is more important here.
>>
>> I’ll standardize them in the next revision.
>
> Does anybody listed on the CC: in the original submission have any
> comments on this round?  It seems that v2 iteration was commented on
> quite a bit, but has anybody checked the latest iteration since it
> was posted?

Your mention of "the next revision" was made on Apr 26th and it has
been a month without any updates since then.  Tian Yuchen seems to
have made a few review comments, so perhaps it is a good time to
update the series to stir the pot, hopefully reigniting interest in
the topic?

https://lore.kernel.org/git/08efcc49-0db8-49f6-8971-633aa55eb66c@malon.dev/

* [PATCH] doc: document and test `@` prefix for raw timestamps
From: Luna Schwalbe @ 2026-06-01 21:39 UTC
  To: git; +Cc: Luna Schwalbe, Junio C Hamano

The Git internal date format `<unix-timestamp> <time-zone-offset>`
fails to parse when the timestamp is less than 100,000,000 (fewer than
9 digits). This is done to avoid potential ambiguity with other date
formats such as `YYYYMMDD`, especially when used with approxidate.

To force the parser to interpret the value as a raw timestamp, it must
be prefixed with `@` (e.g., `@0 +0000`). This behavior was introduced
in 2c733fb24c10a9d7aacc51f956bf9b7881980870 (parse_date(): '@' prefix
forces git-timestamp, 2012-02-02) but was never documented.

Document the `@` prefix in `Documentation/date-formats.adoc` to make
this behavior explicit. Also add test cases to `t/t0006-date.sh` to
verify and demonstrate the difference between prefixed and unprefixed
small timestamps (e.g., `@2000` vs `2000`).

Signed-off-by: Luna Schwalbe <dev@luna.gl>
Co-authored-by: Junio C Hamano <gitster@pobox.com>
---
I switched out the YYYYMMDD tests as that format doesn't appear to be
understood by either parse or approxidate.

 Documentation/date-formats.adoc |  6 ++++++
 t/t0006-date.sh                 | 11 +++++++++++
 2 files changed, 17 insertions(+)

diff --git a/Documentation/date-formats.adoc b/Documentation/date-formats.adoc
index e24517c49..83f676585 100644
--- a/Documentation/date-formats.adoc
+++ b/Documentation/date-formats.adoc
@@ -10,6 +10,12 @@ Git internal format::
 	`<time-zone-offset>` is a positive or negative offset from UTC.
 	For example CET (which is 1 hour ahead of UTC) is `+0100`.
 
+    It is safer to prefix the `<unix-timestamp>` with `@`
+    (e.g., `@0 +0000`), which forces Git to interpret it as a raw
+    timestamp. This is required for values less than 100,000,000
+    (which have fewer than 9 digits) to avoid confusion with other
+    date formats (like `YYYYMMDD`).
+
 RFC 2822::
 	The standard date format as described by RFC 2822, for example
 	`Thu, 07 Apr 2005 22:13:13 +0200`.
diff --git a/t/t0006-date.sh b/t/t0006-date.sh
index 53ced36df..8b4e1870b 100755
--- a/t/t0006-date.sh
+++ b/t/t0006-date.sh
@@ -138,6 +138,13 @@ check_parse '1969-12-31 23:59:59 Z' bad
 check_parse '1969-12-31 23:59:59 +11' bad
 check_parse '1969-12-31 23:59:59 -11' bad
 
+# pathologically small timestamps requiring `@` prefix
+check_parse '@0 +0000' '1970-01-01 00:00:00 +0000'
+check_parse '@99999999 +0000' '1973-03-03 09:46:39 +0000'
+check_parse '99999999 +0000' bad
+check_parse '@100000000 +0000' '1973-03-03 09:46:40 +0000'
+check_parse '100000000 +0000' '1973-03-03 09:46:40 +0000'
+
 REQUIRE_64BIT_TIME=HAVE_64BIT_TIME
 check_parse '2099-12-31 23:59:59' '2099-12-31 23:59:59 +0000'
 check_parse '2099-12-31 23:59:59 +00' '2099-12-31 23:59:59 +0000'
@@ -195,6 +202,10 @@ check_approxidate '6AM, June 7, 2009' '2009-06-07 06:00:00'
 check_approxidate '2008-12-01' '2008-12-01 19:20:00'
 check_approxidate '2009-12-01' '2009-12-01 19:20:00'
 
+# ambiguous raw timestamp
+check_approxidate '2000 +0000' '2000-08-30 19:20:00'
+check_approxidate '@2000 +0000' '1970-01-01 00:33:20'
+
 check_date_format_human() {
 	t=$(($GIT_TEST_DATE_NOW - $1))
 	echo "$t -> $2" >expect
-- 
2.53.0
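
The rule this patch documents can be modelled as follows. This is a simplified sketch, not Git's `date.c`: it only decides whether a leading number is taken as a raw Unix timestamp, with `@` forcing that reading and unprefixed numbers qualifying only from 100,000,000 upward. The helper name `classify_raw_timestamp()` is hypothetical, and a return of 0 means the token is left to the other date rules (which is how approxidate ends up reading `2000` as a year in the new test).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/*
 * Simplified model of one rule in Git's date parsing (not the real date.c):
 * decide whether a token starting with digits is a raw Unix timestamp.
 * Returns 1 and stores the value in *ts if so; returns 0 to leave the token
 * to other date rules (for example, approxidate treating "2000" as a year).
 */
static int classify_raw_timestamp(const char *s, long long *ts)
{
	int forced = 0;
	long long v;

	assert(s && ts);
	if (*s == '@') {
		forced = 1;	/* '@' forces the raw-timestamp reading */
		s++;
	}
	if (!isdigit((unsigned char)*s))
		return 0;
	v = strtoll(s, NULL, 10);	/* stops at the space before the offset */
	if (!forced && v < 100000000LL)
		return 0;	/* too small: ambiguous with other formats */
	*ts = v;
	return 1;
}
```

`classify_raw_timestamp("@2000 +0000", &ts)` returns 1 with `ts` set to 2000, while the unprefixed `"2000 +0000"` and `"99999999 +0000"` return 0, matching the expectations in the new `t0006` cases.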

* Re: [PATCH v2] stash: reuse cached index entries in --patch temporary index
From: Junio C Hamano @ 2026-06-01 21:33 UTC (permalink / raw)
  To: Adam Johnson via GitGitGadget
  Cc: git, Thomas Gummerer, Elijah Newren, Phillip Wood, Victoria Dye,
	Adam Johnson
In-Reply-To: <pull.2306.v2.git.git.1779491545531.gitgitgadget@gmail.com>

"Adam Johnson via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Adam Johnson <me@adamj.eu>
>
> `git stash -p` prepares the interactive selection by creating a
> temporary index at HEAD, switching `GIT_INDEX_FILE` to it, and then
> running the `add -p` machinery.
>
> That temporary index was created by running `git read-tree HEAD`.  The
> resulting index had no useful cached stat data or fsmonitor-valid bits
> from the real index.  When `run_add_p()` refreshed that temporary index
> before showing the first prompt, it could end up lstat(2)-ing every
> tracked file, even in a repository where `git diff` and `git restore -p`
> can use fsmonitor to avoid that work.
>
> Create the temporary index in-process instead.  Use `unpack_trees()` to
> reset the real index contents to HEAD while writing the result to the
> temporary index path.  For paths whose index entries already match HEAD,
> `oneway_merge()` reuses the existing cache entries, preserving their
> cached stat data and `CE_FSMONITOR_VALID` state.
>
> This makes the refresh performed by `run_add_p()` behave like the one
> used by `git restore -p`: unchanged paths can be skipped via fsmonitor
> instead of being scanned again.
>
> In a 206k file repository with `core.fsmonitor` enabled and a one-line
> change in one file, time to first prompt dropped from 34.774 seconds to
> 0.659 seconds. The new perf test file demonstrates similar improvements,
> with mean times for the without- and with-fsmonitor cases dropping from 6.90
> and 6.83 seconds to 0.55 and 0.28 seconds, respectively.
>
> Signed-off-by: Adam Johnson <me@adamj.eu>
> ---
>     stash: reuse cached index entries in --patch temporary index
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2306%2Fadamchainz%2Faj%2Foptimize-stash-patch-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2306/adamchainz/aj/optimize-stash-patch-v2
> Pull-Request: https://github.com/git/git/pull/2306

The diff relative to the previous round looked good.  I am not a
"stash -p" user myself, but I suspect that there are people who
heavily use it, so I'd feel safer if an extra set of eyes looks at
the patch and gives an Ack, but other than that I have no comments
on the patch.  Looking good.

Thanks.


>  builtin/stash.c             | 70 +++++++++++++++++++++++++++++++++----
>  t/perf/p3904-stash-patch.sh | 43 +++++++++++++++++++++++
>  2 files changed, 107 insertions(+), 6 deletions(-)
>  create mode 100755 t/perf/p3904-stash-patch.sh
>
> diff --git a/builtin/stash.c b/builtin/stash.c
> index 32dbc97b47..c4809f299a 100644
> --- a/builtin/stash.c
> +++ b/builtin/stash.c
> @@ -372,6 +372,56 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
>  	return 0;
>  }
>  
> +static int create_index_from_tree(const struct object_id *tree_id,
> +				  const char *index_path)
> +{
> +	int nr_trees = 1;
> +	int ret = 0;
> +	struct unpack_trees_options opts;
> +	struct tree_desc t[MAX_UNPACK_TREES];
> +	struct tree *tree;
> +	struct index_state dst_istate = INDEX_STATE_INIT(the_repository);
> +	struct lock_file lock_file = LOCK_INIT;
> +
> +	repo_read_index_preload(the_repository, NULL, 0);
> +	refresh_index(the_repository->index, REFRESH_QUIET, NULL, NULL, NULL);
> +
> +	hold_lock_file_for_update(&lock_file, index_path, LOCK_DIE_ON_ERROR);
> +
> +	memset(&opts, 0, sizeof(opts));
> +
> +	tree = repo_parse_tree_indirect(the_repository, tree_id);
> +	if (!tree || repo_parse_tree(the_repository, tree)) {
> +		ret = -1;
> +		goto done;
> +	}
> +
> +	init_tree_desc(t, &tree->object.oid, tree->buffer, tree->size);
> +
> +	opts.head_idx = 1;
> +	opts.src_index = the_repository->index;
> +	opts.dst_index = &dst_istate;
> +	opts.merge = 1;
> +	opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
> +	opts.fn = oneway_merge;
> +
> +	if (unpack_trees(nr_trees, t, &opts)) {
> +		ret = -1;
> +		goto done;
> +	}
> +
> +	if (write_locked_index(&dst_istate, &lock_file, COMMIT_LOCK)) {
> +		ret = error(_("unable to write new index file"));
> +		goto done;
> +	}
> +
> +done:
> +	release_index(&dst_istate);
> +	if (ret)
> +		rollback_lock_file(&lock_file);
> +	return ret;
> +}
> +
>  static int diff_tree_binary(struct strbuf *out, struct object_id *w_commit)
>  {
>  	struct child_process cp = CHILD_PROCESS_INIT;
> @@ -1321,18 +1371,26 @@ static int stash_patch(struct stash_info *info, const struct pathspec *ps,
>  		       struct interactive_options *interactive_opts)
>  {
>  	int ret = 0;
> -	struct child_process cp_read_tree = CHILD_PROCESS_INIT;
>  	struct child_process cp_diff_tree = CHILD_PROCESS_INIT;
> +	struct commit *head_commit;
> +	const struct object_id *head_tree;
>  	struct index_state istate = INDEX_STATE_INIT(the_repository);
>  	char *old_index_env = NULL, *old_repo_index_file;
>  
>  	remove_path(stash_index_path.buf);
>  
> -	cp_read_tree.git_cmd = 1;
> -	strvec_pushl(&cp_read_tree.args, "read-tree", "HEAD", NULL);
> -	strvec_pushf(&cp_read_tree.env, "GIT_INDEX_FILE=%s",
> -		     stash_index_path.buf);
> -	if (run_command(&cp_read_tree)) {
> +	head_commit = lookup_commit(the_repository, &info->b_commit);
> +	if (!head_commit || repo_parse_commit(the_repository, head_commit)) {
> +		ret = -1;
> +		goto done;
> +	}
> +	head_tree = get_commit_tree_oid(head_commit);
> +	if (!head_tree) {
> +		ret = -1;
> +		goto done;
> +	}
> +
> +	if (create_index_from_tree(head_tree, stash_index_path.buf)) {
>  		ret = -1;
>  		goto done;
>  	}
> diff --git a/t/perf/p3904-stash-patch.sh b/t/perf/p3904-stash-patch.sh
> new file mode 100755
> index 0000000000..4cfce638be
> --- /dev/null
> +++ b/t/perf/p3904-stash-patch.sh
> @@ -0,0 +1,43 @@
> +#!/bin/sh
> +
> +test_description="Performance tests for git stash -p"
> +
> +. ./perf-lib.sh
> +
> +test_perf_fresh_repo
> +
> +test_expect_success "setup" '
> +	mkdir files &&
> +	test_seq 1 100000 | while read i; do
> +		echo "content $i" >files/$i.txt || return 1
> +	done &&
> +	git add files/ &&
> +	git commit -q -m "add tracked files" &&
> +	echo modified >files/1.txt
> +'
> +
> +test_perf "stash -p, no fsmonitor" \
> +	--setup 'echo modified >files/1.txt' '
> +	printf "q\n" | git stash -p >/dev/null 2>&1 || true
> +'
> +
> +if test_have_prereq FSMONITOR_DAEMON
> +then
> +	test_expect_success "enable builtin fsmonitor" '
> +		git config core.fsmonitor true &&
> +		git fsmonitor--daemon start &&
> +		git update-index --fsmonitor &&
> +		git status >/dev/null 2>&1
> +	'
> +
> +	test_perf "stash -p, builtin fsmonitor" \
> +		--setup 'echo modified >files/1.txt && git status >/dev/null 2>&1' '
> +		printf "q\n" | git stash -p >/dev/null 2>&1 || true
> +	'
> +
> +	test_expect_success "stop builtin fsmonitor" '
> +		git fsmonitor--daemon stop
> +	'
> +fi
> +
> +test_done
>
> base-commit: 7bcaabddcf68bd0702697da5904c3b68c52f94cf

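The commit message relies on one mechanism: when an existing index entry already matches HEAD, it is kept, so its cached stat data and fsmonitor bit survive. A toy model can illustrate this. It is not Git's `unpack_trees()`/`oneway_merge()` code. `struct toy_entry`, `build_index_from_tree()` and `count_lstat_calls()` are hypothetical stand-ins, and a real index refresh is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified index entry. */
struct toy_entry {
	const char *path;
	const char *oid;     /* object name recorded for the path */
	long mtime;          /* cached stat data; 0 means "unknown" */
	int fsmonitor_valid; /* fsmonitor says the file is unchanged */
};

static const struct toy_entry *find_entry(const struct toy_entry *idx,
					  size_t nr, const char *path)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(idx[i].path, path))
			return &idx[i];
	return NULL;
}

/*
 * Build dst from the tree entries.  If the current index has the same path
 * with the same object, reuse that entry wholesale (stat data and fsmonitor
 * bit included).  Otherwise create a fresh entry with no stat data, roughly
 * what "read-tree HEAD" into an empty index produces.
 */
static void build_index_from_tree(const struct toy_entry *tree, size_t nr,
				  const struct toy_entry *cur, size_t cur_nr,
				  struct toy_entry *dst)
{
	assert(tree && dst);
	for (size_t i = 0; i < nr; i++) {
		const struct toy_entry *old = find_entry(cur, cur_nr, tree[i].path);

		if (old && !strcmp(old->oid, tree[i].oid)) {
			dst[i] = *old;
		} else {
			dst[i] = tree[i];
			dst[i].mtime = 0;
			dst[i].fsmonitor_valid = 0;
		}
	}
}

/* A refresh only has to lstat() entries that fsmonitor cannot vouch for. */
static size_t count_lstat_calls(const struct toy_entry *idx, size_t nr)
{
	size_t n = 0;

	for (size_t i = 0; i < nr; i++)
		if (!idx[i].fsmonitor_valid)
			n++;
	return n;
}
```

In this model, an index built without reuse starts with no cached data, so every path needs an lstat(). With reuse, only the path whose index entry differs from HEAD does. This is the effect the commit message measures with fsmonitor enabled.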
^ permalink raw reply

* Re: [PATCH 00/18] odb: make loose object source a proper `struct odb_source`
From: Junio C Hamano @ 2026-06-01 21:33 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <20260521-b4-pks-odb-source-loose-v1-0-6553b399be2d@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> Hi,
>
> this patch series converts the loose object source into a proper `struct
> odb_source` so that it can be used via our generic interfaces.
>
> The patch series is relatively straightforward, as the source basically
> already exists as such and the interfaces already match. So for the most
> part we are just moving some code around and converting functions
> that were previously called directly into callbacks.
>
> I guess the only part that needs some attention is that there is some
> confusion at first with the `struct odb_source_loose::source` parent
> pointer that initially points at the owning `struct odb_source_files`.
> This relationship doesn't make much sense, as a loose source can totally
> exist standalone without the files source.

No significant comments came in the past week or so on these
patches.  Should we declare victory, and mark it for 'next'?  I can
locally amend a typo in [3/18] (<xmqqh5o0zrsr.fsf@gitster.g>).

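The cover letter describes turning direct calls into callbacks behind a generic source interface. In C, the general shape of that pattern is a base struct with function pointers, embedded in each concrete source. The sketch below uses hypothetical names (`struct toy_odb_source`, `struct toy_loose_source`, `has_object`). It does not reflect the actual fields of `struct odb_source` in Git.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic interface: callers only ever go through these callbacks. */
struct toy_odb_source {
	const char *name;
	int (*has_object)(struct toy_odb_source *source, const char *oid);
};

/* A concrete "loose" source embeds the generic struct as its first member. */
struct toy_loose_source {
	struct toy_odb_source base;
	const char **oids; /* stand-in for the objects/xx/... files */
	size_t nr;
};

static int loose_has_object(struct toy_odb_source *source, const char *oid)
{
	/* Valid downcast: base is the first member of the loose source. */
	struct toy_loose_source *loose = (struct toy_loose_source *)source;

	for (size_t i = 0; i < loose->nr; i++)
		if (!strcmp(loose->oids[i], oid))
			return 1;
	return 0;
}

static void toy_loose_source_init(struct toy_loose_source *loose,
				  const char **oids, size_t nr)
{
	assert(loose);
	loose->base.name = "loose";
	loose->base.has_object = loose_has_object;
	loose->oids = oids;
	loose->nr = nr;
}

/* Generic code works with any source, standalone or nested in another. */
static int source_has_object(struct toy_odb_source *source, const char *oid)
{
	return source->has_object(source, oid);
}
```

With the generic struct embedded as the first member, generic code can hold a `struct toy_odb_source *` without knowing which concrete source it points to. Nothing in this toy loose source needs a pointer back to an owning parent.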
^ permalink raw reply

* Re: Missing Git Features for Modern Multi-Repository, Dependency-Driven Development
From: Skybuck Flying @ 2026-06-01 21:32 UTC (permalink / raw)
  To: Git
In-Reply-To: <AM0PR02MB4450F6AF2F662C51F3145B48B3152@AM0PR02MB4450.eurprd02.prod.outlook.com>

Point 7 needs further explanation, which this document provides:

7. **Stable repository identity**

---

# ⭐ **SPECIFICATION: Stable Repository Identity for Git**

Modern software projects increasingly rely on large dependency graphs, multi‑repository structures, reproducible builds, and long‑term provenance. Git provides excellent version control but lacks native mechanisms for these workflows.  
This specification outlines optional, backward‑compatible metadata extensions that would allow Git to better support modern development practices.

This document contains four sections:

111. WHAT GIT SHOULD HAVE BEEN  
222. RFC / PROPOSAL  
333. IMPLEMENTATION IDEAS / DETAILS  
444. EXAMPLES  

---

# ⭐ **111. WHAT GIT SHOULD HAVE BEEN — Stable Repository Identity**

Git is brilliant at what it was designed for:

- content‑addressable storage  
- distributed history  
- immutable snapshots  

But Git has one deep architectural flaw:

> **A repository’s identity = its URL.**

This has caused 15+ years of breakage across ecosystems:

- Go import paths break when repos move  
- mirrors confuse tooling  
- forks lose provenance  
- dependency manifests rot  
- security advisories become invalid  
- organizational migrations break everything  
- renames break imports and builds  

Git treats the *transport location* as the *identity*.  
This is backwards.

---

## ⭐ **Basic Idea (3–5 lines)**  
Git breaks when a repository’s URL changes because the URL *is* the identity.  
The fix is to give every repo a permanent UUID stored in `.git/identity`.  
Tools then import using a stable name like `sky/yaml`, which Git maps to the UUID.  
If the repo moves, renames, or changes hosting, only the mapping updates — **the code stays the same**.

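The two-level lookup described above maps a stable name to a UUID, and the UUID to the current location. It can be sketched as two small tables. Everything here is hypothetical: Git has no such tables today, `resolve_name()` and `move_repo()` are illustrative names, and `uuid-1` is a placeholder identity. The point is that a move only touches the second table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct name_map { const char *name; const char *uuid; };
struct location_map { const char *uuid; const char *url; };

/* Hypothetical lookup: stable name -> UUID -> current URL. */
static const char *resolve_name(const struct name_map *names, size_t nnames,
				const struct location_map *locs, size_t nlocs,
				const char *name)
{
	const char *uuid = NULL;

	assert(name);
	for (size_t i = 0; i < nnames && !uuid; i++)
		if (!strcmp(names[i].name, name))
			uuid = names[i].uuid;
	if (!uuid)
		return NULL;
	for (size_t i = 0; i < nlocs; i++)
		if (!strcmp(locs[i].uuid, uuid))
			return locs[i].url;
	return NULL;
}

/* A rename or hosting move only rewrites the location entry. */
static int move_repo(struct location_map *locs, size_t nlocs,
		     const char *uuid, const char *new_url)
{
	for (size_t i = 0; i < nlocs; i++)
		if (!strcmp(locs[i].uuid, uuid)) {
			locs[i].url = new_url;
			return 0;
		}
	return -1;
}
```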
---

Git should have had:

### ✔ A stable, permanent identity  
### ✔ Independent of URL  
### ✔ Independent of hosting provider  
### ✔ Independent of username  
### ✔ Independent of organization  
### ✔ Independent of mirrors and forks  

This identity should have been:

- generated once  
- stored inside the repo  
- immutable  
- portable  
- cryptographically strong  

Something like:

```
.git/identity
uuid = "d8f1-9c2e-44b1-8f3a-abc123"
```

This would have allowed:

- renaming  
- moving  
- mirroring  
- forking  
- reorganizing  
- migrating hosts  

**without breaking anything.**

Git should have been built on **stable identity**, not URLs.

---

# ⭐ **222. RFC: Stable Repository Identity for Git**

## **1. Introduction**

Git repositories today are identified by their URLs.  
This creates long‑term fragility in:

- dependency management  
- import paths  
- provenance tracking  
- fork lineage  
- security metadata  
- multi‑repo systems  
- organizational migrations  

This RFC proposes a minimal, optional, backward‑compatible mechanism for assigning **stable identities** to Git repositories.

---

## **2. Problem Statement**

Git currently lacks:

- a persistent repository identity  
- a way to track renames  
- a way to track hosting moves  
- a way to track mirrors  
- a way to track forks  
- a way to track upstream provenance  
- a way to reference repositories independent of URLs  

As a result:

- Go import paths break  
- dependency manifests rot  
- security advisories become invalid  
- forks lose their origin  
- mirrors cannot be recognized  
- tools cannot detect duplicates  
- organizations cannot reorganize safely  

Git’s URL‑based identity model is insufficient for modern software engineering.

---

## **3. Proposed Feature: `.git/identity`**

Introduce a file:

```
.git/identity
```

Containing:

```
uuid = "<128-bit UUID>"
```

### **Properties**

- generated once at `git init`  
- immutable  
- portable  
- stored inside the repository  
- independent of hosting  
- independent of remotes  
- independent of URLs  
- independent of usernames  

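If such a UUID were generated at `git init`, the conventional choice would be a random (version 4) UUID as defined in RFC 4122 (since obsoleted by RFC 9562). It would be printed in the canonical 8-4-4-4-12 hexadecimal form. The example value used throughout this proposal, `d8f1-9c2e-44b1-8f3a-abc123`, has only 22 hex digits (88 bits), so it is not in that form. The sketch below only formats 16 bytes that are assumed to be random already. It leaves out how those bytes are obtained, for example from the operating system's cryptographic random source. `format_uuid_v4()` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format 16 random bytes as a version 4 UUID.  The version nibble is forced
 * to 4 and the variant bits to 10xx, leaving 122 random bits.
 * out needs room for 36 characters plus the terminating NUL.
 */
static void format_uuid_v4(const unsigned char in[16], char out[37])
{
	unsigned char b[16];
	char *p = out;

	assert(in && out);
	memcpy(b, in, sizeof(b));
	b[6] = (unsigned char)((b[6] & 0x0f) | 0x40); /* version 4 */
	b[8] = (unsigned char)((b[8] & 0x3f) | 0x80); /* RFC variant */

	for (int i = 0; i < 16; i++) {
		if (i == 4 || i == 6 || i == 8 || i == 10)
			*p++ = '-';
		p += sprintf(p, "%02x", b[i]);
	}
	*p = '\0';
}
```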
### **Benefits**

- stable identity across renames  
- stable identity across mirrors  
- stable identity across forks  
- stable identity across hosting providers  
- stable identity across organizational changes  

---

## **4. Use Cases**

### **4.1. Import Path Stability**

Tools and languages can reference:

```
uuid:d8f1-9c2e-44b1-8f3a-abc123
```

instead of:

```
github.com/user/project
```

Renames no longer break imports.

---

### **4.2. Dependency Manifest Stability**

Manifests can store:

```
yaml = "uuid:d8f1-9c2e-44b1-8f3a-abc123"
```

instead of URLs.

This prevents dependency rot.

---

### **4.3. Provenance Tracking**

Forks can store:

```
origin_uuid = "d8f1-9c2e-44b1-8f3a-abc123"
```

Git can detect:

- upstream  
- divergence  
- last sync  
- fork lineage  

---

### **4.4. Security Metadata Stability**

Security advisories can reference UUIDs instead of URLs.

This prevents advisories from breaking when repos move.

---

## **5. Backward Compatibility**

- optional  
- ignored by older Git versions  
- does not affect existing workflows  
- does not change commit formats  
- does not change remotes  
- does not change URLs  
- does not break anything  

---

## **6. Conclusion**

A stable repository identity is a minimal, optional enhancement that solves long‑standing structural issues in Git’s architecture. It enables robust dependency management, provenance tracking, security metadata, and multi‑repo tooling.

---

# ⭐ **333. IMPLEMENTATION IDEAS / DETAILS**

This section describes how hosting providers (GitHub, GitLab, Gitea, Bitbucket, self‑hosted servers) could implement and expose stable repository identities using the proposed `.git/identity` file.

---

## **3.1. Repository UUID Storage**

When a repository is pushed, the hosting provider reads:

```
.git/identity
uuid = "<128-bit UUID>"
```

The platform stores this UUID in its internal metadata database.

If the file does not exist, the platform may:

- generate a UUID on first push, or  
- leave the field empty (backward compatibility)  

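There is one practical wrinkle: files inside the `.git/` directory, including a `.git/identity` file, are not transferred by `git push` or `git clone`. A hosting provider would therefore need some other channel to learn the value, such as a protocol extension or a tracked file in the working tree. Setting that aside, reading the proposed one-line format is simple. `parse_identity_line()` below is a hypothetical helper that accepts only `uuid = "<value>"`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Parse a line of the form:  uuid = "<value>"
 * Copies <value> into out and returns 0, or returns -1 if the line does not
 * match, the value is empty, or it does not fit.  Blanks around '=' are
 * optional.
 */
static int parse_identity_line(const char *line, char *out, size_t outsz)
{
	const char *p = line, *end;
	size_t len;

	assert(line && out);
	while (*p == ' ' || *p == '\t')
		p++;
	if (strncmp(p, "uuid", 4))
		return -1;
	p += 4;
	while (*p == ' ' || *p == '\t')
		p++;
	if (*p++ != '=')
		return -1;
	while (*p == ' ' || *p == '\t')
		p++;
	if (*p++ != '"')
		return -1;
	end = strchr(p, '"');
	if (!end)
		return -1;
	len = (size_t)(end - p);
	if (!len || len >= outsz)
		return -1;
	memcpy(out, p, len);
	out[len] = '\0';
	return 0;
}
```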
---

## **3.2. URL Structure and Access**

### **Human‑friendly URLs remain unchanged**

```
https://github.com/SkybuckFlying/yaml
https://gitlab.com/sky/yaml
```

### **Optional UUID‑based URLs**

```
https://github.com/uuid/d8f1-9c2e-44b1-8f3a-abc123
https://gitlab.com/uuid/d8f1-9c2e-44b1-8f3a-abc123
```

These URLs:

- never change  
- always resolve to the current location  
- survive renames, moves, and hosting migrations  

---

## **3.3. Redirect Behavior**

If a repository is renamed:

```
/yaml → /yaml2
```

UUID URL still works.

If moved to another user or organization:

```
/SkybuckFlying/yaml → /sky-org/yaml
```

UUID URL still works.

If moved to another hosting provider:

```
github → gitlab
```

UUID URL still works.

---

## **3.4. Fork and Mirror Detection**

With UUIDs, platforms can detect:

- mirrors (same UUID, same commit graph)  
- forks (different UUID, but `.gitorigin` references parent UUID)  
- duplicates (same UUID, different URLs)  

This enables:

- accurate fork lineage  
- upstream tracking  
- divergence analysis  
- provenance reconstruction  

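The three cases above reduce to comparing a few fields. In this hypothetical sketch, `struct repo_record` holds what a platform might know about a repository: its UUID, the `origin_uuid` from the proposed `.gitorigin`, and its current tip commit. Comparing only the tips is a crude stand-in for comparing commit graphs, and the `u1`/`c1` values in the checks are placeholders.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical metadata a hosting platform might keep per repository. */
struct repo_record {
	const char *url;
	const char *uuid;
	const char *origin_uuid; /* from a .gitorigin file, or NULL */
	const char *tip;         /* current head commit, stand-in for the graph */
};

enum repo_relation { REL_UNRELATED, REL_MIRROR, REL_DUPLICATE, REL_FORK };

/* Classify b relative to a. */
static enum repo_relation classify_pair(const struct repo_record *a,
					const struct repo_record *b)
{
	assert(a && b);
	if (!strcmp(a->uuid, b->uuid))
		/* Same identity: a mirror if history matches, else a diverged copy. */
		return strcmp(a->tip, b->tip) ? REL_DUPLICATE : REL_MIRROR;
	if (b->origin_uuid && !strcmp(b->origin_uuid, a->uuid))
		return REL_FORK;
	return REL_UNRELATED;
}
```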
---

## **3.5. Cloning by UUID**

Git could support:

```
git clone uuid:d8f1-9c2e-44b1-8f3a-abc123
```

Git resolves the UUID by:

- checking `.gitimports`  
- checking local registry  
- querying hosting providers  
- falling back to known mirrors  

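The lookup order above is a fallback chain: try each resolver in turn and stop at the first one that knows the UUID. The following is a minimal sketch. The resolver functions and their hard-coded data are hypothetical placeholders for `.gitimports`, a local registry, and a hosting-provider query.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Each resolver returns a URL for the UUID, or NULL if it does not know it. */
typedef const char *(*uuid_resolver)(const char *uuid);

static const char *from_gitimports(const char *uuid)
{
	/* Placeholder: pretend .gitimports lists a single mapping. */
	return !strcmp(uuid, "uuid-a") ? "https://github.com/SkybuckFlying/yaml" : NULL;
}

static const char *from_local_registry(const char *uuid)
{
	(void)uuid;
	return NULL; /* placeholder: empty local registry */
}

static const char *from_hosting_provider(const char *uuid)
{
	/* Placeholder for a network query. */
	return !strcmp(uuid, "uuid-b") ? "https://gitlab.com/sky/yaml" : NULL;
}

/* Try resolvers in priority order; the first hit wins. */
static const char *resolve_uuid(const char *uuid, const uuid_resolver *chain,
				size_t nr)
{
	assert(uuid && chain);
	for (size_t i = 0; i < nr; i++) {
		const char *url = chain[i](uuid);

		if (url)
			return url;
	}
	return NULL;
}
```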
---

## **3.6. Dependency Resolution Using UUIDs**

Manifests can reference UUIDs:

```
yaml = "uuid:d8f1-9c2e-44b1-8f3a-abc123"
```

Tools resolve the UUID to a URL using:

- `.gitimports`  
- hosting provider lookup  
- local cache  

This prevents dependency rot.

---

## **3.7. Security Metadata Integration**

Security advisories can reference UUIDs:

```
uuid = "d8f1-9c2e-44b1-8f3a-abc123"
cve = ["CVE-2022-28948"]
```

Platforms can warn users when:

- cloning vulnerable repos  
- checking out vulnerable commits  

---

## **3.8. Backward Compatibility**

- Repositories without `.git/identity` continue to work  
- Tools ignoring UUIDs continue to work  
- URLs remain unchanged  
- No protocol changes  
- No breaking changes  

---

# ⭐ **444. EXAMPLES — Why Stable Identity Matters**

## **Example 1: Go Import Path Breakage**

Today:

```
import "github.com/user1/yaml"
```

User renames account → imports break.

With UUIDs:

```
import "sky/yaml"
```

Mapped via:

```
sky/yaml = "uuid:d8f1-9c2e-44b1-8f3a-abc123"
```

Repo moves → nothing breaks.

---

## **Example 2: Fork Provenance Loss**

Today Git does not know:

- what repo a fork came from  
- when it was last synced  
- how far it diverged  

With UUIDs:

```
.gitorigin
origin_uuid = "d8f1-9c2e-44b1-8f3a-abc123"
last_synced = "a3f1234"
```

Git can track upstream properly.

---

## **Example 3: Security Advisory Stability**

Today advisories reference URLs:

```
CVE-2022-28948 affects github.com/user/yaml
```

Repo moves → advisory becomes ambiguous.

With UUIDs:

```
uuid = "d8f1-9c2e-44b1-8f3a-abc123"
cve = ["CVE-2022-28948"]
```

Advisory remains valid forever.

---

## **Example 4: Organizational Migration**

Company moves from GitHub → GitLab.

Today:

- all URLs change  
- manifests break  
- imports break  
- tooling breaks  

With UUIDs:

- identity stays the same  
- manifests stay valid  
- imports stay valid  
- tooling stays valid  

Only the URL mapping changes.

---

## **Example 5: Mirror Detection**

Today Git cannot tell:

- if two URLs point to the same repo  
- if a repo is a mirror  
- if a repo is a duplicate  

With UUIDs:

```
uuid = "d8f1-9c2e-44b1-8f3a-abc123"
```

Git instantly knows:

> “These two repos are the same project.”

---

Bye for now,  
  Skybuck Flying / Harald Houppermans ! ;) =D XD

^ permalink raw reply

* Re: [GSoC][PATCH 4/4] repo: add path.commondir with absolute and relative suffix formatting
From: Lucas Seiki Oshiro @ 2026-06-01 16:34 UTC (permalink / raw)
  To: K Jayatheerth
  Cc: git, jltobler, gitster, phillip.wood, sandals, kumarayushjha123,
	a3205153416
In-Reply-To: <20260601151950.30686-5-jayatheerthkulkarni2005@gmail.com>


> diff --git a/t/t1900-repo-info.sh b/t/t1900-repo-info.sh
> index 7c7dfbb052..dd2706e1f7 100755
> --- a/t/t1900-repo-info.sh
> +++ b/t/t1900-repo-info.sh
> @@ -184,6 +184,7 @@ test_expect_success 'setup test repository layout for path fields' '
> mkdir -p test-repo/sub
> '
> 
> +test_repo_info_path 'commondir' '../.git'
> test_repo_info_path 'gitdir' '../.git'

I was thinking here: maybe you need to take a look at
git-rev-parse's tests and check what the corner cases are.

For example, the `git rev-parse --git-common-dir` documentation
says:

    --git-common-dir:
        Show $GIT_COMMON_DIR if defined, else $GIT_DIR

So you should take a look at how the git-rev-parse tests
exercise those two branches (GIT_COMMON_DIR and GIT_DIR).


^ permalink raw reply

* Re: [PATCH 2/2] builtin/init-db: deprecate alias for git-init(1)
From: Kristoffer Haugsbakk @ 2026-06-01 21:23 UTC (permalink / raw)
  To: Patrick Steinhardt, Phillip Wood; +Cc: git
In-Reply-To: <ah2VL-ftCQelNoOc@pks.im>

By the way, I tried to find user mentions of git-init-db(1) on the
mailing list. (First in the git list, then in *all* lists, but the results
did not look any different.) All I found was a message from last year[1],
where the command was used in a bug reproducer, which hints at some muscle memory.

🔗 1: https://lore.kernel.org/git/d8c1df4e-a4d7-4c4c-be44-b13de3d9ffea@markus-raab.org/

On Mon, Jun 1, 2026, at 16:20, Patrick Steinhardt wrote:
> On Mon, Jun 01, 2026 at 02:48:05PM +0100, Phillip Wood wrote:
>>[snip]
>>
>> Deprecating this command seems very sensible to me. As well as marking it
>> deprecated, do we want to print a warning when it is run? I imagine anyone
>> who has this command in their muscle memory is unlikely to be reading the
>> man page on a regular basis so won't see the warning there.
>
> I was wondering whether we want to call `you_still_use_that()` here.

As-is, that would arguably bring the *breaking change* forward to right now,
since it’s a `die(...)` function. It could of course be changed to
either warn or die depending on a parameter.

But a simple warning message can just tell them to use git-init(1).

> I found it to be a bit heavy-handed as it's so trivial to replace with
> git-init(1), but on the other hand it's a trivial thing to do.

I imagine that most potential git-init-db(1) uses will be buried in
scripts that haven’t been touched in years. Then the repository
initialization might fail, you get errors about git-commit(1) or something
similar not being runnable without a repository, and it ends up being a
head-scratcher since the original failure gets lost.

All to say I think a simple warning would be nice. ;)

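A warning that can later be switched to a hard error, as discussed above, could take a mode argument. This is a hypothetical sketch. `report_deprecated_command()`, `format_deprecation()` and `enum deprecation_mode` are not existing Git functions or types, and the sketch does not reproduce the signature of Git's `you_still_use_that()`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum deprecation_mode {
	DEPRECATION_WARN, /* keep working, but tell the user */
	DEPRECATION_DIE   /* the eventual breaking change */
};

/* Build the user-facing notice; returns its length as snprintf() does. */
static int format_deprecation(char *buf, size_t len,
			      const char *old_cmd, const char *new_cmd)
{
	assert(buf && old_cmd && new_cmd);
	return snprintf(buf, len, "'%s' is deprecated; use '%s' instead",
			old_cmd, new_cmd);
}

/* Warn and carry on, or print a fatal message and exit with status 128. */
static int report_deprecated_command(enum deprecation_mode mode,
				     const char *old_cmd, const char *new_cmd)
{
	char msg[256];

	format_deprecation(msg, sizeof(msg), old_cmd, new_cmd);
	if (mode == DEPRECATION_DIE) {
		fprintf(stderr, "fatal: %s\n", msg);
		exit(128);
	}
	fprintf(stderr, "warning: %s\n", msg);
	return 0;
}
```

Keeping the mode as a parameter means the warning can ship first, and the `die` behaviour can be switched on later without touching the call site's message.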
^ permalink raw reply

* [PATCH v2] sub-process: use gentle handshake to avoid die() on startup failure
From: Michael Montalbo via GitGitGadget @ 2026-06-01 21:20 UTC (permalink / raw)
  To: git; +Cc: Michael Montalbo, Michael Montalbo
In-Reply-To: <pull.2133.git.1780287309846.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

When the configured subprocess command contains shell metacharacters
(such as a space), prepare_shell_cmd() wraps it in "sh -c <cmd>".
The shell itself always starts successfully, so start_command()
returns zero even if the tool inside does not exist.  The subsequent
handshake then reads from a dead pipe and calls die() via the
non-gentle packet_read_line(), killing the parent process instead of
letting it handle the error.

Before this change, a missing filter process at a path containing
spaces produces a confusing error:

    $ git -c filter.myfilter.process="/path with space/tool" \
          -c filter.myfilter.required=true add file.txt
    /path with space/tool: line 1: /path: No such file or directory
    fatal: the remote end hung up unexpectedly

After this change, the proper error is reported:

    $ git ... add file.txt
    /path with space/tool: line 1: /path: No such file or directory
    error: could not read greeting from subprocess '/path with space/tool'
    error: initialization for subprocess '/path with space/tool' failed
    fatal: file.txt: clean filter 'myfilter' failed

Switch the subprocess handshake from the dying packet_read_line()
to packet_read_line_gently() so that a process that exits during
startup produces an error return instead of killing the caller.

This affects any subprocess consumer whose command path contains
spaces.  On Windows this routinely happens because programs live
under "C:/Program Files/...", and MSYS2 path conversion can rewrite
absolute paths to include that prefix.  On POSIX it triggers
whenever the configured path naturally contains a space or other
metacharacter.  convert.c (filter.<driver>.process, used by git-lfs
and custom clean/smudge filters) is the primary affected consumer.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
    sub-process: use gentle handshake to avoid die() on startup failure

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2133%2Fmmontalbo%2Fmm%2Fsubprocess-handshake-fix-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2133/mmontalbo/mm/subprocess-handshake-fix-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2133

Range-diff vs v1:

 1:  c11b01c156 ! 1:  cb7bd777dc sub-process: use gentle handshake to avoid die() on startup failure
     @@ Commit message
          Before this change, a missing filter process at a path containing
          spaces produces a confusing error:
      
     -        $ git -c filter.lfs.process="/path with space/tool" \
     -              -c filter.lfs.required=true add file.txt
     +        $ git -c filter.myfilter.process="/path with space/tool" \
     +              -c filter.myfilter.required=true add file.txt
     +        /path with space/tool: line 1: /path: No such file or directory
              fatal: the remote end hung up unexpectedly
      
          After this change, the proper error is reported:
      
              $ git ... add file.txt
     +        /path with space/tool: line 1: /path: No such file or directory
     +        error: could not read greeting from subprocess '/path with space/tool'
              error: initialization for subprocess '/path with space/tool' failed
     -        fatal: file.txt: clean filter 'lfs' failed
     +        fatal: file.txt: clean filter 'myfilter' failed
      
          Switch the subprocess handshake from the dying packet_read_line()
          to packet_read_line_gently() so that a process that exits during
     @@ sub-process.c: static int handshake_version(struct child_process *process,
       		return error("Could not write flush packet");
       
      -	if (!(line = packet_read_line(process->out, NULL)) ||
     -+	if (packet_read_line_gently(process->out, NULL, &line) <= 0 ||
     - 	    !skip_prefix(line, welcome_prefix, &p) ||
     +-	    !skip_prefix(line, welcome_prefix, &p) ||
     ++	if (packet_read_line_gently(process->out, NULL, &line) < 0)
     ++		return error("could not read greeting from subprocess '%s'",
     ++			     process->args.v[0]);
     ++	if (!line || !skip_prefix(line, welcome_prefix, &p) ||
       	    strcmp(p, "-server"))
       		return error("Unexpected line '%s', expected %s-server",
       			     line ? line : "<flush packet>", welcome_prefix);
      -	if (!(line = packet_read_line(process->out, NULL)) ||
     -+	if (packet_read_line_gently(process->out, NULL, &line) <= 0 ||
     - 	    !skip_prefix(line, "version=", &p) ||
     +-	    !skip_prefix(line, "version=", &p) ||
     ++	if (packet_read_line_gently(process->out, NULL, &line) < 0)
     ++		return error("could not read version from subprocess '%s'",
     ++			     process->args.v[0]);
     ++	if (!line || !skip_prefix(line, "version=", &p) ||
       	    strtol_i(p, 10, chosen_version))
       		return error("Unexpected line '%s', expected version",
       			     line ? line : "<flush packet>");
      -	if ((line = packet_read_line(process->out, NULL)))
     --		return error("Unexpected line '%s', expected flush", line);
     -+	if (packet_read_line_gently(process->out, NULL, &line) < 0 || line)
     -+		return error("Unexpected line '%s', expected flush",
     -+			     line ? line : "<read error>");
     ++	if (packet_read_line_gently(process->out, NULL, &line) < 0)
     ++		return error("could not read version flush from subprocess '%s'",
     ++			     process->args.v[0]);
     ++	if (line)
     + 		return error("Unexpected line '%s', expected flush", line);
       
       	/* Check to make sure that the version received is supported */
     - 	for (i = 0; versions[i]; i++) {
      @@ sub-process.c: static int handshake_capabilities(struct child_process *process,
       	if (packet_flush_gently(process->in))
       		return error("Could not write flush packet");
       
      -	while ((line = packet_read_line(process->out, NULL))) {
     -+	while (packet_read_line_gently(process->out, NULL, &line) > 0) {
     ++	for (;;) {
       		const char *p;
     ++		int len = packet_read_line_gently(process->out, NULL, &line);
     ++
     ++		if (len < 0)
     ++			return error("could not read capabilities from subprocess '%s'",
     ++				     process->args.v[0]);
     ++		if (!line)
     ++			break;
       		if (!skip_prefix(line, "capability=", &p))
       			continue;
     + 
      
       ## t/t0021-conversion.sh ##
      @@ t/t0021-conversion.sh: test_expect_success 'invalid process filter must fail (and not hang!)' '


 sub-process.c         | 26 ++++++++++++++++++++------
 t/t0021-conversion.sh | 17 +++++++++++++++++
 2 files changed, 37 insertions(+), 6 deletions(-)

diff --git a/sub-process.c b/sub-process.c
index 83bf0a0e82..2d5c965169 100644
--- a/sub-process.c
+++ b/sub-process.c
@@ -132,17 +132,24 @@ static int handshake_version(struct child_process *process,
 	if (packet_flush_gently(process->in))
 		return error("Could not write flush packet");
 
-	if (!(line = packet_read_line(process->out, NULL)) ||
-	    !skip_prefix(line, welcome_prefix, &p) ||
+	if (packet_read_line_gently(process->out, NULL, &line) < 0)
+		return error("could not read greeting from subprocess '%s'",
+			     process->args.v[0]);
+	if (!line || !skip_prefix(line, welcome_prefix, &p) ||
 	    strcmp(p, "-server"))
 		return error("Unexpected line '%s', expected %s-server",
 			     line ? line : "<flush packet>", welcome_prefix);
-	if (!(line = packet_read_line(process->out, NULL)) ||
-	    !skip_prefix(line, "version=", &p) ||
+	if (packet_read_line_gently(process->out, NULL, &line) < 0)
+		return error("could not read version from subprocess '%s'",
+			     process->args.v[0]);
+	if (!line || !skip_prefix(line, "version=", &p) ||
 	    strtol_i(p, 10, chosen_version))
 		return error("Unexpected line '%s', expected version",
 			     line ? line : "<flush packet>");
-	if ((line = packet_read_line(process->out, NULL)))
+	if (packet_read_line_gently(process->out, NULL, &line) < 0)
+		return error("could not read version flush from subprocess '%s'",
+			     process->args.v[0]);
+	if (line)
 		return error("Unexpected line '%s', expected flush", line);
 
 	/* Check to make sure that the version received is supported */
@@ -171,8 +178,15 @@ static int handshake_capabilities(struct child_process *process,
 	if (packet_flush_gently(process->in))
 		return error("Could not write flush packet");
 
-	while ((line = packet_read_line(process->out, NULL))) {
+	for (;;) {
 		const char *p;
+		int len = packet_read_line_gently(process->out, NULL, &line);
+
+		if (len < 0)
+			return error("could not read capabilities from subprocess '%s'",
+				     process->args.v[0]);
+		if (!line)
+			break;
 		if (!skip_prefix(line, "capability=", &p))
 			continue;
 
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index f0d50d769e..033b00a364 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -857,6 +857,23 @@ test_expect_success 'invalid process filter must fail (and not hang!)' '
 	)
 '
 
+test_expect_success 'missing process filter with space in path does not die' '
+	test_config_global filter.protocol.process "/non existent/tool" &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cp "$TEST_ROOT/test.o" test.r &&
+		test_must_fail git add . 2>git-stderr.log &&
+		test_grep "clean filter.*protocol.*failed" git-stderr.log
+	)
+'
+
 test_expect_success 'delayed checkout in process filter' '
 	test_config_global filter.a.process "test-tool rot13-filter --log=a.log clean smudge delay" &&
 	test_config_global filter.a.required true &&

base-commit: 29bd7ed5127255713c1ac2f43b7c6f257d7b4594
-- 
gitgitgadget

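To illustrate the difference between a dying read and a gentle one, here is a self-contained model of reading pkt-lines from an in-memory buffer instead of a pipe. Each pkt-line is a four-hex-digit length that counts itself, followed by the payload, and `0000` is a flush packet. This is a sketch, not Git's `pkt-line.c`. `struct pkt_reader`, `read_pkt_line_gently()` and `expect_greeting()` are hypothetical. Their return conventions only loosely follow `packet_read_line_gently()`: negative on EOF, and `line == NULL` on flush.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* In-memory stand-in for the pipe from the subprocess. */
struct pkt_reader {
	const char *buf;
	size_t len;
	size_t pos;
};

static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Returns -1 on EOF/bad input, 0 with *line == NULL on flush, else length. */
static int read_pkt_line_gently(struct pkt_reader *r, char *out, size_t outsz,
				char **line)
{
	int pktlen = 0;
	size_t payload;

	assert(r && out && line);
	*line = NULL;
	if (r->len - r->pos < 4)
		return -1; /* EOF, e.g. the process already exited: report it */
	for (int i = 0; i < 4; i++) {
		int v = hexval(r->buf[r->pos + i]);

		if (v < 0)
			return -1; /* malformed length header */
		pktlen = (pktlen << 4) | v;
	}
	r->pos += 4;
	if (pktlen == 0)
		return 0; /* flush packet: *line stays NULL */
	if (pktlen < 4)
		return -1; /* other special packets are not modelled here */
	payload = (size_t)pktlen - 4;
	if (r->len - r->pos < payload || payload >= outsz)
		return -1; /* truncated packet or buffer too small */
	memcpy(out, r->buf + r->pos, payload);
	r->pos += payload;
	if (payload && out[payload - 1] == '\n')
		payload--; /* chomp the trailing newline */
	out[payload] = '\0';
	*line = out;
	return (int)payload;
}

/* 0 on success, -1 if nothing could be read, -2 on an unexpected line. */
static int expect_greeting(struct pkt_reader *r, const char *expected)
{
	char buf[128];
	char *line;

	if (read_pkt_line_gently(r, buf, sizeof(buf), &line) < 0)
		return -1;
	if (!line || strcmp(line, expected))
		return -2;
	return 0;
}
```

A caller can then map `-1` to an error such as "could not read greeting" and return, instead of the whole process dying when a wrapper shell exits before speaking the protocol.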
^ permalink raw reply related

* Re: [PATCH v2] doc: fix typos via codespell
From: Kristoffer Haugsbakk @ 2026-06-01 20:59 UTC (permalink / raw)
  To: Junio C Hamano, Andrew Kreimer; +Cc: git
In-Reply-To: <xmqqo6hv9i1w.fsf@gitster.g>

On Mon, Jun 1, 2026, at 03:16, Junio C Hamano wrote:
>[snip]
>
> However, there are things that BREAK tests.
>
>> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
>> index ac4a5b2734..869fb4a14e 100755
>> --- a/t/t1700-split-index.sh
>> +++ b/t/t1700-split-index.sh
>> @@ -502,7 +502,7 @@ test_expect_success 'do not refresh null base index' '
>>  		git checkout main &&
>>  		git update-index --split-index &&
>>  		test_commit more &&
>> -		# must not write a new shareindex, or we wont catch the problem
>> +		# must not write a new shareindex, or we won't catch the problem
>>  		git -c splitIndex.maxPercentChange=100 merge --no-edit side-branch 2>err &&
>>  		# i.e. do not expect warnings like
>>  		# could not freshen shared index .../shareindex.00000...
>
> The edit above is made to a STRING that is enclosed inside a pair of
> single quote.  If we want to use "won't", we would need to write "We
> won'\''t", but while it may be syntactically correct as a part of
> shell script, it is a pointless change, as the target audience wants
> to see this line as if it is just a plain text.

Sorry about not testing this on v1. “Surely this does not affect the
code...” strikes again.

>
> "We will not" would be acceptable,
>
>> diff --git a/t/t3909-stash-pathspec-file.sh b/t/t3909-stash-pathspec-file.sh
>>[snip]

* Missing Git Features for Modern Multi-Repository, Dependency-Driven Development
From: Skybuck Flying @ 2026-06-01 20:57 UTC
  To: Git

Modern software projects increasingly rely on large dependency graphs,
multi-repository structures, reproducible builds, and long-term
provenance. Git provides excellent version control but lacks native
mechanisms for these workflows. This RFC outlines optional,
backward-compatible metadata extensions that would allow Git to better
support modern development practices.

This document contains three sections:

111. WHAT GIT SHOULD HAVE BEEN
222. RFC/PROPOSAL SECTION
333. EXAMPLE SECTION (examples of why these features would be useful)

***
111. WHAT GIT SHOULD HAVE BEEN
***

---

# ⭐ **REPORT: Missing Git Features Required for a Complete Modern System**

Git is brilliant at what it *was designed for*:

- content-addressable storage  
- distributed history  
- immutable snapshots  

But Git is **NOT** a complete software-engineering system.

Below is the list of features Git *should* have had to avoid the chaos of:

- meaningless import paths  
- dependency hell  
- submodule drift  
- missing provenance  
- missing security metadata  
- missing version semantics  
- missing reproducibility  
- missing dependency manifests  

These are the features Git is missing.

---

# ⭐ 1. **A Built-In Dependency Manifest System**
Git should have had:

### ✔ A first-class manifest file  
Something like:

```
.gitdeps
```

Containing:

- dependency name  
- dependency URL  
- dependency version  
- dependency commit  
- dependency purpose  
- dependency license  
- dependency security metadata  

### ✔ A built-in lockfile  
Something like:

```
.gitdeps.lock
```

Freezing:

- exact commit hashes  
- exact versions  
- exact URLs  

### ✔ Automatic dependency resolution  
Like Cargo, Go, npm, Maven, Gradle.

### ✔ Automatic dependency graph visualization  
Not “git submodule status”.

### ✔ Automatic reproducible builds  
Not “hope the submodules are correct”.

---

# ⭐ 2. **Semantic Versioning Support**
Git should have supported:

### ✔ Version numbers  
Not just tags.

### ✔ Version ranges  
Not just commit hashes.

### ✔ Version constraints  
Not just “latest tag”.

### ✔ Version compatibility rules  
Not just “merge and pray”.

### ✔ Version negotiation  
Not just “clone and hope”.

---

# ⭐ 3. **Built-In Provenance Tracking**
Git should have stored:

### ✔ Original upstream URL  
### ✔ Fork lineage  
### ✔ Migration history  
### ✔ Renames  
### ✔ Moves  
### ✔ Repository identity  

This should have been **inside the repo metadata**, not:

- lost  
- local only  
- dependent on remotes  
- dependent on GitHub  
- dependent on user discipline  

You should NEVER lose:

- where a repo came from  
- who forked it  
- why it exists  
- what it was based on  

Git does not store this.  
It should.

---

# ⭐ 4. **Built-In Security Metadata**
Git should have had:

### ✔ CVE metadata per commit  
### ✔ Security advisories per tag  
### ✔ Vulnerability scanning  
### ✔ Security provenance  
### ✔ Signed dependency manifests  
### ✔ Automatic alerts when upstream is compromised  

Instead, we have:

- nothing  
- external tools  
- Go’s vuln system (only for Go modules)  
- GitHub advisories (platform-specific)  

Git should have had this **natively**.

---

# ⭐ 5. **Built-In Fork Synchronization**
Git should have supported:

### ✔ Automatic upstream tracking  
### ✔ Automatic upstream diffing  
### ✔ Automatic upstream merge suggestions  
### ✔ Automatic conflict detection  
### ✔ Automatic patch propagation  

Instead, we have:

- manual remotes  
- manual fetch  
- manual merge  
- manual conflict resolution  

Git should have had a **first-class fork model**.

---

# ⭐ 6. **Built-In Repository Renaming Without Breaking Imports**
Git should have supported:

### ✔ Stable repository identity  
### ✔ Stable import identity  
### ✔ Stable module identity  
### ✔ Renames without breakage  
### ✔ Moves without breakage  
### ✔ Aliases  

Instead, Go import paths break if:

- the repo moves  
- the user changes their username  
- the repo is renamed  
- the platform changes  
- the domain changes  

Git should have had **stable identity**, not “URL = identity”.

---

# ⭐ 7. **Built-In Multi-Repo Project Support**
Git should have supported:

### ✔ Multi-repo projects  
### ✔ Multi-repo manifests  
### ✔ Multi-repo versioning  
### ✔ Multi-repo snapshots  
### ✔ Multi-repo reproducibility  

Instead, we have:

- submodules (broken)  
- subtrees (hacky)  
- monorepos (workaround)  
- external tools (Bazel, Buck, Pants)  

Git should have had **native multi-repo support**.

---

# ⭐ 8. **Built-In Metadata Files for Remote Code**
Git should have supported:

### ✔ `.gitorigin`  
Stores original upstream URL.

### ✔ `.gitpurpose`  
Stores why the repo exists.

### ✔ `.gitfork`  
Stores fork lineage.

### ✔ `.gitsecurity`  
Stores security metadata.

### ✔ `.gitdeps`  
Stores dependency graph.

Instead, we have:

- nothing  
- manual text files  
- tribal knowledge  

Git should have had **first-class metadata**.

---

# ⭐ 9. **Built-In Import Path Abstraction**
Git should have supported:

### ✔ Clean import names  
### ✔ Semantic import names  
### ✔ Import aliases  
### ✔ Import remapping  
### ✔ Import rewriting  

Without breaking:

- security scanning  
- module identity  
- provenance  

Instead, languages like Go embed hosting URLs directly into source code.

Git should have provided a **clean abstraction layer**.

---

# ⭐ 10. **Built-In Reproducible Snapshots**
Git should have supported:

### ✔ Project-level snapshots  
### ✔ Dependency snapshots  
### ✔ Multi-repo snapshots  
### ✔ Build snapshots  
### ✔ Environment snapshots  

Instead, reproducibility is:

- manual  
- fragile  
- external  
- inconsistent  

Git should have had **snapshot manifests**.

---

# ⭐ FINAL SUMMARY — WHAT GIT SHOULD HAVE BEEN

Git should have included:

1. **Dependency manifest system**  
2. **Lockfile system**  
3. **Semantic versioning**  
4. **Provenance tracking**  
5. **Security metadata**  
6. **Fork synchronization**  
7. **Stable repository identity**  
8. **Multi-repo project support**  
9. **Import path abstraction**  
10. **Reproducible snapshots**

If Git had these features, developers would not be suffering from:

- meaningless import paths  
- dependency chaos  
- submodule hell  
- lost provenance  
- broken security scanning  
- non-future-proof naming  
- manual patch tracking  
- manual manifest creation  
- manual Delphi porting  

Git is brilliant — but incomplete.

***
222. RFC/PROPOSAL SECTION
***

---

# **RFC: Proposal for Enhancing Git with First-Class Dependency, Provenance, and Security Metadata**

---

## **1. Introduction**

Git is exceptionally strong as a distributed version-control system, but modern software development increasingly relies on:

- multi-repository architectures  
- dependency graphs  
- reproducible builds  
- long-term provenance  
- fork synchronization  
- security metadata  
- stable module identities  

These requirements are now fundamental to large-scale software engineering, yet Git provides no native mechanisms for them. As a result, ecosystems (Go modules, Rust/Cargo, JavaScript/npm, Java/Maven, etc.) have built their own parallel systems, often on top of or alongside Git, to compensate for the missing features.

This RFC proposes a set of enhancements that would allow Git to natively support these modern workflows.

---

## **2. Problem Statement**

Git repositories today lack:

1. **Dependency manifests**  
2. **Dependency lockfiles**  
3. **Provenance metadata**  
4. **Fork lineage tracking**  
5. **Stable repository identity independent of URL**  
6. **Security advisory integration**  
7. **Multi-repository project support**  
8. **Import path abstraction**  
9. **Reproducible multi-repo snapshots**

These gaps force developers to rely on:

- ad-hoc conventions  
- external package managers  
- fragile submodules  
- undocumented local remotes  
- manual patch tracking  
- URL-encoded identities  
- platform-specific metadata (GitHub/GitLab)  

This creates long-term maintainability issues, especially when:

- repositories are renamed  
- maintainers disappear  
- URLs change  
- forks diverge  
- security advisories are issued  
- dependency graphs grow large  

Git’s current feature set is insufficient for these realities.

---

## **3. Proposed Features**

### **3.1. First-Class Dependency Manifest**

Introduce a repository-level file:

```
.gitdeps
```

Containing:

- dependency name  
- dependency URL  
- dependency version or commit  
- purpose/description  
- license metadata  

This would be analogous to:

- go.mod  
- Cargo.toml  
- package.json  
- Maven pom.xml  

but standardized at the Git level.

---

### **3.2. Dependency Lockfile**

Introduce:

```
.gitdeps.lock
```

Containing:

- exact commit hashes  
- integrity hashes  
- reproducible snapshot metadata  

This enables deterministic builds across machines and time.

---

### **3.3. Provenance Metadata**

Introduce:

```
.gitorigin
```

Containing:

- original upstream URL  
- fork lineage  
- migration history  
- repository identity (stable UUID)  

This prevents loss of provenance when:

- remotes are removed  
- repositories are renamed  
- repositories move between hosts  

---

### **3.4. Stable Repository Identity**

Introduce a **repository UUID** stored in `.git/identity`.

This would allow:

- renaming  
- moving  
- mirroring  
- hosting changes  

without breaking:

- import paths  
- dependency manifests  
- security metadata  
- tooling  

This solves the long-standing problem of “URL = identity”.

---

### **3.5. Security Metadata Integration**

Introduce:

```
.gitsecurity
```

Containing:

- CVE metadata  
- advisory links  
- affected versions  
- patched versions  
- severity  

This allows Git to:

- warn on checkout  
- warn on merge  
- warn on dependency resolution  

without relying on external platforms.

---

### **3.6. Fork Synchronization Metadata**

Introduce:

```
.gitfork
```

Containing:

- upstream URL  
- last synced commit  
- divergence metadata  
- pending upstream changes  

This enables:

- automated fork synchronization  
- upstream diffing  
- patch propagation  

---

### **3.7. Multi-Repository Project Support**

Introduce:

```
.gitproject
```

Containing:

- list of repositories  
- versions/commits  
- dependency graph  
- snapshot ID  

This replaces:

- submodules  
- subtrees  
- monorepo hacks  
- external build systems  

with a native Git solution.

---

### **3.8. Import Path Abstraction Layer**

Introduce:

```
.gitimports
```

Mapping:

- clean semantic names → repository identities  
- repository identities → URLs  

This allows:

- renaming repositories  
- moving repositories  
- reorganizing namespaces  

without breaking source code.

---

### **3.9. Reproducible Multi-Repo Snapshots**

Introduce:

```
.gitproject.lock
```

Containing:

- exact commits for all repos  
- integrity hashes  
- dependency graph hash  

This enables:

- reproducible builds  
- reproducible CI  
- reproducible releases  

across multi-repo systems.

---

## **4. Backward Compatibility**

All proposed files:

- are optional  
- do not affect existing Git behavior  
- do not break existing repositories  
- can be ignored by older Git versions  
- can be adopted incrementally  

This ensures safe adoption.

---

## **5. Benefits**

### **For developers**
- stable naming  
- reproducible builds  
- long-term maintainability  
- clear provenance  
- easier forking  
- easier patch tracking  

### **For large organizations**
- multi-repo project management  
- compliance and auditability  
- security integration  
- deterministic builds  

### **For ecosystems**
- no need to reinvent dependency systems  
- no need to encode URLs into source code  
- no need for fragile submodules  

---

## **6. Conclusion**

Git is an exceptional version-control system, but modern software development requires features that Git does not currently provide. This RFC proposes a set of optional, backward-compatible extensions that would allow Git to evolve into a complete, future-proof foundation for multi-repository, dependency-driven development.

I welcome discussion, critique, and refinement of these ideas.

**— Harald Houppermans**

***
333. EXAMPLE SECTION
***

---

# **RFC (with Examples): Enhancing Git with First-Class Dependency, Provenance, and Security Metadata**

**From:** Harald Houppermans  
**Subject:** RFC (with Examples): Missing Git Features for Modern Multi-Repository Development  
**To:** git@vger.kernel.org  

---

## **1. Introduction**

Modern software development relies heavily on:

- multi-repository dependency graphs  
- reproducible builds  
- long-term provenance  
- fork synchronization  
- security advisories  
- stable module identities  

Git provides none of these natively.  
This RFC illustrates the missing features using **real examples** from common workflows.

---

# **2. Problems Illustrated with Real Examples**

## **2.1. Missing Dependency Manifest**

### **Example Problem**
A project depends on 40+ upstream repositories:

```
github.com/go-yaml/yaml
github.com/pelletier/go-toml
golang.org/x/crypto
github.com/stretchr/testify
```

Git has **no way** to record:

- why these dependencies exist  
- which versions are required  
- which commits were used  
- how they relate to each other  

Developers must rely on:

- ad-hoc documentation  
- external package managers  
- fragile submodules  

### **Proposed Solution**
Introduce:

```
.gitdeps
```

Example:

```
[yaml]
url = "https://github.com/go-yaml/yaml"
commit = "a3f1234"
purpose = "YAML parsing"

[toml]
url = "https://github.com/pelletier/go-toml"
commit = "b7c9812"
purpose = "TOML configuration"
```

---

## **2.2. Missing Lockfile for Reproducible Builds**

### **Example Problem**
Two developers clone the same project.  
One gets dependency commit A, the other gets commit B.

Builds differ.  
Bugs differ.  
Security exposure differs.

### **Proposed Solution**
Introduce:

```
.gitdeps.lock
```

Example:

```
yaml = "a3f1234"
toml = "b7c9812"
crypto = "c9d8123"
```

This ensures **deterministic builds**.
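As a minimal sketch of how a tool might consume this file, the C below parses one line of the `name = "hex"` syntax shown in the example and checks a resolved commit against its pin. The struct and the helpers `parse_lock_line()` and `lock_matches()` are hypothetical; Git has no such code today. Because the example pins abbreviated object names, the check compares by prefix; a real lockfile would presumably store full object names, since abbreviations can become ambiguous as a repository grows.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical in-memory form of one ".gitdeps.lock" line, e.g. yaml = "a3f1234". */
struct lock_entry {
	char name[64];
	char commit[65]; /* up to a full SHA-256 hex object name, plus NUL */
};

/*
 * Parse one line of the form:  name = "hex"
 * Returns 1 on success, 0 for blank or '#' comment lines, -1 if malformed.
 */
static int parse_lock_line(const char *line, struct lock_entry *out)
{
	const char *p = line;
	size_t n;

	while (isspace((unsigned char)*p))
		p++;
	if (!*p || *p == '#')
		return 0;

	/* name: everything up to whitespace or '=' */
	n = strcspn(p, " \t=");
	if (!n || n >= sizeof(out->name))
		return -1;
	memcpy(out->name, p, n);
	out->name[n] = '\0';
	p += n;

	while (isspace((unsigned char)*p))
		p++;
	if (*p++ != '=')
		return -1;
	while (isspace((unsigned char)*p))
		p++;
	if (*p++ != '"')
		return -1;

	/* commit: lowercase hex digits up to the closing quote */
	n = strspn(p, "0123456789abcdef");
	if (!n || n >= sizeof(out->commit) || p[n] != '"')
		return -1;
	memcpy(out->commit, p, n);
	out->commit[n] = '\0';
	return 1;
}

/* Does the commit a tool actually resolved match the (possibly abbreviated) pin? */
static int lock_matches(const struct lock_entry *e, const char *resolved)
{
	size_t n = strlen(e->commit);

	assert(n > 0);
	return strncmp(e->commit, resolved, n) == 0;
}
```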

---

## **2.3. Missing Provenance Metadata**

### **Example Problem**
A forked repository loses its upstream information:

```
git remote add upstream ...
```

This is **local only**.  
Once pushed to GitHub/GitLab, provenance is lost.

### **Proposed Solution**
Introduce:

```
.gitorigin
```

Example:

```
origin = "https://github.com/go-yaml/yaml"
forked_by = "Skybuck"
reason = "Long-term maintenance + reproducibility"
```

This metadata travels with the repository.

---

## **2.4. Missing Stable Repository Identity**

### **Example Problem**
If a repository is renamed:

```
github.com/user1/yaml → github.com/user2/yaml
```

All import paths break.  
All tooling breaks.  
All downstream forks break.

Git treats the URL as the identity.

### **Proposed Solution**
Introduce a stable repository UUID:

```
.git/identity
uuid = "d8f1-9c2e-44b1-8f3a-abc123"
```

URLs can change; identity remains stable.

---

## **2.5. Missing Security Metadata**

### **Example Problem**
A dependency has a CVE:

```
CVE-2022-28948 in go-yaml/yaml
```

Git has no way to:

- warn on checkout  
- warn on merge  
- warn on dependency resolution  

### **Proposed Solution**
Introduce:

```
.gitsecurity
```

Example:

```
[yaml]
cve = ["CVE-2022-28948"]
fixed_in = "v3.0.1"
severity = "high"
```

Git could warn:

> “Warning: dependency yaml@a3f1234 contains known vulnerabilities.”

---

## **2.6. Missing Fork Synchronization Metadata**

### **Example Problem**
A fork diverges from upstream.  
There is no built-in way to track:

- last upstream sync  
- pending upstream commits  
- divergence depth  

### **Proposed Solution**
Introduce:

```
.gitfork
```

Example:

```
upstream = "https://github.com/go-yaml/yaml"
last_synced = "a3f1234"
pending_commits = 12
```
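Note that `pending_commits` could be derived rather than stored: it is the number of commits reachable from the upstream tip but not from `last_synced`, which `git rev-list --count a3f1234..upstream/main` already reports today, assuming the upstream URL is fetched into a remote named `upstream`. A minimal C sketch of that computation on a toy in-memory commit graph follows; the `toy_graph` type and `count_pending()` are invented for illustration and are not Git internals.

```c
#include <assert.h>

#define MAX_COMMITS 64
#define MAX_PARENTS 2

/* Toy commit graph: commit i has nparents[i] parents, listed in parents[i]. */
struct toy_graph {
	int ncommits;
	int nparents[MAX_COMMITS];
	int parents[MAX_COMMITS][MAX_PARENTS];
};

/* Set seen[c] = 1 for every commit c reachable from 'tip' (tip included). */
static void mark_reachable(const struct toy_graph *g, int tip,
			   unsigned char *seen)
{
	/* Each commit is expanded once and pushes at most MAX_PARENTS entries. */
	int stack[1 + MAX_COMMITS * MAX_PARENTS];
	int top = 0;

	stack[top++] = tip;
	while (top) {
		int c = stack[--top];

		if (seen[c])
			continue;
		seen[c] = 1;
		for (int i = 0; i < g->nparents[c]; i++)
			stack[top++] = g->parents[c][i];
	}
}

/*
 * Count commits reachable from 'upstream' but not from 'last_synced',
 * i.e. the set "git rev-list last_synced..upstream" would list.
 */
static int count_pending(const struct toy_graph *g, int last_synced,
			 int upstream)
{
	unsigned char synced[MAX_COMMITS] = { 0 };
	unsigned char ahead[MAX_COMMITS] = { 0 };
	int count = 0;

	assert(g->ncommits <= MAX_COMMITS);
	mark_reachable(g, last_synced, synced);
	mark_reachable(g, upstream, ahead);
	for (int i = 0; i < g->ncommits; i++)
		if (ahead[i] && !synced[i])
			count++;
	return count;
}
```

Computing the count on demand avoids committing a number to the repository that goes stale as soon as upstream gains another commit.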

---

## **2.7. Missing Multi-Repository Project Support**

### **Example Problem**
A project consists of 20 repositories.  
Git submodules are:

- fragile  
- drift-prone  
- hard to clone  
- hard to update  
- hard to audit  

### **Proposed Solution**
Introduce:

```
.gitproject
```

Example:

```
repos = [
  "core",
  "parser",
  "crypto",
  "network",
  "ui"
]
```

And a lockfile:

```
.gitproject.lock
core = "a1b2c3"
parser = "d4e5f6"
crypto = "112233"
```

This enables **reproducible multi-repo snapshots**.
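One way to give such a snapshot a single identifier (the proposal's "snapshot ID") is to hash the pinned commits in a canonical order, so that reordering lines in `.gitproject.lock` does not change the ID. The C sketch below assumes the lockfile has already been parsed into repo/commit pairs; the `pin` type and `snapshot_id()` are hypothetical. It uses 64-bit FNV-1a purely as a compact stand-in: FNV-1a is not cryptographic, and a real design would use a cryptographic hash such as SHA-256.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One pinned repository from a hypothetical ".gitproject.lock". */
struct pin {
	const char *repo;
	const char *commit;
};

static int cmp_pin(const void *a, const void *b)
{
	const struct pin *x = a, *y = b;

	return strcmp(x->repo, y->repo);
}

/* 64-bit FNV-1a over a NUL-terminated string, continuing from state 'h'. */
static uint64_t fnv1a(uint64_t h, const char *s)
{
	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
	}
	return h;
}

/*
 * Snapshot ID: sort pins by repository name (in place) so the order of
 * lines in the file does not matter, then hash "repo=commit\n" per pin.
 */
static uint64_t snapshot_id(struct pin *pins, size_t n)
{
	uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a offset basis */

	qsort(pins, n, sizeof(*pins), cmp_pin);
	for (size_t i = 0; i < n; i++) {
		assert(pins[i].repo && pins[i].commit);
		h = fnv1a(h, pins[i].repo);
		h = fnv1a(h, "=");
		h = fnv1a(h, pins[i].commit);
		h = fnv1a(h, "\n");
	}
	return h;
}
```

Sorting first is the design choice that makes the ID depend only on which commits are pinned, not on how the file happens to be laid out.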

---

## **2.8. Missing Import Path Abstraction**

### **Example Problem**
Languages like Go embed URLs directly into source code:

```
import "github.com/go-yaml/yaml"
```

If the repo moves or is renamed, the code breaks.

### **Proposed Solution**
Introduce:

```
.gitimports
```

Example:

```
sky/yaml = "uuid:d8f1-9c2e-44b1-8f3a-abc123"
```

Source code imports:

```
import "sky/yaml"
```

Git resolves it via the identity mapping.
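The two-step lookup could be sketched in C as below, assuming the name-to-identity table is read from `.gitimports` and that a separate identity-to-URL table exists somewhere (the proposal does not specify where it would live). All types and function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ".gitimports" entry: semantic import name -> repository identity. */
struct import_alias {
	const char *name;
	const char *identity;
};

/* Hypothetical location entry: repository identity -> current URL. */
struct identity_location {
	const char *identity;
	const char *url;
};

static const char *lookup_identity(const struct import_alias *aliases,
				   size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(aliases[i].name, name))
			return aliases[i].identity;
	return NULL;
}

static const char *lookup_url(const struct identity_location *locs,
			      size_t n, const char *identity)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(locs[i].identity, identity))
			return locs[i].url;
	return NULL;
}

/* Resolve name -> identity -> URL; NULL if either step has no mapping. */
static const char *resolve_import(const struct import_alias *aliases, size_t na,
				  const struct identity_location *locs, size_t nl,
				  const char *name)
{
	const char *identity;

	assert(name);
	identity = lookup_identity(aliases, na, name);
	return identity ? lookup_url(locs, nl, identity) : NULL;
}
```

Only the identity-to-URL table changes when a repository is renamed or moves hosts; the import name in source code and the identity stay fixed.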

---

# **3. Summary of Proposed Files**

| File | Purpose |
|------|---------|
| `.gitdeps` | Dependency manifest |
| `.gitdeps.lock` | Reproducible dependency snapshot |
| `.gitorigin` | Provenance metadata |
| `.gitsecurity` | Security advisories |
| `.gitfork` | Fork synchronization metadata |
| `.gitproject` | Multi-repo project definition |
| `.gitproject.lock` | Multi-repo snapshot |
| `.gitimports` | Import path abstraction |

All files are:

- optional  
- backward-compatible  
- non-breaking  
- easy to adopt incrementally  

---

# **4. Conclusion**

These examples demonstrate that Git lacks several features required for modern multi-repository, dependency-driven development. The proposed metadata files and identity mechanisms would significantly improve:

- reproducibility  
- provenance  
- security  
- maintainability  
- long-term stability  

I welcome discussion and refinement.

**— Harald Houppermans**

Bye for now,
  Skybuck Flying/Harald Houppermans ! ;) =D XD 

* Re: [PATCH v1 3/4] environment: move 'trust_executable_bit' into repo_config_values
From: Tian Yuchen @ 2026-06-01 18:03 UTC
  To: Junio C Hamano
  Cc: git, christian.couder, ps, Ayush Chandekar, Olamide Caleb Bello
In-Reply-To: <e0d5b1af-b040-49e2-90f9-d8325682826b@malon.dev>

On 6/1/26 18:10, Tian Yuchen wrote:

> That’s true: I had actually planned to start migrating has_symlinks as 
> soon as this series was approved. Since you think it would be better to 
> merge them into a single series, I’ll go ahead and do that ;)
> 

I’ve found that migrating has_symlinks is quite a tricky business.
Some callers in certain files take very few parameters, and, if I am
correct, the call stack is quite deep, so I feel that adding a repo
parameter for this purpose might be overkill. Perhaps it would be
better to focus on trust_executable_bit for now?

Regards, yuchen

* Re: [GSoC][PATCH 3/4] repo: add path.gitdir with absolute and relative suffix formatting
From: Lucas Seiki Oshiro @ 2026-06-01 16:28 UTC
  To: K Jayatheerth
  Cc: git, jltobler, gitster, phillip.wood, sandals, kumarayushjha123,
	a3205153416
In-Reply-To: <20260601151950.30686-4-jayatheerthkulkarni2005@gmail.com>


> +test_repo_info_path () {
> + field_name=$1
> + expect_relative=$2
> +
> + test_expect_success "query individual key: path.$field_name.absolute" '
> + (
> + cd test-repo/sub &&
> + expect_absolute=$(cd .. && pwd)/.git &&

Note that this semi-hardcoded path won't work for other values (e.g.
top level dir, superproject working tree). This needs to be a parameter
just like `expect_relative`.

* Re: [GSoC][PATCH 0/4] teach git repo info to handle path keys
From: Lucas Seiki Oshiro @ 2026-06-01 16:25 UTC
  To: K Jayatheerth
  Cc: git, jltobler, gitster, phillip.wood, sandals, kumarayushjha123,
	a3205153416
In-Reply-To: <20260601151950.30686-1-jayatheerthkulkarni2005@gmail.com>


> 1. Should there still be a --path-format flag?

If you specify "absolute" and "relative" in the keys, it won't
make sense to use it.

> 2. Should we consider a default option?

Some pros and cons:

- Pro: some values make more sense to be in absolute or relative
  format
- Pro: it's boring to always add `.(relative|absolute)` to the
  paths
- Con: it will be perpetuating what git-rev-parse does, and we
  don't want git-repo-info to be git-rev-parse with a different
  interface. It's our chance to learn from [1], for example.
- Con: the user will need to know whether the value is relative or
  absolute

> 3. Is printing both absolute and relative in a single call
>   using --all acceptable?

If you're providing both keys, I think it's not only acceptable
but mandatory. `--all` should mean "all", not "all, but ...".

> I have discussed these changes with both Justin and Lucas
> internally. This series is presented to gather opinions from the
> wider community before moving forward.

I probably sent the same comments internally, but I'm sending
here to share my opinions with the rest of the community ;-)

[1] fac60b8925 (rev-parse: add option for absolute or relative path formatting, 2020-12-13)



* [PATCH v2] index-pack: retain child bases in delta cache
From: Arijit Banerjee via GitGitGadget @ 2026-06-01 16:13 UTC
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Derrick Stolee, Arijit Banerjee, Arijit Banerjee
In-Reply-To: <pull.2131.git.1780070763044.gitgitgadget@gmail.com>

From: Arijit Banerjee <arijit@effectiveailabs.com>

When resolving a delta whose result has children of its own,
index-pack adds the result to work_head, accounts its data in
base_cache_used, and calls prune_base_data(). It then immediately frees
that same data.

This bypasses the existing delta base cache policy and can force later
descendants to reconstruct the queued base again. Let the existing
delta_base_cache_limit pruning policy decide whether to keep or evict
the data instead.

This does not add a new cache or increase the cache limit. The object
data is already accounted in base_cache_used before prune_base_data()
runs, and the existing pruning and base cleanup paths still release it.

On a quiet Ubuntu 24.04 VM with 16 vCPUs, 32 GiB RAM, and local SSD,
direct index-pack timings on single-pack Linux fixtures improved as
follows:

  linux blobless: 69.17s -> 57.98s (16.2% faster), RSS flat
  linux full:     280.72s -> 236.32s (15.8% faster), RSS +1.9%

Five-repeat medians on public repositories also improved:

  git.git:  12.31s -> 10.70s (13.1% faster)
  libgit2:   3.35s ->  2.88s (14.0% faster)
  redis:     6.52s ->  5.64s (13.5% faster)
  cpython:  33.02s -> 31.44s (4.8% faster)

The standard p5302 perf test on a smaller git.git fixture was neutral:

  5302.9 index-pack default threads:
    11.21(38.07+1.33) -> 11.16(37.90+1.31), -0.4%

t/t5302-pack-index.sh passed, and GitGitGadget's linux-leaks CI also
exercised that test under SANITIZE=leak.

Signed-off-by: Arijit Banerjee <arijit@effectiveailabs.com>
---
    index-pack: retain child bases in delta cache
    
    Speed up the local pack indexing phase of clone/fetch for large
    delta-compressed packs by keeping reconstructed delta bases available
    for reuse when they are queued for later delta resolution.
    
    When index-pack reconstructs a child base and queues it for resolving
    descendant deltas, it currently frees that data immediately. This can
    force the same base to be reconstructed again. Instead, keep it in the
    existing delta base cache and let the existing delta_base_cache_limit
    policy decide whether to retain or evict it.
    
    This does not add a new cache or increase the cache limit. The object
    data is already accounted in base_cache_used, and prune_base_data() is
    already called at this point.
    
    Correctness:
    
     * t/t5302-pack-index.sh passed all 36 tests.
    
    Benchmarks on a quiet Ubuntu 24.04 VM, 16 vCPU, 32 GiB RAM, local SSD:
    
    pack            baseline  patched  wall-time change  RSS change
    linux blobless   69.17s    57.98s  16.2% faster      -0.0%
    linux full      280.72s   236.32s  15.8% faster      +1.9%
    
    Five-repeat public-repo medians also improved: git.git 13.1%, libgit2
    14.0%, redis 13.5%, cpython 4.8%.
    
    Perf on the linux blobless pack showed the same direction under
    profiling: 76.64s baseline vs 61.09s patched, with similar RSS.
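For readers unfamiliar with this code path, the difference can be pictured with a toy model in C. It is not Git's builtin/index-pack.c: the `toy_cache` type and its helpers are invented, and eviction is simply oldest-first. With `free_now` set, a queued base is dropped as soon as it is accounted (the behavior the patch removes); without it, only the byte limit decides when it goes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BASES 16

/* Toy model only: cached bases, oldest first, with a byte budget. */
struct toy_cache {
	size_t size[MAX_BASES];
	int live[MAX_BASES]; /* 1 while the base's data is still held */
	int count;
	size_t used;
	size_t limit;
};

/* Free the oldest live entries (except 'retain') until used <= limit. */
static void toy_prune(struct toy_cache *c, int retain)
{
	for (int i = 0; i < c->count && c->used > c->limit; i++) {
		if (i == retain || !c->live[i])
			continue;
		c->live[i] = 0;
		c->used -= c->size[i];
	}
}

/*
 * Queue a reconstructed base: account its size, prune, return its index.
 * If 'free_now' is set, drop it immediately afterwards (pre-patch shape).
 */
static int toy_add_base(struct toy_cache *c, size_t size, int free_now)
{
	int idx = c->count++;

	assert(idx < MAX_BASES);
	c->size[idx] = size;
	c->live[idx] = 1;
	c->used += size;
	toy_prune(c, -1);
	if (free_now && c->live[idx]) {
		c->live[idx] = 0;
		c->used -= size;
	}
	return idx;
}
```

In the toy, a base added with `free_now` is always gone and would have to be rebuilt for its descendants, even when the cache is far below its limit; without it, the base survives until the budget actually forces an eviction.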

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2131%2Farijit91%2Findex-pack-retain-child-base-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2131/arijit91/index-pack-retain-child-base-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2131

Range-diff vs v1:

 1:  cafb1700be ! 1:  42eca38f51 index-pack: retain child bases in delta cache
     @@ Commit message
      
          When resolving a delta whose result has children of its own,
          index-pack adds the result to work_head, accounts its data in
     -    base_cache_used, and calls prune_base_data(). It then immediately
     -    frees that same data.
     +    base_cache_used, and calls prune_base_data(). It then immediately frees
     +    that same data.
      
          This bypasses the existing delta base cache policy and can force later
          descendants to reconstruct the queued base again. Let the existing
          delta_base_cache_limit pruning policy decide whether to keep or evict
          the data instead.
      
     +    This does not add a new cache or increase the cache limit. The object
     +    data is already accounted in base_cache_used before prune_base_data()
     +    runs, and the existing pruning and base cleanup paths still release it.
     +
     +    On a quiet Ubuntu 24.04 VM with 16 vCPUs, 32 GiB RAM, and local SSD,
     +    direct index-pack timings on single-pack Linux fixtures improved as
     +    follows:
     +
     +      linux blobless: 69.17s -> 57.98s (16.2% faster), RSS flat
     +      linux full:     280.72s -> 236.32s (15.8% faster), RSS +1.9%
     +
     +    Five-repeat medians on public repositories also improved:
     +
     +      git.git:  12.31s -> 10.70s (13.1% faster)
     +      libgit2:   3.35s ->  2.88s (14.0% faster)
     +      redis:     6.52s ->  5.64s (13.5% faster)
     +      cpython:  33.02s -> 31.44s (4.8% faster)
     +
     +    The standard p5302 perf test on a smaller git.git fixture was neutral:
     +
     +      5302.9 index-pack default threads:
     +        11.21(38.07+1.33) -> 11.16(37.90+1.31), -0.4%
     +
     +    t/t5302-pack-index.sh passed, and GitGitGadget's linux-leaks CI also
     +    exercised that test under SANITIZE=leak.
     +
          Signed-off-by: Arijit Banerjee <arijit@effectiveailabs.com>
      
       ## builtin/index-pack.c ##


 builtin/index-pack.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index cf0bd8280d..027c64b522 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1212,7 +1212,6 @@ static void *threaded_second_pass(void *data)
 			list_add(&child->list, &work_head);
 			base_cache_used += child->size;
 			prune_base_data(NULL);
-			free_base_data(child);
 		} else if (child) {
 			/*
 			 * This child does not have its own children. It may be

base-commit: c69baaf57ba26cf117c2b6793802877f19738b0d
-- 
gitgitgadget
