* [NOTES 01/11] SHA-256 and interoperability work
From: Taylor Blau @ 2025-10-06 19:18 UTC (permalink / raw)
To: git
Topic: SHA-256 and interoperability work
Leader: brian
10:15am-10:45am PT
* lot of work to do
* brian is working on it
* it's progressing, not sure if we can get everything done by 3.0
* how to deal with submodules
* you can produce a split history
* accept, document, ?
* we need to have mapping on server or client
* if someone pushed one commit in sha1 and a different one in sha256, we can
end up with divergent histories that could produce security issues
* some private repos for open-core-type submodules make this difficult
* could have the server query, client derive mapping
* server could also be malicious
* if you're converting, how does that work with GPG signatures?
* we have a way to map both signatures
* if you're in compatibility mode, it will produce signatures for both
(sketched below)
* what about for older histories, how can it be verified if it's only valid
for sha1?
* it can be verified but can't be resigned
* for converting, can that work?
* converting will retain the sha1 signature
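A hedged illustration of the dual-signature behavior described above,
assuming a repository with a SHA-1 compatibility hash configured. The
gpgsig-sha256 field comes from Git's hash-function-transition design;
whether current releases emit both signatures in compat mode is exactly the
in-progress work:

    $ git commit -S -m "signed change"
    $ git cat-file commit HEAD
    tree ...
    parent ...
    gpgsig -----BEGIN PGP SIGNATURE-----        # over the SHA-1 form
     ...
    gpgsig-sha256 -----BEGIN PGP SIGNATURE----- # over the SHA-256 form
     ...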
* what is the simplest user journey?
* I have a clone of a repo in sha1, am I expected to run a conversion locally
and then I can talk to GH in the 256 protocol?
* you will create a new repo in 256 with sha1 compatibility and clone into
that, which will convert it into both algorithms (sketched below)
* download the data again?
* clone it to another directory locally
* it will preserve the sha1 repo and create the compatibility layer
* let's say the local one has a submodule, clone locally including the
submodule?
* yes, the conversion script will convert the submodule as well and
you'll have both ids
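A minimal sketch of that conversion journey. The flags shown here
(--object-format, extensions.compatObjectFormat, rev-parse
--output-object-format) exist in recent Git, though the compat machinery is
still experimental; whether the fetch step works end-to-end across
algorithms is precisely the interop work under discussion:

    # create a new SHA-256 repo with a SHA-1 compatibility mapping
    git init --object-format=sha256 converted
    git -C converted config extensions.compatObjectFormat sha1

    # re-import history from the existing local SHA-1 clone (path illustrative)
    git -C converted fetch ../original 'refs/heads/*:refs/heads/*'

    # ask for the SHA-1 name of a SHA-256 object via the mapping
    git -C converted rev-parse --output-object-format=sha1 HEAD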
* if I do a fetch, which do I need?
* you need a mapping if you're talking to a server with the other algo
* the mapping is only needed for the server if it wants to be forward
facing?
* with the mapping, is it only commits or all objects?
* all objects
* if someone trusts github, they can just consume its mapping?
* the server and client will do their own mapping
* what happens if nobody has the submodule anymore? a commit from 10 years ago
points at a submodule nobody has anymore, how do you make a 256 tree out of
that?
* pick one at random, it doesn't matter
* but you can't match everyone else
* we've chosen to use divergent history in this case
* Same issue exists with LFS objects
* if you have the old submodules,
* recursive/cyclic submodules?
* it's something we need to handle, don't have a great plan but it could be
done
* plan is to maybe have some pool
* you have to convert the submodule up until that point, then convert them
piecewise
* have you thought about mix/match where one uses sha1 and the other uses 256?
* we can't distinguish the size of the object id vs filename
* right now you're doing the work, are you thinking of allowing another hash
algo without having these issues again?
* the way the design works now is that we have two algos - main and
compatibility - but it is designed to accept multiple algos. if we switch to
SHA3-512 at some point, for example, we could add another compat algo - it's
some work but the approach doesn't assume much about the specific algorithm
* steiny thought it could be useful to add a third algo not for security but
speed
* gh has the insecure non-crypto variants
* problem is always client support
* corporate controlled repo often also has control of the clients - so maybe
less of a security issue but depends
* can you put a sha1 link inside a 256 tree?
* maybe an extra bit in the mode, some other interesting horrible thoughts
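To make the "extra bit in the mode" idea concrete, a purely speculative
ls-tree style view; the first entry is a gitlink as it exists today, the
second mode is invented for illustration and is not an accepted design:

    160000 commit 3f786850e387550fdab836ed7e6dc881de23001b	sub  # today: gitlink in the repo-native hash
    160010 commit 3f786850e387550fdab836ed7e6dc881de23001b	sub  # hypothetical: SHA-1 gitlink inside a SHA-256 tree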
* would it make submodule problems go away if you could just carry the other
hash forever, until the downstream decides to switch?
* solves the submodule problem but not LFS problem?
* LFS might be easier, you don't need to have the object to convert yours
* assuming you have the object still
* brian not 100% against it
* if I could do a 256 repo with a 256 submodule, you could parse it back,
but if you do that, it's a different size and not usable by older
versions of git
* if we were clever, sha1 trees hold sha1, 256 holds 256, and only when you
have a sha1 tree inside a 256 tree would we use some new format
* the problem is you still end up with stuff that doesn't work with older
versions
* degrades gracefully like a mode bit, worst case is that it checks out
weird filenames?
* write it out, take it to the list
* we discussed upgrading the tree object format, but it's so tight
* [NOTES 02/11] First-class conflicts in Git?
From: Taylor Blau @ 2025-10-06 19:18 UTC (permalink / raw)
To: git
Topic: First-class conflicts
Leader: Martin Z
10:50am-11:15am PT
* how interested is Git in adopting first-class conflicts?
* can rebase descendants easily
* maybe we can use it only internally during rebasing merge commits?
* could mean you don't have to do rebase --continue etc if we expose it to users
in the future. is it appealing?
* taylor: what's the goal of having first-class conflicts in git? do we want to
enable certain jj-like workflows or is there another reason?
* elijah: would like first-class conflicts so i can save context while editing
changes in a stack, to handle later or hand off conflict resolution to
collaborators.
* really helpful to be able to divide and conquer when dealing with a massive
merge conflict
* so we want to be able to publish conflicts to the server for exchange? ->
eventually, yes
* first-class conflict means a separately stored commit header that is
understood more deeply by git (e.g. by fsck). jj uses a special header on
trees + OS-hidden dirs; it's a convention to store conflicts into .left/
.right/ etc., with some human-readable warnings in the commit that stores
the conflicts.
* then the client refuses to push the special header
* jj puts it into this special tree because the conflict needs to not get
GC'd; if Git learned how to not GC those conflict objects on its own, jj and
GitButler would have less magic to do
* should the conflict objects really live forever?
* They should live as long as the commit referencing them lives
* what about adding a non-tree object for the conflict markers? e.g. add it to
the commit header as a conflict object instead
* having the conflicts in the commit header is nice because you don't have to
walk the whole commit tree to find whether there are conflicts
* the commit object just then starts having 3+ trees instead of 1 tree
(hypothetical sketch below)
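A purely hypothetical sketch of a conflicted commit carrying extra trees in
its header; none of these field names are settled, they only illustrate the
"3+ trees" shape (and the "try me again" idea from later in this session):

    $ git cat-file commit <conflicted>
    tree 8f93e1b...           # best-effort automerged tree
    conflict-base 27cc224...  # hypothetical: tree of the merge base
    conflict-side 41b1ec1...  # hypothetical: tree of the other side
    parent aaa...
    parent bbb...

    Merge of aaa and bbb did not fully resolve; try me again.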
* brian: what about a special tree object with file mode that indicates it's a
weird special thing, then treat that specially in clients that are aware of
what it's for?
* does this fit into the way we might extend the tree for storing weird
gitlink sha256 things also?
* what does it look like for merge commits? "i meant to merge commits aaa and
bbb and it didn't work, try me again"
* then just hang onto that conflicted state, other tools could resolve it, or
a rebase later could resolve it
* partial resolution - apply as much as possible, then only write down the
still-unresolved parts
* how to keep people from submitting conflicts? conflicts as a first-class
object makes it easier to prevent (or to render correctly on the client if it
was submitted)
* would including this in git make so many git commands obsolete?
* elijah already working on dropping rebase and starting over with replay
* new commands means we can also make the UX not suck this time around
* patrick: same thing for git history
* junio: clapping emoji 🙂
* any concerns with first-class conflicts?
* is it possible to commit those in history and work on top of them
incrementally, so subsequent commits fix only part of the original first-class
conflicts?
* that's how jj works already
* with binary files it's hard to do conflict markers, that's not an issue
inherent to the conflict marker storage method though
* in jj we stick conflict markers inside the binary. it's... not great...
* for many-sided conflicts we use more types of conflict markers, even on
binaries
* iteratively removing one side of the multi-side conflict until there's
only a simple conflict or no conflict at all
* this requires you to have an appropriate merge tool to resolve binary
conflicts 🙂
* sounds like no broad opposition
* should we aim for 3.0?
* is it possible for people using git without the first-class conflicts to
keep using it the same way, if the git binary supports it?
* not having these conflict objects be pushable makes this much easier
* could mean that initially we can't mail those conflicts around and we
use a major release to make it possible to ship them
* patrick: please be careful putting too many things onto 3.0 gating, so we
can actually finish 3.0 🙂 should we stick to things that are already ready
or at least underway?
* taylor: i think it depends on if we think it can land in the next <10
months
* local-only means it probably doesn't need to be behind a major/breaking
release
* people can share conflicts via continuous sync (not through git protocol)
in the meantime, with other tooling
* would be nice to get branch-level acceptance of conflict objects from the
server side
* helps to understand what the target format should be. if we did it, what would
it look like? then we can start working on it
* on the list let's figure out what it should look like, and then we can
start working on it but not in a breaking way. then we could start to
notice places where it breaks old commands
* but how do we know the format is right before we start developing tooling
against it?
* is the object format a reversible decision?
* maybe we can depend on jj / git butler having forged the path already a
bit
* interesting to think about how tools like jj and git butler would ideally
want to store conflicts if they didn't have to worry about wedging it into
git's current formats
* is the path to getting first-class to start by wedging it into git? that
seems to be what jj and git butler have been doing already; are we ready
to move into git first class?
* "first-class" is in the eye of the client, so we're talking about the
way to make them first class to git, not first class to wrappers (who
already know how to do their own first class thing)
* needs to store the tree and not gc it, any other reqs?
* how does the plan work?
* lock in data format, then only git replay can work with it, everybody else
ignores it?
* would be very difficult to teach rebase/cherrypick to understand these
without breaking for people who use git the way they do now
* or could put a flag to fork rebase into a different handler if it sees a
conflict object
* cherry-pick already has(?) a replay mode (or maybe just in elijah's tree)
* scripting support becomes weird if you're using config flags to change
behavior of porcelain that already exists
* [NOTES 03/11] The future of history rewriting - rebase, replay and history (+Change-IDs)
From: Taylor Blau @ 2025-10-06 19:18 UTC (permalink / raw)
To: git
Topic: The future of history rewriting
Leader: Phillip Wood
11:30am-12:00pm PT
* Number of methods of history rewriting
* What do we want the future UI and operations to look like, and how do we
make them easy?
* Want good commit histories
* JJ is always in Elijah's edit-mode demo state: always an interactive rebase.
* Always easy to rewrite commits.
* Always rebase descendants
* Use dedicated commands instead of verbs on the rebase command
* Git would have a hard time adopting the 'always rebasing' model
* For new users, git rebase is too complicated for simple use cases.
* Top level commands to easily do common operations
* 'Git history'?
* Git history vs git replay
* Replay is plumbing, used by servers.
* History is porcelain, used by users (hypothetical usage sketched below).
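The split sketched above, in command form. git replay exists today as
worktree-less plumbing; "git history" does not exist yet, so that line is
hypothetical:

    # plumbing: replay a range onto a new base without touching the
    # worktree, emitting ref updates suitable for update-ref --stdin
    git replay --onto main origin/main..topic | git update-ref --stdin

    # porcelain (hypothetical): a friendlier verb for a common operation
    git history reword HEAD~2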
* Elijah - Building on rebase depends on sequencer
* Underpinnings more important than naming
* Lots of backwards compatibility assumptions
* Hooks and dependencies are pervasive, hard to clean up.
* When update-refs was added, it broke hooks
* Don't want to keep doing that.
* Wants to move beyond sequencer (git history uses sequencer)
* Sequencer frequently updates the work tree, not desirable
* Git history can move to use Git replay
* A cleaner version of git history would be nice so others can try it out
* Git replay is lacking features needed for git history
* Try landing an experimental version with sequencer underpinnings
* No promise of compatibility
* Phillip - users noticed when he broke Sequencer Hooks
* Disable hooks with flags?
* Way forward - land UI then iterate on underpinnings?
* Sequencer depends on shell-parseable state files
* Lots to clean up
* Minh - does this help solve the problem of the server rewriting history
(i.e. force push), leaving clients with incompatible forks?
* Out of scope
* Maybe change id is the more relevant conversation
* Conversation ended up on ChangeID
* Change ID loses predecessor tracking, which is more precise
* Hard to propagate without Mercurial style logs
* Mercurial predecessor graphs are independent of commits
* Change IDs would also help with first class conflicts
* Finding range-diffs is cheaper
* Range-diffs used fairly widely
* Git, rust, most mailing list flows
* Change IDs useful for tracking across repos, bugs, etc.
* Why are change IDs stalled in the mailing list?
* Disagreement on tracking predecessors
* Requires a protocol change
* Sending predecessors over protocol has lots of implications
* Gitster - disagreement on what it means to be a predecessor
* Parent? Cherrypick?
* Brian - changeId should be deterministic. Reject non-well-formed ids
* Workflows rely on repetition
* ChangeIds should be optional, disableable.
* May track too much information unintentionally across commits, projects.
* Gitster - needs to be possible to expose changeId, predecessor without
exposing private information about private repos.
* ChangeID exposes less than predecessors do
* JJ can't access predecessors from ChangeID
* When rewriting commits, maybe we don't want the predecessor to be
viewable (eg secret keys)
* JJ can bump changeId when rewriting
* Gerrit keeps ChangeID in commit body
* Rebase and Cherrypick don't support arbitrary key:value pairs in commit
body
* ChangeID should propagate to be useful
* Eg across mailing list
* Can Git more generally and globally support headers in the commit?
* ChangeID should be more 1st class than other headers.
* Hard for client to tell when a ChangeID should change.
* Recent JJ commits were pushed with ChangeIDs
* Colleague branched off. Rewriting ids would have been useful.
* Squashes, amends etc lead to ambiguity about which ChangeID to keep.
* JJ keeps the parent.
* Gitster thinks it would be nicer for ChangeIDs to be kept even when
there are 2.
* When commits split, the children get 2 new ChangeIDs instead of keeping
old one.
* [NOTES 04/11] Rust
From: Taylor Blau @ 2025-10-06 19:18 UTC (permalink / raw)
To: git
Topic: Rust
Leader: Patrick Steinhardt
1:05pm-1:30pm PT
* Recurring topic from past years, but sparked again by Ezekiel's contributions
on xdiff
* We're favorable towards it, but we haven't previously agreed on a timeline
* Platforms that don't have Rust support: NonStop, Alpha, Cygwin, and some
others brought up by Gentoo
* Patrick has a series up to let us provide notification to users that Git will
start depending on Rust
* Led to lots of discussion both on the mailing list and outside, which had the
good effect of making more people aware of the upcoming change
* Ezekiel is trying to pass some of the blame to a big brother - he's happy to
take it ;-)
* Ezekiel is more interested in the technical details than the policy details,
though we need the policy details figured out
* Having Rust be optional leads to code being written twice and increases the
maintenance burden; making Rust support mandatory is needed to avoid that
* brian wrote sha256 interop code in Rust
* Would be nice to hand over maintenance for some kind of (Rust-optional) LTS
release to someone else in the community
* We have lots of global state that we need to get rid of, and lots of other
cleanup
* Long term goal may be to eventually replace all of C, though it's not clear if
we should take that whole goal or just start with pieces that make sense.
Also, we've got a learning process ahead of us, so our goalposts may need to
change as we learn.
* Rust might be helpful for libification reasons, but tying libification to an
already big change might make it too big
* Rust rewrite could mean implementing new subcommands (as discussed earlier) in
Rust instead of rewriting bug-for-bug existing code
* There is lots of updating that can be done before switching to Rust, e.g.
switching to unambiguous types
* Rust can be used to replace things at an individual function level
* Just rewriting in Rust doesn't turn the existing system into nice abstraction
boundaries or reusable modules. We have existing efforts to try to clean
those up in various ways, e.g. the pluggable object store work.
* Rust makes unit tests much easier and ergonomic, and starting by writing tests
of existing C code makes a lot of sense as a way to begin a migration.
* Large organizations and governments are going to start pushing for people to
move away from C for security reasons.
* Major reason(s) to adopt Rust
* Threading
* Error propagation
* Difficult to know who owns what in C - Rust improves maintainability
* Attracting more contributors (it's the most popular according to
StackOverflow)
* [NOTES 05/11] Pluggable object databases
From: Taylor Blau @ 2025-10-06 19:19 UTC (permalink / raw)
To: git
Topic: Pluggable object databases
Leader: Patrick Steinhardt
* Work towards this has been ongoing since Git 2.50.
* Allow innovation on the server side for large binaries.
* The design will soon be up for discussion.
* Allow migration between different object formats, and allow the format to be
picked later by the implementer.
* The planned work is to make the new db more pluggable; right now the work is
still about refactoring. 2.53 will have a proof of concept. Might take until
the second half of 2026 to be done.
* Blocker 1: the new db format is still not clear, particularly latency/perf
related issues.
* Might use content-defined chunking and hashing; might use an existing db
implementation like Cassandra.
* Blocker 2: how to generate the packfile.
* Taylor wonders whether we can reuse the current object db, but Patrick
thinks the current impl is too large/complex to adapt. The current refactoring
effort with better abstractions might speed up future changes.
* Gitster wonders whether we can just use the hash of the chunks' hashes (toy
sketch below).
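A toy shell illustration (not Git's design) of the "hash of the chunks'
hashes" idea, with fixed-size chunks standing in for content-defined ones:

    split -b 1M big.bin chunk.
    sha256sum chunk.* | awk '{print $1}' | sha256sum  # outer hash over the ordered chunk hashes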
* Taylor also thinks a new obj db might become just as complex.
* Patrick thinks the new obj db can be more maintainable. Starting off with a
brand new abstraction allows faster iteration.
* Rewriting the obj db in a new world might be challenging because the pack
format is so intimately tied to so much usage and optimization (e.g. bitmaps),
plus the need to identify big binary objects over the wire.
* Taylor thinks maybe we don't need to rewrite pack obj, but abstracting the
packfile could make it worse and more verbose.
* Patrick mentions there are already many other adjacent projects that
abstract away from the pack format, e.g. jgit, libgit2. JGit already
identified early on that Cassandra's perf would never work due to latency
overhead.
* Taylor suggests we identify a proof of concept with comparable latency to
existing obj db before doing additional refactoring.
* Ezekiel is refocusing the discussion on targeting large binary files. Maybe
with large binary files, latency degradation is not as important.
* In git we already have a divergent code path for large binary files; we just
chose to store them in the packfile. Technically people could change the
storage selection without refactoring.
* Patrick still thinks having sub-system abstraction would make code more
maintainable.
* Taylor is supportive of having some objects use the current db while only
the large binary files use the new db; at least we don't impose the overhead
on all objects.
* The obj chunk design Patrick is proposing is meant to benefit both
client-side and server-side storage.
* We should resume this discussion with more concrete usage, right now we are
still talking about potential scenarios.
* The promisor feature on the server side cannot satisfy all clients, since
some clients don't want to use promisors, so the server side might still be
expected to have the large binary files on disk.
* Packfile-uris might still be the main direction we can use to fix the large
binary issue without exploding objects into chunks.
* Another benefit of obj chunking is reducing hash time for large binary
files. Gerrit currently sees 50% of clone time spent on hashing. Parallel
hashing is also possible with obj chunking (toy sketch below).
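Chunking makes the hashing embarrassingly parallel, which is the clone-time
win mentioned above; a toy sketch reusing the chunks from the earlier
example:

    printf '%s\n' chunk.* | xargs -P8 -n1 sha256sum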
* [NOTES 06/11] Repository maintenance long-term goals
From: Taylor Blau @ 2025-10-06 19:19 UTC (permalink / raw)
To: git
Topic: Repository maintenance long-term goals
Leader: Taylor Blau
* Taylor's talk ran short towards the end; this session could expand on that
future work.
* Constant repacking into a single pack was historically the major problem.
* Doing that less often (because of geometric repacking) helps, but it's still
a potential issue when it does occur. Gets them 98% of the way (the geometric
strategy as exposed today is shown below).
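The geometric strategy being discussed, as Git already exposes it:

    # merge packs so that pack sizes form a geometric progression
    # (factor 2), and write a multi-pack index covering the result
    git repack --geometric=2 -d --write-midx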
* Future items were geometric reachability, ?, best effort gc
* Previously, geometric repacking used to accumulate loose objects too. 6
months ago they changed to an approach where the big cruft pack could be
excluded from the midx.
* Challenge would be to do a full complete repack without rewriting all of the
midx chain.
* Because bitmap is tied to object order in a pack, need something like
tombstones to not break the bitmaps. Need the tombstone to know that we don't
have the data.
* Unitary midx idea - Taylor designed the chained midx before he figured out
the repacking strategy. MIDX and pack index duplicate the data; no reason to
de-dup other than space savings. Could even skip having the idx, but plenty of
old git versions can't read the midx.
* brian - there may be other implementations, such as git lfs, that don't use
the midx, and the object id mappings in pack idx v3 aren't supported in the
midx either.
* Nothing preventing you from having two parallel repacks, one that's geometric
and one that's trying to do an all-into-one.
* [NOTES 07/11] Change-ID Header in Git
From: Taylor Blau @ 2025-10-06 19:19 UTC (permalink / raw)
To: git
Topic: Change-ID Header in Git
Leader: Philip Metzger
* How do we store the Change-ID? Store it in a header? Some auxiliary metadata
store?
* Happens to work in a header for GitHub because they survive rebases since
GitHub uses replay, not all forges do this.
* Want a standard interoperable way to associate Change-IDs with commits.
* Storage discussion has largely been covered; one possible on-disk shape is
sketched below.
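One possible on-disk shape, purely illustrative: a dedicated commit header
field. The header name and value format are not settled upstream (Gerrit
today uses a Change-Id: trailer in the message body instead, and the value
here is a made-up jj-style identifier):

    $ git cat-file commit HEAD
    tree 83baae6...
    parent 7d1a0b0...
    author A U Thor <author@example.com> 1759780000 -0700
    committer A U Thor <author@example.com> 1759780000 -0700
    change-id kqxpzvwtlsyznnrxnpso    # hypothetical header

    commit subject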
* Taylor: what's less clear to me is the semantics of when we keep Change-IDs
across operations, when we assign new ones.
* Cherry-picking and equivalents assign a new Change-ID
* Almost everything else retains that Change-ID
* Taylor: we need to agree on the storage, but not necessarily on the semantics
of when we keep versus assign new Change-IDs.
* Caleb: Assigning a new Change-ID when cherry-picking is interesting, since we
(GitButler) retain those.
* Philip: Gerrit does the same thing, but JJ does something differently. Their
approach was to have an optional header that describes the "origin" (in some
sense) of the commit.
* Caleb: I wonder if the semantics are important if we are trying to use these
in the same sandbox?
* Taylor: we need to understand and agree on them when we are working on the
same repository (regardless of using the same tool), but not in general at
the tool level.
* What's the next step?
* Martin: experiment with it, see if we like the semantics. Don't want to
emphasize the divergence table.
* Taylor: do we need a version associated with the change-id? Philip: no, we
treat it as an opaque identifier, versioning not necessary.
* Elijah: given that multiple players want this and have agreed on a common way
to represent it, maybe we'll have a more productive discussion on it in a year
after they've experienced working with that header for a year
* Jonathan: does it matter what forges do with automatic squash/rebase?
* Philip: for JJ we don't want to use that information, but we're just
another Git client in the ecosystem, so that's just our perspective.
* Martin: Should there be agreement on the semantics?
* Elijah: depends on the usage.
* Elijah: semantics get fuzzy because of splitting and merging, so it's not
clear what to do there. We need to clarify it eventually, but probably not
here.
* [NOTES 08/11] Resumable fetch / push
From: Taylor Blau @ 2025-10-06 19:20 UTC (permalink / raw)
To: git
Topic: Resumable fetch/push
Leader: Caleb (was Scott, but he's not here)
* Is this only client side or server side too?
* Applies to both as GitButler has a forge too. Would be nice to have protocol
improvements.
* Both bundle-uris and packfile-uris exist and at least packfile-uris are
resumable. Both are fetch-only, so push is unsolved.
* Could use single-threaded output or server-side caching to make pushing work.
* Maybe make it so servers could receive a bundle and make that resumable.
* Use cases: Pushing a repo for the first time to a new server, once there's
good large file support, android/chromium. Also a problem that's independent
of size in environments with poor connectivity (some countries, Caltrain, …).
* Servers could hand out some kind of opaque data with the fetch to indicate
what it has cached, clients can re-share that when attempting to resume and
the server can choose to do something with it or not.
* GitHub support has told people to create a branch with N commits at a time to
fetch.
Scrambly notes (Jack's notes):
* Specific Forge implementation, http based communication -> easier to set up;
keen on improvements to the protocol that allow large packfiles to be sent
between client and server
* For packfile-uris, at least the packfile part that is in the uri is already
resumable; for bundle-uris it may not be the same. Might be low-hanging fruit.
* Taylor: push side more interesting: need some way for the server to say "I
already have the first m bytes of x" so the client can resume the push
* Consider implications as an attack vector
* Brian: git's pack implementation is deterministic if you don't do
multithreading. Could use a resumable mode the way gzip has an --rsyncable
mode. For the client side, pack to a temporary file; that is resumable with an
offset, and since the pack is cached locally it should be something you could
resume a push with. Some possibilities if we cache on the server side or use
single-threaded output (sketch below).
* an idea from packfile-uris which could help solve the fetch problem: the
server provides a url to the client; let the server be the fetcher
* Emily: that would work pretty OK using a commit cloud server that is already
serving those objects. The server side can resume as necessary.
* Servers don't receive bundles, so this would mean adding support for servers
to receive bundles. What's the real use case for this? It's worth its own
protocol, not just a push protocol. When we try to mirror things in Gerrit it
fails due to the large number of refs - we would need an enhancement to handle
large numbers of refs.
* Caleb: So you suggest some sort of TCP protocol for handling these transfers?
* We have users with stored binaries that time out uploading to the server;
it's not just a migration path
* Having some way of guaranteeing forward progress on a push or a pull as long
as you can get some smaller unit of data transfer, don't know how small to go,
but would be very useful
* We talked about chunk format before, would introducing chunk format, small
enough chunks help?
* If it's small enough and reproducible
* Elijah: Even if you have small chunks, if they are part of the same
communication you'll still need to restart it
* If you have to resume now, say you have sent X chunks, then you have N - X
left
* Peff: All you need to know is the byte offset.
* Elijah: Take the objects that you have received and say "I have these objects"
* What if you hash what you got, "I asked for this", the hash was this length,
give me the rest
* Peff: Has to be able to regenerate everything from scratch - or are you
caching it? Kind of wasteful
* Doesn't need to be cached, just needs to be stable, so if there was a way to
ask for it in a specific order
* Disable multithreading
* Peff: Looked into this with resumable clones. The server can hand out some
cache tag: here's an opaque tag that may or may not be valid in the future;
"I got X bytes of this tag, can you send the rest?". Becomes a heuristic on
the server - "I'll choose how much to cache" - and git doesn't need to know
about that, it's an implementation issue (sketched below)
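Peff's cache-tag idea reduces to standard HTTP resumption once the server
promises stability for a tag. packfile-uris are plain HTTP(S) today, so the
fetch half can already look like this (URL illustrative):

    # first attempt died after 1 MiB; ask for the rest of the same file
    curl -C 1048576 -o pack.part https://cdn.example.com/pack-1234.pack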
* With a packfile-uri you stop the in-protocol conversation with the server
and fetch the pack out of band
* If you were trying to brute force it today, you would brute force sending a
ref
* Peff: GitHub support has told people to do that
* [NOTES 09/11] Git 3.0
From: Taylor Blau @ 2025-10-06 19:20 UTC (permalink / raw)
To: git
Topic: Git 3.0
Leader: Patrick Steinhardt
* Any questions?
* Emily: Patrick, you had proposed the end of next year as the cut date, are
people happy with that?
* Taylor: we've been using that internally as a benchmark for when we need to
deliver SHA-256. If Git 3.0 came later and we had more time, certainly
wouldn't complain, but also wouldn't ask the project to push it back purely on
that basis.
* Caleb: we need community support for SHA-256.
* Emily: feels like everybody is playing chicken.
* Taylor: ultimately the users need to tell us.
* Patrick: is there work going on in GitOxide?
* Caleb: nobody is asking for it from GitButler's perspective
* Elijah: if we don't push a date out, nobody will ask for it.
* brian: the cost of creating a SHA-1 collision is roughly $10k USD. I don't
want to spend my bonus check on it, but someone could do it and spam us with
alerts.
* Taylor: sure, but we could just silence those alerts. Also, who would spend
$10k on this? ;-)
* brian: fair, though not all implementations are using SHA1-DC?
* brian: we should include this in Git 3.0, and we should set a hard date for
it. We should plan the interop work around that, but can't guarantee that it
will land by then.
* Martin: what's in scope for Git 3.0?
* Elijah: SHA-256 (and maybe interop) is the main thing, some deprecations
* Patrick: we have a BreakingChanges that lists what we want to remove.
Default reference backend is going to become reftable.
* Taylor: we should be doing brown-outs for deprecated features
* Elijah: we should delay for interop
* Peff: how important is interop really? What is the use-case?
* Elijah: will forges actually support SHA-256 once we enable it? Otherwise we
have people create SHA-256 repos and then they can't push them anywhere.
* Peff: how do we push forges versus not?
* Peff: When we release Git 3.0 should not depend on whether or not interop
works, but whether or not real-world forges and plugins support SHA-256
* brian: smaller forges aren't there yet and won't undertake it until it's in
3.0
* Taylor: sure, but that's not the vast majority of users. Ultimately there
are always going to be some stragglers. The reality is that what "we" consider
to be Git and what the rest of the world considers to be Git are not the same
thing. So if we release without good support on the forge side, users will be
mad at us.
* Let's figure it out on the list?
* brian: I'll start that off.
* [NOTES 10/11] How can companies respectfully engage contractors to work on Git?
From: Taylor Blau @ 2025-10-06 19:20 UTC (permalink / raw)
To: git
Topic: How can companies respectfully engage contractors to work on Git?
Leader: Emily Shaffer
* Google hired Collabora to work on patches on the list
* Should they be doing something specific to indicate they're pursuing these
patches on behalf of someone else?
* Taylor: So long as they understand there's no obligation from the project to
accept the work
* Having a short note in the cover letter to indicate who is sponsoring the
work (if it's not already obvious) would help. Mention during review if you
think there's a conflict of interest.
* [NOTES 11/11] Conservancy 2025 updates
From: Taylor Blau @ 2025-10-06 19:20 UTC (permalink / raw)
To: git
Topic: Conservancy 2025 updates
Leader: Taylor Blau
* More trademark requests than typical this year
* Asked Perforce to stop using a logo very similar to the Git logo
* Git holds a fairly restrictive trademark policy, but often doesn't enforce it.
Some risk the trademark office could flag that.
* Git project has a significant amount of money that could be spent ($100k?).
* Emily: Could sponsor git-related projects (ex: gitoxide)
* Outreachy costs money per-intern
* Not guaranteed that GitHub or GitLab would always be able to sponsor all
the interns the Git project desires. Could use $ for this. Also depends on
the future of Outreachy.
* Git ambassador program, with stipends?
* Needs someone with interest and skills to organize