* [PATCH 0/2] Add reftable by default as a breaking change
@ 2025-07-02 10:14 Patrick Steinhardt
2025-07-02 10:14 ` [PATCH 1/2] BreakingChanges: announce switch to "reftable" format Patrick Steinhardt
` (3 more replies)
0 siblings, 4 replies; 21+ messages in thread
From: Patrick Steinhardt @ 2025-07-02 10:14 UTC (permalink / raw)
To: git; +Cc: brian m. carlson, Karthik Nayak, K Jayatheerth, ryenus,
Junio C Hamano
Hi,
the recent thread at [1] motivated me to hack together this tiny patch
series that paves our path towards making the reftable backend the
default backend. It does two things:
- It announces the breaking change for Git 3.0.
- It makes it the default now already when "feature.experimental" is
enabled.
The first item is subject to ecosystem support, most notably in
libraries like Gitoxide, libgit2 and JGit. The second item is intended
to extend the user base to power users so that we get more test exposure
out in the wild before we make it the default in Git 3.0.
Thanks!
Patrick
[1]: <xmqqtt3vkhwk.fsf@gitster.g>
---
Patrick Steinhardt (2):
BreakingChanges: announce switch to "reftable" format
setup: use "reftable" format when experimental features are enabled
Documentation/BreakingChanges.adoc | 39 +++++++++++++++++++++++++++++
Documentation/config/feature.adoc | 6 +++++
setup.c | 18 ++++++++++++++
t/t0001-init.sh | 50 ++++++++++++++++++++++++++++++++++++++
4 files changed, 113 insertions(+)
---
base-commit: 83014dc05f6fc9275c0a02886cb428805abaf9e5
change-id: 20250702-pks-reftable-default-backend-6c30f330250a
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 1/2] BreakingChanges: announce switch to "reftable" format
2025-07-02 10:14 [PATCH 0/2] Add reftable by default as a breaking change Patrick Steinhardt
@ 2025-07-02 10:14 ` Patrick Steinhardt
2025-07-02 17:03 ` Junio C Hamano
2025-07-02 17:17 ` Justin Tobler
2025-07-02 10:14 ` [PATCH 2/2] setup: use "reftable" format when experimental features are enabled Patrick Steinhardt
` (2 subsequent siblings)
3 siblings, 2 replies; 21+ messages in thread
From: Patrick Steinhardt @ 2025-07-02 10:14 UTC (permalink / raw)
To: git; +Cc: brian m. carlson, Karthik Nayak, K Jayatheerth, ryenus,
Junio C Hamano

The "reftable" format has come a long way and has matured nicely since
it has been merged into git via 57db2a094d5 (refs: introduce reftable
backend, 2024-02-07). It fixes longstanding issues that cannot be fixed
with the "files" format in a backwards-compatible way and performs
significantly better in many use cases.

Announce that we will switch to the "reftable" format in Git 3.0 for
newly created repositories.

This switch is dependent on support in the larger Git ecosystem. Most
importantly, libraries like JGit, libgit2 and Gitoxide should support
the reftable backend so that we don't break all applications and tools
built on top of those libraries.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/BreakingChanges.adoc | 39 ++++++++++++++++++++++++++++++++++++++
 setup.c                            |  6 ++++++
 t/t0001-init.sh                    | 16 ++++++++++++++++
 3 files changed, 61 insertions(+)

diff --git a/Documentation/BreakingChanges.adoc b/Documentation/BreakingChanges.adoc
index c6bd94986c5..c96b5319cdd 100644
--- a/Documentation/BreakingChanges.adoc
+++ b/Documentation/BreakingChanges.adoc
@@ -118,6 +118,45 @@ Cf. <2f5de416-04ba-c23d-1e0b-83bb655829a7@zombino.com>,
 <20170223155046.e7nxivfwqqoprsqj@LykOS.localdomain>,
 <CA+EOSBncr=4a4d8n9xS4FNehyebpmX8JiUwCsXD47EQDE+DiUQ@mail.gmail.com>.
 
+* The default storage format for references in newly created repositories will
+  be changed from "files" to "reftable". The "reftable" format provides
+  multiple advantages over the "files" format:
++
+  ** It is impossible to store two references that only differ in casing on
+     case-insensitive filesystems with the "files" format. This issue is
+     especially common on Windows, but also on older versions of macOS. As the
+     "reftable" backend does not use filesystem paths anymore to encode
+     reference names this problem goes away.
+  ** Similarly, macOS normalizes path names that contain unicode characters,
+     which has the consequence that you cannot store two names with unicode
+     characters that are encoded differently with the "files" backend. Again,
+     this is not an issue with the "reftable" backend.
+  ** Deleting references with the "files" backend requires Git to rewrite the
+     complete "packed-refs" file. In large repositories with many references
+     this file can easily be dozens of megabytes in size, in extreme cases it
+     may be gigabytes. The "reftable" backend uses tombstone markers for
+     deleted references and thus does not have to rewrite all of its data.
+  ** Repository housekeeping with the "files" backend typically performs
+     all-into-one repacks of references. This can be quite expensive, and
+     consequently housekeeping is a tradeoff between the number of loose
+     references that accumulate and slow down operations that read references,
+     and compressing those loose references into the "packed-refs" file. The
+     "reftable" backend uses geometric compaction after every write, which
+     amortizes costs and ensures that the backend is always in a
+     well-maintained state.
+  ** Operations that write multiple references at once are not atomic with the
+     "files" backend. Consequently, Git may see in-between states when it reads
+     references while a reference transaction is in the process of being
+     committed to disk.
+  ** Writing many references at once is slow with the "files" backend because
+     every reference is created as a separate file. The "reftable" backend
+     significantly outperforms the "files" backend by multiple orders of
+     magnitude.
++
+A prerequisite for this change is that the ecosystem is ready to support the
+"reftable" format. Most importantly, alternative implementations of Git like
+JGit, libgit2 and Gitoxide need to support it.
+
 === Removals
 
 * Support for grafting commits has long been superseded by git-replace(1).
diff --git a/setup.c b/setup.c
index f93bd6a24a5..3ab0f11fbfd 100644
--- a/setup.c
+++ b/setup.c
@@ -2541,6 +2541,12 @@ static void repository_format_configure(struct repository_format *repo_fmt,
 		repo_fmt->ref_storage_format = ref_format;
 	} else if (cfg.ref_format != REF_STORAGE_FORMAT_UNKNOWN) {
 		repo_fmt->ref_storage_format = cfg.ref_format;
+	} else {
+#ifdef WITH_BREAKING_CHANGES
+		repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_REFTABLE;
+#else
+		repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_FILES;
+#endif
 	}
 	repo_set_ref_storage_format(the_repository, repo_fmt->ref_storage_format);
 }
diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index f11a40811f2..e0f27484192 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -658,6 +658,22 @@ test_expect_success 'init warns about invalid init.defaultRefFormat' '
 	test_cmp expected actual
 '
 
+test_expect_success 'default ref format' '
+	test_when_finished "rm -rf refformat" &&
+	(
+		sane_unset GIT_DEFAULT_REF_FORMAT &&
+		git init refformat
+	) &&
+	if test_have_prereq WITH_BREAKING_CHANGES
+	then
+		echo reftable >expect
+	else
+		echo files >expect
+	fi &&
+	git -C refformat rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
 backends="files reftable"
 for format in $backends
 do

-- 
2.50.0.195.g74e6fc65d0.dirty

^ permalink raw reply related [flat|nested] 21+ messages in thread
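Editorial illustration (not part of the patch): the fallback order that the setup.c hunk establishes can be sketched in isolation. This is a minimal sketch under stated assumptions, not Git's code. The enum is a stand-in that reuses the constant names from the patch, `pick_ref_format()` and `DEFAULT_REF_FORMAT` are hypothetical names, and only the two branches visible in the hunk context are modelled; any checks that precede them in the real function are left out.

```c
/* Stand-in for Git's ref storage format constants; values are illustrative. */
enum ref_storage_format {
	REF_STORAGE_FORMAT_UNKNOWN,
	REF_STORAGE_FORMAT_FILES,
	REF_STORAGE_FORMAT_REFTABLE,
};

/* Build-time default, flipped by the same macro the patch uses. */
#ifdef WITH_BREAKING_CHANGES
# define DEFAULT_REF_FORMAT REF_STORAGE_FORMAT_REFTABLE
#else
# define DEFAULT_REF_FORMAT REF_STORAGE_FORMAT_FILES
#endif

/*
 * Hypothetical helper: an explicitly requested format wins, then the
 * configured one, and only when both are unknown does the build-time
 * default apply.
 */
enum ref_storage_format pick_ref_format(enum ref_storage_format requested,
					enum ref_storage_format configured)
{
	if (requested != REF_STORAGE_FORMAT_UNKNOWN)
		return requested;
	if (configured != REF_STORAGE_FORMAT_UNKNOWN)
		return configured;
	return DEFAULT_REF_FORMAT;
}
```

Compiling the sketch with `-DWITH_BREAKING_CHANGES` changes only the last return value, which mirrors how the new test expects "reftable" or "files" depending on the prerequisite.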
* Re: [PATCH 1/2] BreakingChanges: announce switch to "reftable" format
2025-07-02 10:14 ` [PATCH 1/2] BreakingChanges: announce switch to "reftable" format Patrick Steinhardt
@ 2025-07-02 17:03 ` Junio C Hamano
2025-07-02 21:21 ` brian m. carlson
2025-07-03 4:43 ` Patrick Steinhardt
2025-07-02 17:17 ` Justin Tobler
1 sibling, 2 replies; 21+ messages in thread
From: Junio C Hamano @ 2025-07-02 17:03 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: git, brian m. carlson, Karthik Nayak, K Jayatheerth, ryenus

Patrick Steinhardt <ps@pks.im> writes:

> diff --git a/Documentation/BreakingChanges.adoc b/Documentation/BreakingChanges.adoc
> index c6bd94986c5..c96b5319cdd 100644
> --- a/Documentation/BreakingChanges.adoc
> +++ b/Documentation/BreakingChanges.adoc
> @@ -118,6 +118,45 @@ Cf. <2f5de416-04ba-c23d-1e0b-83bb655829a7@zombino.com>,
>  <20170223155046.e7nxivfwqqoprsqj@LykOS.localdomain>,
>  <CA+EOSBncr=4a4d8n9xS4FNehyebpmX8JiUwCsXD47EQDE+DiUQ@mail.gmail.com>.
>
> +* The default storage format for references in newly created repositories will
> +  be changed from "files" to "reftable". The "reftable" format provides
> +  multiple advantages over the "files" format:
> ++
> +  ** It is impossible to store two references that only differ in casing on
> ...
> +  ** Writing many references at once is slow with the "files" backend because
> +     every reference is created as a separate file. The "reftable" backend
> +     significantly outperforms the "files" backend by multiple orders of
> +     magnitude.

These list benefits of using "reftable". Can we also add one point
that stresses why we want to make it the default? Something like
"Having to do X once per user to make them opt-in is too cumbersome"
is probably good enough.

> +A prerequisite for this change is that the ecosystem is ready to support the
> +"reftable" format. Most importantly, alternative implementations of Git like
> +JGit, libgit2 and Gitoxide need to support it.

... in order for them to access the same repository.

How common is it to use a single repository from these multiple
implementations these days, I have to wonder?

> diff --git a/setup.c b/setup.c
> index f93bd6a24a5..3ab0f11fbfd 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -2541,6 +2541,12 @@ static void repository_format_configure(struct repository_format *repo_fmt,
>  		repo_fmt->ref_storage_format = ref_format;
>  	} else if (cfg.ref_format != REF_STORAGE_FORMAT_UNKNOWN) {
>  		repo_fmt->ref_storage_format = cfg.ref_format;
> +	} else {
> +#ifdef WITH_BREAKING_CHANGES
> +		repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_REFTABLE;
> +#else
> +		repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_FILES;
> +#endif
>  	}
>  	repo_set_ref_storage_format(the_repository, repo_fmt->ref_storage_format);
>  }

That's obvious one. I think the approach taken by brian's SHA-256
topic would have introduced REF_STORAGE_FORMAT_DEFAULT and did the
build-time switching between the two in a single conditional
definition

    #ifndef WITH_BREAKING_CHANGES /* 3.0 */
    # define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_FILES
    #else
    # define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_REFTABLE
    #endif

somewhere in a header file. Either way would work, but I wonder if
these breaking-changes definitions are collected together into a
single header file (say <bc.h>), it may make the transition at 3.0
version boundary simpler and less error-prone. We can just discard
selected conditionals into unconditional definition more easily.
For example if we moved the default flip between SHA-1 and SHA-256,
i.e.

    #ifndef WITH_BREAKING_CHANGES /* 3.0 */
    # define GIT_HASH_DEFAULT GIT_HASH_SHA1
    #else
    # define GIT_HASH_DEFAULT GIT_HASH_SHA256
    #endif

out of hash.h and have it next to the above REF_STORAGE_FORMAT_DEFAULT
definition, and then in a subsystem specific header file, after
including <bc.h>, can say

    === In hash.h ===
    #include <bc.h>
    #ifndef GIT_HASH_DEFAULT
    # define GIT_HASH_DEFAULT GIT_HASH_SHA256
    #endif

    === In refs.h ===
    #include <bc.h>
    #ifndef REF_STORAGE_FORMAT_DEFAULT
    # define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_REFTABLE
    #endif

If some reason making reftable backend the default when unspecified
turns out to be a bit premature at 3.0 boundary while the world is
ready for SHA-256 by default for new repositories, then we can tweak
that single header file like so:

    -#ifndef WITH_BREAKING_CHANGES /* 3.0 */
    +#ifndef WITH_BREAKING_CHANGES /* 4.0? */
     # define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_FILES
     #else
     # define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_REFTABLE
     #endif

    -#ifndef WITH_BREAKING_CHANGES
    -# define GIT_HASH_DEFAULT GIT_HASH_SHA1
    -#else
    -# define GIT_HASH_DEFAULT GIT_HASH_SHA256
    -#endif

and optionally change the "if default is not set, use 256" in <hash.h>
to "unconditionally use 256 as the default", but forgetting to do so
would not break anything, which makes the process less error prone.

By doing something like this, we'll have a single place <bc.h> to
see what are being planned, and we can "git log that-header-file" to
see how our thinking has evolved over time. Hopefully we do not have
to keep too many entries in that file and can retire the conditionals
as we plan ahead.

> diff --git a/t/t0001-init.sh b/t/t0001-init.sh
> index f11a40811f2..e0f27484192 100755
> --- a/t/t0001-init.sh
> +++ b/t/t0001-init.sh
> @@ -658,6 +658,22 @@ test_expect_success 'init warns about invalid init.defaultRefFormat' '
>  	test_cmp expected actual
>  '
>
> +test_expect_success 'default ref format' '
> +	test_when_finished "rm -rf refformat" &&
> +	(
> +		sane_unset GIT_DEFAULT_REF_FORMAT &&
> +		git init refformat
> +	) &&
> +	if test_have_prereq WITH_BREAKING_CHANGES
> +	then
> +		echo reftable >expect
> +	else
> +		echo files >expect
> +	fi &&
> +	git -C refformat rev-parse --show-ref-format >actual &&
> +	test_cmp expect actual
> +'

Obvious ;-)

Thanks.

^ permalink raw reply [flat|nested] 21+ messages in thread
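Editorial illustration (not part of the thread): the central-header idea from the message above can be condensed into one translation unit to show why a forgotten cleanup is harmless. This is a sketch under assumptions. The `<bc.h>` name is only floated in the message, the numeric constant values are stand-ins, and `default_ref_format_name()` and `default_hash_name()` are hypothetical helpers. The state shown is the one described in the message, where the hash conditional has already been retired from the central header while the ref format conditional remains.

```c
/* Stand-ins for constants Git defines elsewhere; values are illustrative. */
#define REF_STORAGE_FORMAT_FILES    1
#define REF_STORAGE_FORMAT_REFTABLE 2
#define GIT_HASH_SHA1   1
#define GIT_HASH_SHA256 2

/* ---- what the central header (say <bc.h>) might still hold ---- */
#ifndef WITH_BREAKING_CHANGES /* 4.0? */
# define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_FILES
#else
# define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_REFTABLE
#endif
/* The GIT_HASH_DEFAULT conditional has been removed from this section. */

/* ---- what hash.h might say after including the central header ---- */
#ifndef GIT_HASH_DEFAULT
# define GIT_HASH_DEFAULT GIT_HASH_SHA256
#endif

/* ---- what refs.h might say after including the central header ---- */
#ifndef REF_STORAGE_FORMAT_DEFAULT
# define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_REFTABLE
#endif

/* Hypothetical helpers that report which defaults the build ended up with. */
const char *default_ref_format_name(void)
{
	return REF_STORAGE_FORMAT_DEFAULT == REF_STORAGE_FORMAT_REFTABLE ?
		"reftable" : "files";
}

const char *default_hash_name(void)
{
	return GIT_HASH_DEFAULT == GIT_HASH_SHA256 ? "sha256" : "sha1";
}
```

In a build without `WITH_BREAKING_CHANGES` this reports "files" for the ref format and "sha256" for the hash: the subsystem fallback takes over as soon as the central entry is deleted, and the `#ifndef` guards are skipped while the central entry still exists.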
* Re: [PATCH 1/2] BreakingChanges: announce switch to "reftable" format
2025-07-02 17:03 ` Junio C Hamano
@ 2025-07-02 21:21 ` brian m. carlson
2025-07-03 4:43 ` Patrick Steinhardt
2025-07-03 4:43 ` Patrick Steinhardt
1 sibling, 1 reply; 21+ messages in thread
From: brian m. carlson @ 2025-07-02 21:21 UTC (permalink / raw)
To: Junio C Hamano
Cc: Patrick Steinhardt, git, Karthik Nayak, K Jayatheerth, ryenus

[-- Attachment #1: Type: text/plain, Size: 3479 bytes --]

On 2025-07-02 at 17:03:25, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > diff --git a/Documentation/BreakingChanges.adoc b/Documentation/BreakingChanges.adoc
> > index c6bd94986c5..c96b5319cdd 100644
> > --- a/Documentation/BreakingChanges.adoc
> > +++ b/Documentation/BreakingChanges.adoc
> > @@ -118,6 +118,45 @@ Cf. <2f5de416-04ba-c23d-1e0b-83bb655829a7@zombino.com>,
> >  <20170223155046.e7nxivfwqqoprsqj@LykOS.localdomain>,
> >  <CA+EOSBncr=4a4d8n9xS4FNehyebpmX8JiUwCsXD47EQDE+DiUQ@mail.gmail.com>.
> >
> > +* The default storage format for references in newly created repositories will
> > +  be changed from "files" to "reftable". The "reftable" format provides
> > +  multiple advantages over the "files" format:
> > ++
> > +  ** It is impossible to store two references that only differ in casing on
> > ...
> > +  ** Writing many references at once is slow with the "files" backend because
> > +     every reference is created as a separate file. The "reftable" backend
> > +     significantly outperforms the "files" backend by multiple orders of
> > +     magnitude.
>
> These list benefits of using "reftable". Can we also add one point
> that stresses why we want to make it the default? Something like
> "Having to do X once per user to make them opt-in is too cumbersome"
> is probably good enough.

Maybe an additional line about "most people pick the default option and,
given the information above, we think that users will have a better
experience with reftable as the default" (especially, in my view, users
on case-insensitive file systems).

> > +A prerequisite for this change is that the ecosystem is ready to support the
> > +"reftable" format. Most importantly, alternative implementations of Git like
> > +JGit, libgit2 and Gitoxide need to support it.
>
> ... in order for them to access the same repository.
>
> How common is it to use a single repository from these multiple
> implementations these days, I have to wonder?

Pretty common. I know Rust's Cargo package manager uses libgit2 and I'm
sure there are other development tools that do so. At a previous
employer, we had a linting tool that used libgit2 and we used
command-line Git for normal operations. I don't work with Java on a
regular basis, but I expect that similar kinds of things happen there,
especially in Java-based IDEs.

> > diff --git a/t/t0001-init.sh b/t/t0001-init.sh
> > index f11a40811f2..e0f27484192 100755
> > --- a/t/t0001-init.sh
> > +++ b/t/t0001-init.sh
> > @@ -658,6 +658,22 @@ test_expect_success 'init warns about invalid init.defaultRefFormat' '
> >  	test_cmp expected actual
> >  '
> >
> > +test_expect_success 'default ref format' '
> > +	test_when_finished "rm -rf refformat" &&
> > +	(
> > +		sane_unset GIT_DEFAULT_REF_FORMAT &&
> > +		git init refformat
> > +	) &&
> > +	if test_have_prereq WITH_BREAKING_CHANGES
> > +	then
> > +		echo reftable >expect
> > +	else
> > +		echo files >expect
> > +	fi &&
> > +	git -C refformat rev-parse --show-ref-format >actual &&
> > +	test_cmp expect actual
> > +'

I might just make a recommendation here for a `default-ref-format` key
(or some similar name) to `git version --build-options` as well. That
will get put in bug reports and troubleshooting output and will help
people figure out what might be going wrong if there are any problems.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] BreakingChanges: announce switch to "reftable" format
2025-07-02 21:21 ` brian m. carlson
@ 2025-07-03 4:43 ` Patrick Steinhardt
0 siblings, 0 replies; 21+ messages in thread
From: Patrick Steinhardt @ 2025-07-03 4:43 UTC (permalink / raw)
To: brian m. carlson, Junio C Hamano, git, Karthik Nayak,
K Jayatheerth, ryenus

On Wed, Jul 02, 2025 at 09:21:25PM +0000, brian m. carlson wrote:
> On 2025-07-02 at 17:03:25, Junio C Hamano wrote:
> > Patrick Steinhardt <ps@pks.im> writes:
> >
> > > diff --git a/Documentation/BreakingChanges.adoc b/Documentation/BreakingChanges.adoc
> > > index c6bd94986c5..c96b5319cdd 100644
> > > --- a/Documentation/BreakingChanges.adoc
> > > +++ b/Documentation/BreakingChanges.adoc
> > > @@ -118,6 +118,45 @@ Cf. <2f5de416-04ba-c23d-1e0b-83bb655829a7@zombino.com>,
> > >  <20170223155046.e7nxivfwqqoprsqj@LykOS.localdomain>,
> > >  <CA+EOSBncr=4a4d8n9xS4FNehyebpmX8JiUwCsXD47EQDE+DiUQ@mail.gmail.com>.
> > >
> > > +* The default storage format for references in newly created repositories will
> > > +  be changed from "files" to "reftable". The "reftable" format provides
> > > +  multiple advantages over the "files" format:
> > > ++
> > > +  ** It is impossible to store two references that only differ in casing on
> > > ...
> > > +  ** Writing many references at once is slow with the "files" backend because
> > > +     every reference is created as a separate file. The "reftable" backend
> > > +     significantly outperforms the "files" backend by multiple orders of
> > > +     magnitude.
> >
> > These list benefits of using "reftable". Can we also add one point
> > that stresses why we want to make it the default? Something like
> > "Having to do X once per user to make them opt-in is too cumbersome"
> > is probably good enough.
>
> Maybe an additional line about "most people pick the default option and,
> given the information above, we think that users will have a better
> experience with reftable as the default" (especially, in my view, users
> on case-insensitive file systems).

Yup, makes sense. This is what I've queued:

    Users that get immediate benefit from the "reftable" backend could
    continue to opt-in to the "reftable" format manually by setting the
    "init.defaultRefFormat" config. But defaults matter, and we think
    that overall users will have a better experience with less
    platform-specific quirks with the new backend.

> > > +A prerequisite for this change is that the ecosystem is ready to support the
> > > +"reftable" format. Most importantly, alternative implementations of Git like
> > > +JGit, libgit2 and Gitoxide need to support it.
> >
> > ... in order for them to access the same repository.
> >
> > How common is it to use a single repository from these multiple
> > implementations these days, I have to wonder?
>
> Pretty common. I know Rust's Cargo package manager uses libgit2 and I'm
> sure there are other development tools that do so. At a previous
> employer, we had a linting tool that used libgit2 and we used
> command-line Git for normal operations. I don't work with Java on a
> regular basis, but I expect that similar kinds of things happen there,
> especially in Java-based IDEs.

Yeah, I have hit issues with Cargo myself. JGit users should be mostly
fine as it already supports reftables, but IIRC it only supported v0 of
the format where there is no explicit hash function yet.

I'll try to engage with the respective communities and figure out a way
forward to get reftable support landed. For libgit2 I might be able to
have my team do it. For Gitoxide I plan to have a chat with Byron to
figure something out. The missing support for explicit hash functions in
JGit I've already mentioned to folks.

> > > diff --git a/t/t0001-init.sh b/t/t0001-init.sh
> > > index f11a40811f2..e0f27484192 100755
> > > --- a/t/t0001-init.sh
> > > +++ b/t/t0001-init.sh
> > > @@ -658,6 +658,22 @@ test_expect_success 'init warns about invalid init.defaultRefFormat' '
> > >  	test_cmp expected actual
> > >  '
> > >
> > > +test_expect_success 'default ref format' '
> > > +	test_when_finished "rm -rf refformat" &&
> > > +	(
> > > +		sane_unset GIT_DEFAULT_REF_FORMAT &&
> > > +		git init refformat
> > > +	) &&
> > > +	if test_have_prereq WITH_BREAKING_CHANGES
> > > +	then
> > > +		echo reftable >expect
> > > +	else
> > > +		echo files >expect
> > > +	fi &&
> > > +	git -C refformat rev-parse --show-ref-format >actual &&
> > > +	test_cmp expect actual
> > > +'
>
> I might just make a recommendation here for a `default-ref-format` key
> (or some similar name) to `git version --build-options` as well. That
> will get put in bug reports and troubleshooting output and will help
> people figure out what might be going wrong if there are any problems.

D'oh, obviously, given that I have recommended the same on your series
:P

Patrick

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] BreakingChanges: announce switch to "reftable" format
2025-07-02 17:03 ` Junio C Hamano
2025-07-02 21:21 ` brian m. carlson
@ 2025-07-03 4:43 ` Patrick Steinhardt
1 sibling, 0 replies; 21+ messages in thread
From: Patrick Steinhardt @ 2025-07-03 4:43 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, brian m. carlson, Karthik Nayak, K Jayatheerth, ryenus

On Wed, Jul 02, 2025 at 10:03:25AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > diff --git a/setup.c b/setup.c
> > index f93bd6a24a5..3ab0f11fbfd 100644
> > --- a/setup.c
> > +++ b/setup.c
> > @@ -2541,6 +2541,12 @@ static void repository_format_configure(struct repository_format *repo_fmt,
> >  		repo_fmt->ref_storage_format = ref_format;
> >  	} else if (cfg.ref_format != REF_STORAGE_FORMAT_UNKNOWN) {
> >  		repo_fmt->ref_storage_format = cfg.ref_format;
> > +	} else {
> > +#ifdef WITH_BREAKING_CHANGES
> > +		repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_REFTABLE;
> > +#else
> > +		repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_FILES;
> > +#endif
> >  	}
> >  	repo_set_ref_storage_format(the_repository, repo_fmt->ref_storage_format);
> >  }
>
> That's obvious one. I think the approach taken by brian's SHA-256
> topic would have introduced REF_STORAGE_FORMAT_DEFAULT and did the
> build-time switching between the two in a single conditional
> definition
>
>     #ifndef WITH_BREAKING_CHANGES /* 3.0 */
>     # define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_FILES
>     #else
>     # define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_REFTABLE
>     #endif
>
> somewhere in a header file. Either way would work, but I wonder if
> these breaking-changes definitions are collected together into a
> single header file (say <bc.h>), it may make the transition at 3.0
> version boundary simpler and less error-prone. We can just discard
> selected conditionals into unconditional definition more easily.
> For example if we moved the default flip between SHA-1 and SHA-256,
> i.e.
>
>     #ifndef WITH_BREAKING_CHANGES /* 3.0 */
>     # define GIT_HASH_DEFAULT GIT_HASH_SHA1
>     #else
>     # define GIT_HASH_DEFAULT GIT_HASH_SHA256
>     #endif
>
> out of hash.h and have it next to the above REF_STORAGE_FORMAT_DEFAULT
> definition, and then in a subsystem specific header file, after
> including <bc.h>, can say
>
>     === In hash.h ===
>     #include <bc.h>
>     #ifndef GIT_HASH_DEFAULT
>     # define GIT_HASH_DEFAULT GIT_HASH_SHA256
>     #endif
>
>     === In refs.h ===
>     #include <bc.h>
>     #ifndef REF_STORAGE_FORMAT_DEFAULT
>     # define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_REFTABLE
>     #endif
>
> If some reason making reftable backend the default when unspecified
> turns out to be a bit premature at 3.0 boundary while the world is
> ready for SHA-256 by default for new repositories, then we can tweak
> that single header file like so:
>
>     -#ifndef WITH_BREAKING_CHANGES /* 3.0 */
>     +#ifndef WITH_BREAKING_CHANGES /* 4.0? */
>      # define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_FILES
>      #else
>      # define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_REFTABLE
>      #endif
>
>     -#ifndef WITH_BREAKING_CHANGES
>     -# define GIT_HASH_DEFAULT GIT_HASH_SHA1
>     -#else
>     -# define GIT_HASH_DEFAULT GIT_HASH_SHA256
>     -#endif
>
> and optionally change the "if default is not set, use 256" in <hash.h>
> to "unconditionally use 256 as the default", but forgetting to do so
> would not break anything, which makes the process less error prone.
>
> By doing something like this, we'll have a single place <bc.h> to
> see what are being planned, and we can "git log that-header-file" to
> see how our thinking has evolved over time. Hopefully w

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] BreakingChanges: announce switch to "reftable" format
2025-07-02 10:14 ` [PATCH 1/2] BreakingChanges: announce switch to "reftable" format Patrick Steinhardt
2025-07-02 17:03 ` Junio C Hamano
@ 2025-07-02 17:17 ` Justin Tobler
2025-07-03 5:00 ` Patrick Steinhardt
1 sibling, 1 reply; 21+ messages in thread
From: Justin Tobler @ 2025-07-02 17:17 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: git, brian m. carlson, Karthik Nayak, K Jayatheerth, ryenus,
Junio C Hamano

On 25/07/02 12:14PM, Patrick Steinhardt wrote:
> The "reftable" format has come a long way and has matured nicely since
> it has been merged into git via 57db2a094d5 (refs: introduce reftable
> backend, 2024-02-07). It fixes longstanding issues that cannot be fixed
> with the "files" format in a backwards-compatible way and performs
> significantly better in many use cases.
>
> Announce that we will switch to the "reftable" format in Git 3.0 for
> newly created repositories.
>
> This switch is dependent on support in the larger Git ecosystem. Most
> importantly, libraries like JGit, libgit2 and Gitoxide should support
> the reftable backend so that we don't break all applications and tools
> built on top of those libraries.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  Documentation/BreakingChanges.adoc | 39 ++++++++++++++++++++++++++++++++++++++
>  setup.c                            |  6 ++++++
>  t/t0001-init.sh                    | 16 ++++++++++++++++
>  3 files changed, 61 insertions(+)
>
> diff --git a/Documentation/BreakingChanges.adoc b/Documentation/BreakingChanges.adoc
> index c6bd94986c5..c96b5319cdd 100644
> --- a/Documentation/BreakingChanges.adoc
> +++ b/Documentation/BreakingChanges.adoc
> @@ -118,6 +118,45 @@ Cf. <2f5de416-04ba-c23d-1e0b-83bb655829a7@zombino.com>,
>  <20170223155046.e7nxivfwqqoprsqj@LykOS.localdomain>,
>  <CA+EOSBncr=4a4d8n9xS4FNehyebpmX8JiUwCsXD47EQDE+DiUQ@mail.gmail.com>.
>
> +* The default storage format for references in newly created repositories will
> +  be changed from "files" to "reftable". The "reftable" format provides
> +  multiple advantages over the "files" format:
> ++
> +  ** It is impossible to store two references that only differ in casing on
> +     case-insensitive filesystems with the "files" format. This issue is
> +     especially common on Windows, but also on older versions of macOS. As the
> +     "reftable" backend does not use filesystem paths anymore to encode
> +     reference names this problem goes away.

I believe even modern macOS by default uses a case-insensitive
file-system. Maybe we should instead say:

  This limitation is common on Windows and macOS platforms.

> +  ** Similarly, macOS normalizes path names that contain unicode characters,
> +     which has the consequence that you cannot store two names with unicode
> +     characters that are encoded differently with the "files" backend. Again,
> +     this is not an issue with the "reftable" backend.
> +  ** Deleting references with the "files" backend requires Git to rewrite the
> +     complete "packed-refs" file. In large repositories with many references
> +     this file can easily be dozens of megabytes in size, in extreme cases it
> +     may be gigabytes. The "reftable" backend uses tombstone markers for
> +     deleted references and thus does not have to rewrite all of its data.
> +  ** Repository housekeeping with the "files" backend typically performs
> +     all-into-one repacks of references. This can be quite expensive, and
> +     consequently housekeeping is a tradeoff between the number of loose
> +     references that accumulate and slow down operations that read references,
> +     and compressing those loose references into the "packed-refs" file. The
> +     "reftable" backend uses geometric compaction after every write, which
> +     amortizes costs and ensures that the backend is always in a
> +     well-maintained state.
> +  ** Operations that write multiple references at once are not atomic with the
> +     "files" backend. Consequently, Git may see in-between states when it reads
> +     references while a reference transaction is in the process of being
> +     committed to disk.
> +  ** Writing many references at once is slow with the "files" backend because
> +     every reference is created as a separate file. The "reftable" backend
> +     significantly outperforms the "files" backend by multiple orders of
> +     magnitude.

The examples above do a good job at explaining individual technical
benefits. I do wonder if we should include a more general statement
aimed at users as to why the change to reftables is beneficial. Maybe
something like:

  The reftables backend addresses several performance concerns as the
  number of references scale in a repository.

> ++
> +A prerequisite for this change is that the ecosystem is ready to support the
> +"reftable" format. Most importantly, alternative implementations of Git like
> +JGit, libgit2 and Gitoxide need to support it.
> +
>  === Removals
>
>  * Support for grafting commits has long been superseded by git-replace(1).
> diff --git a/setup.c b/setup.c
> index f93bd6a24a5..3ab0f11fbfd 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -2541,6 +2541,12 @@ static void repository_format_configure(struct repository_format *repo_fmt,
>  		repo_fmt->ref_storage_format = ref_format;
>  	} else if (cfg.ref_format != REF_STORAGE_FORMAT_UNKNOWN) {
>  		repo_fmt->ref_storage_format = cfg.ref_format;
> +	} else {
> +#ifdef WITH_BREAKING_CHANGES
> +		repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_REFTABLE;
> +#else
> +		repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_FILES;
> +#endif

Ok so now when we build with `WITH_BREAKING_CHANGES` the default
reference format is changed to reftables.

>  	}
>  	repo_set_ref_storage_format(the_repository, repo_fmt->ref_storage_format);
>  }
> diff --git a/t/t0001-init.sh b/t/t0001-init.sh
> index f11a40811f2..e0f27484192 100755
> --- a/t/t0001-init.sh
> +++ b/t/t0001-init.sh
> @@ -658,6 +658,22 @@ test_expect_success 'init warns about invalid init.defaultRefFormat' '
>  	test_cmp expected actual
>  '
>
> +test_expect_success 'default ref format' '
> +	test_when_finished "rm -rf refformat" &&
> +	(
> +		sane_unset GIT_DEFAULT_REF_FORMAT &&
> +		git init refformat
> +	) &&
> +	if test_have_prereq WITH_BREAKING_CHANGES
> +	then
> +		echo reftable >expect
> +	else
> +		echo files >expect
> +	fi &&
> +	git -C refformat rev-parse --show-ref-format >actual &&
> +	test_cmp expect actual
> +'

And here add a test to verify this change. Looks good :)

-Justin

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/2] BreakingChanges: announce switch to "reftable" format
2025-07-02 17:17 ` Justin Tobler
@ 2025-07-03 5:00 ` Patrick Steinhardt
0 siblings, 0 replies; 21+ messages in thread
From: Patrick Steinhardt @ 2025-07-03 5:00 UTC (permalink / raw)
To: Justin Tobler
Cc: git, brian m. carlson, Karthik Nayak, K Jayatheerth, ryenus,
Junio C Hamano

On Wed, Jul 02, 2025 at 12:17:50PM -0500, Justin Tobler wrote:
> On 25/07/02 12:14PM, Patrick Steinhardt wrote:
> > diff --git a/Documentation/BreakingChanges.adoc b/Documentation/BreakingChanges.adoc
> > index c6bd94986c5..c96b5319cdd 100644
> > --- a/Documentation/BreakingChanges.adoc
> > +++ b/Documentation/BreakingChanges.adoc
> > @@ -118,6 +118,45 @@ Cf. <2f5de416-04ba-c23d-1e0b-83bb655829a7@zombino.com>,
> >  <20170223155046.e7nxivfwqqoprsqj@LykOS.localdomain>,
> >  <CA+EOSBncr=4a4d8n9xS4FNehyebpmX8JiUwCsXD47EQDE+DiUQ@mail.gmail.com>.
> >
> > +* The default storage format for references in newly created repositories will
> > +  be changed from "files" to "reftable". The "reftable" format provides
> > +  multiple advantages over the "files" format:
> > ++
> > + ** It is impossible to store two references that only differ in casing on
> > +    case-insensitive filesystems with the "files" format. This issue is
> > +    especially common on Windows, but also on older versions of macOS. As the
> > +    "reftable" backend does not use filesystem paths anymore to encode
> > +    reference names this problem goes away.
>
> I believe even modern macOS by default uses a case-insensitive
> file-system. Maybe we should instead say:
>
>   This limitation is common on Windows and macOS platforms.

Okay, thanks for the clarification. I thought recent versions of macOS
were case-sensitive by default.

> > + ** Similarly, macOS normalizes path names that contain unicode characters,
> > +    which has the consequence that you cannot store two names with unicode
> > +    characters that are encoded differently with the "files" backend. Again,
> > +    this is not an issue with the "reftable" backend.
> > + ** Deleting references with the "files" backend requires Git to rewrite the
> > +    complete "packed-refs" file. In large repositories with many references
> > +    this file can easily be dozens of megabytes in size, in extreme cases it
> > +    may be gigabytes. The "reftable" backend uses tombstone markers for
> > +    deleted references and thus does not have to rewrite all of its data.
> > + ** Repository housekeeping with the "files" backend typically performs
> > +    all-into-one repacks of references. This can be quite expensive, and
> > +    consequently housekeeping is a tradeoff between the number of loose
> > +    references that accumulate and slow down operations that read references,
> > +    and compressing those loose references into the "packed-refs" file. The
> > +    "reftable" backend uses geometric compaction after every write, which
> > +    amortizes costs and ensures that the backend is always in a
> > +    well-maintained state.
> > + ** Operations that write multiple references at once are not atomic with the
> > +    "files" backend. Consequently, Git may see in-between states when it reads
> > +    references while a reference transaction is in the process of being
> > +    committed to disk.
> > + ** Writing many references at once is slow with the "files" backend because
> > +    every reference is created as a separate file. The "reftable" backend
> > +    significantly outperforms the "files" backend by multiple orders of
> > +    magnitude.
>
> The examples above do a good job of explaining individual technical
> benefits. I do wonder if we should include a more general statement
> aimed at users as to why the change to reftable is beneficial. Maybe
> something like:
>
>   The reftable backend addresses several performance concerns as the
>   number of references scales in a repository.

I think this would be a bit too handwavy. I'd rather point out the
specific cases where we know it to perform better.

Patrick
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 2/2] setup: use "reftable" format when experimental features are enabled 2025-07-02 10:14 [PATCH 0/2] Add reftable by default as a breaking change Patrick Steinhardt 2025-07-02 10:14 ` [PATCH 1/2] BreakingChanges: announce switch to "reftable" format Patrick Steinhardt @ 2025-07-02 10:14 ` Patrick Steinhardt 2025-07-03 6:15 ` [PATCH v2 0/2] Add reftable by default as a breaking change Patrick Steinhardt 2025-07-04 9:42 ` [PATCH v3 " Patrick Steinhardt 3 siblings, 0 replies; 21+ messages in thread From: Patrick Steinhardt @ 2025-07-02 10:14 UTC (permalink / raw) To: git; +Cc: brian m. carlson, Karthik Nayak, K Jayatheerth, ryenus, Junio C Hamano With the preceding commit we have announced the switch to the "reftable" format in Git 3.0 for newly created repositories. The format is being battle tested by GitLab and a couple of other developers, and except for a small handful of issues exposed early after it has been merged it has been rock solid. Regardless of that though the test user base is still comparatively small, which increases the risk that we miss critical bugs. Address this by enabling the reftable format when experimental features are enabled. This should increase the test user base by some margin and thus give us more input before making the format the default. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- Documentation/config/feature.adoc | 6 ++++++ setup.c | 12 ++++++++++++ t/t0001-init.sh | 34 ++++++++++++++++++++++++++++++++++ 3 files changed, 52 insertions(+) diff --git a/Documentation/config/feature.adoc b/Documentation/config/feature.adoc index cb49ff2604a..924f5ff4e3c 100644 --- a/Documentation/config/feature.adoc +++ b/Documentation/config/feature.adoc @@ -24,6 +24,12 @@ reusing objects from multiple packs instead of just one. * `pack.usePathWalk` may speed up packfile creation and make the packfiles be significantly smaller in the presence of certain filename collisions with Git's default name-hash. 
++
+* `init.defaultRefFormat=reftable` causes newly initialized repositories to use
+the reftable format for storing references. This new format solves issues with
+case-insensitive filesystems, compresses better and performs significantly
+better with many use cases. Refer to Documentation/technical/reftable.adoc for
+more information on this new storage format.

 feature.manyFiles::
 	Enable config options that optimize for repos with many files in the
diff --git a/setup.c b/setup.c
index 3ab0f11fbfd..8e9c0ffa1fe 100644
--- a/setup.c
+++ b/setup.c
@@ -2481,6 +2481,18 @@ static int read_default_format_config(const char *key, const char *value,
 		goto out;
 	}

+	/*
+	 * Enable the reftable format when "feature.experimental" is enabled.
+	 * "init.defaultRefFormat" takes precedence over this setting.
+	 */
+	if (!strcmp(key, "feature.experimental") &&
+	    cfg->ref_format == REF_STORAGE_FORMAT_UNKNOWN &&
+	    git_config_bool(key, value)) {
+		cfg->ref_format = REF_STORAGE_FORMAT_REFTABLE;
+		ret = 0;
+		goto out;
+	}
+
 	ret = 0;
 out:
 	free(str);
diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index e0f27484192..df14d88ebb4 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -754,6 +754,40 @@ test_expect_success "GIT_DEFAULT_REF_FORMAT= overrides init.defaultRefFormat" '
 	test_cmp expect actual
 '

+test_expect_success "init with feature.experimental=true" '
+	test_when_finished "rm -rf refformat" &&
+	test_config_global feature.experimental true &&
+	(
+		sane_unset GIT_DEFAULT_REF_FORMAT &&
+		git init refformat
+	) &&
+	echo reftable >expect &&
+	git -C refformat rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success "init.defaultRefFormat overrides feature.experimental=true" '
+	test_when_finished "rm -rf refformat" &&
+	test_config_global feature.experimental true &&
+	test_config_global init.defaultRefFormat files &&
+	(
+		sane_unset GIT_DEFAULT_REF_FORMAT &&
+		git init refformat
+	) &&
+	echo files >expect &&
+	git -C refformat rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success "GIT_DEFAULT_REF_FORMAT= overrides feature.experimental=true" '
+	test_when_finished "rm -rf refformat" &&
+	test_config_global feature.experimental true &&
+	GIT_DEFAULT_REF_FORMAT=files git init refformat &&
+	echo files >expect &&
+	git -C refformat rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
 for from_format in $backends
 do
 	test_expect_success "re-init with same format ($from_format)" '
--
2.50.0.195.g74e6fc65d0.dirty
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v2 0/2] Add reftable by default as a breaking change 2025-07-02 10:14 [PATCH 0/2] Add reftable by default as a breaking change Patrick Steinhardt 2025-07-02 10:14 ` [PATCH 1/2] BreakingChanges: announce switch to "reftable" format Patrick Steinhardt 2025-07-02 10:14 ` [PATCH 2/2] setup: use "reftable" format when experimental features are enabled Patrick Steinhardt @ 2025-07-03 6:15 ` Patrick Steinhardt 2025-07-03 6:15 ` [PATCH v2 1/2] BreakingChanges: announce switch to "reftable" format Patrick Steinhardt ` (2 more replies) 2025-07-04 9:42 ` [PATCH v3 " Patrick Steinhardt 3 siblings, 3 replies; 21+ messages in thread From: Patrick Steinhardt @ 2025-07-03 6:15 UTC (permalink / raw) To: git Cc: brian m. carlson, Karthik Nayak, K Jayatheerth, ryenus, Junio C Hamano, Justin Tobler Hi, the recent thread at [1] motivated me to hack together this tiny patch series that paves our path towards making the reftable backend the default backend. It does two things: - It announces the breaking change for Git 3.0. - It makes it the default now already when "feature.experimental" is enabled. The first item is subject to ecosystem support, most notably in libraries like Gitoxide, libgit2 and JGit. The second item is intended to extend the user base to power users so that we get more test exposure out in the wild before we make it the default in Git 3.0. Changes in v2: - Improve the breaking changes announcement a bit based on feedback. - Introduce a `REF_STORAGE_FORMAT_DEFAULT` define. - Print the default ref format as part of `git version --build-options`. - Link to v1: https://lore.kernel.org/r/20250702-pks-reftable-default-backend-v1-0-84dbaddafb50@pks.im Thanks! 
Patrick [1]: <xmqqtt3vkhwk.fsf@gitster.g> --- Patrick Steinhardt (2): BreakingChanges: announce switch to "reftable" format setup: use "reftable" format when experimental features are enabled Documentation/BreakingChanges.adoc | 44 +++++++++++++++++++++++++++++++++++++ Documentation/config/feature.adoc | 6 +++++ help.c | 2 ++ repository.h | 6 +++++ setup.c | 14 ++++++++++++ t/t0001-init.sh | 45 ++++++++++++++++++++++++++++++++++++++ 6 files changed, 117 insertions(+) Range-diff versus v1: 1: f12545f39d3 ! 1: 0b4cf2c7a25 BreakingChanges: announce switch to "reftable" format @@ Documentation/BreakingChanges.adoc: Cf. <2f5de416-04ba-c23d-1e0b-83bb655829a7@zo + multiple advantages over the "files" format: ++ + ** It is impossible to store two references that only differ in casing on -+ case-insensitive filesystems with the "files" format. This issue is -+ especially common on Windows, but also on older versions of macOS. As the -+ "reftable" backend does not use filesystem paths anymore to encode -+ reference names this problem goes away. ++ case-insensitive filesystems with the "files" format. This issue is common ++ on Windows and macOS platforms. As the "reftable" backend does not use ++ filesystem paths anymore to encode reference names this problem goes away. + ** Similarly, macOS normalizes path names that contain unicode characters, + which has the consequence that you cannot store two names with unicode + characters that are encoded differently with the "files" backend. Again, @@ Documentation/BreakingChanges.adoc: Cf. <2f5de416-04ba-c23d-1e0b-83bb655829a7@zo + significantly outperforms the "files" backend by multiple orders of + magnitude. ++ ++Users that get immediate benefit from the "reftable" backend could continue to ++opt-in to the "reftable" format manually by setting the "init.defaultRefFormat" ++config. 
But defaults matter, and we think that overall users will have a better ++experience with less platform-specific quirks when they use the new backend by ++default. +++ +A prerequisite for this change is that the ecosystem is ready to support the +"reftable" format. Most importantly, alternative implementations of Git like +JGit, libgit2 and Gitoxide need to support it. @@ Documentation/BreakingChanges.adoc: Cf. <2f5de416-04ba-c23d-1e0b-83bb655829a7@zo * Support for grafting commits has long been superseded by git-replace(1). + ## help.c ## +@@ help.c: void get_version_info(struct strbuf *buf, int show_build_options) + SHA1_UNSAFE_BACKEND); + #endif + strbuf_addf(buf, "SHA-256: %s\n", SHA256_BACKEND); ++ strbuf_addf(buf, "default-ref-format: %s\n", ++ ref_storage_format_to_name(REF_STORAGE_FORMAT_DEFAULT)); + } + } + + + ## repository.h ## +@@ repository.h: enum ref_storage_format { + REF_STORAGE_FORMAT_REFTABLE, + }; + ++#ifdef WITH_BREAKING_CHANGES /* Git 3.0 */ ++# define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_REFTABLE ++#else ++# define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_FILES ++#endif ++ + struct repo_path_cache { + char *squash_msg; + char *merge_msg; + ## setup.c ## @@ setup.c: static void repository_format_configure(struct repository_format *repo_fmt, repo_fmt->ref_storage_format = ref_format; } else if (cfg.ref_format != REF_STORAGE_FORMAT_UNKNOWN) { repo_fmt->ref_storage_format = cfg.ref_format; + } else { -+#ifdef WITH_BREAKING_CHANGES -+ repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_REFTABLE; -+#else -+ repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_FILES; -+#endif ++ repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_DEFAULT; } repo_set_ref_storage_format(the_repository, repo_fmt->ref_storage_format); } @@ t/t0001-init.sh: test_expect_success 'init warns about invalid init.defaultRefFo + sane_unset GIT_DEFAULT_REF_FORMAT && + git init refformat + ) && -+ if test_have_prereq WITH_BREAKING_CHANGES -+ then -+ echo reftable >expect 
-+ else -+ echo files >expect -+ fi && ++ git version --build-options | sed -ne "s/^default-ref-format: //p" >expect && + git -C refformat rev-parse --show-ref-format >actual && + test_cmp expect actual +' 2: 1fff73157a9 = 2: 3fddba1a29a setup: use "reftable" format when experimental features are enabled --- base-commit: 83014dc05f6fc9275c0a02886cb428805abaf9e5 change-id: 20250702-pks-reftable-default-backend-6c30f330250a ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v2 1/2] BreakingChanges: announce switch to "reftable" format 2025-07-03 6:15 ` [PATCH v2 0/2] Add reftable by default as a breaking change Patrick Steinhardt @ 2025-07-03 6:15 ` Patrick Steinhardt 2025-07-03 10:54 ` Karthik Nayak 2025-07-03 6:15 ` [PATCH v2 2/2] setup: use "reftable" format when experimental features are enabled Patrick Steinhardt 2025-07-07 5:37 ` [PATCH v2 0/2] Add reftable by default as a breaking change Junio C Hamano 2 siblings, 1 reply; 21+ messages in thread From: Patrick Steinhardt @ 2025-07-03 6:15 UTC (permalink / raw) To: git Cc: brian m. carlson, Karthik Nayak, K Jayatheerth, ryenus, Junio C Hamano, Justin Tobler The "reftable" format has come a long way and has matured nicely since it has been merged into git via 57db2a094d5 (refs: introduce reftable backend, 2024-02-07). It fixes longstanding issues that cannot be fixed with the "files" format in a backwards-compatible way and performs significantly better in many use cases. Announce that we will switch to the "reftable" format in Git 3.0 for newly created repositories. This switch is dependent on support in the larger Git ecosystem. Most importantly, libraries like JGit, libgit2 and Gitoxide should support the reftable backend so that we don't break all applications and tools built on top of those libraries. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- Documentation/BreakingChanges.adoc | 44 ++++++++++++++++++++++++++++++++++++++ help.c | 2 ++ repository.h | 6 ++++++ setup.c | 2 ++ t/t0001-init.sh | 11 ++++++++++ 5 files changed, 65 insertions(+) diff --git a/Documentation/BreakingChanges.adoc b/Documentation/BreakingChanges.adoc index c6bd94986c5..614debcd740 100644 --- a/Documentation/BreakingChanges.adoc +++ b/Documentation/BreakingChanges.adoc @@ -118,6 +118,50 @@ Cf. <2f5de416-04ba-c23d-1e0b-83bb655829a7@zombino.com>, <20170223155046.e7nxivfwqqoprsqj@LykOS.localdomain>, <CA+EOSBncr=4a4d8n9xS4FNehyebpmX8JiUwCsXD47EQDE+DiUQ@mail.gmail.com>. 
+* The default storage format for references in newly created repositories will + be changed from "files" to "reftable". The "reftable" format provides + multiple advantages over the "files" format: ++ + ** It is impossible to store two references that only differ in casing on + case-insensitive filesystems with the "files" format. This issue is common + on Windows and macOS platforms. As the "reftable" backend does not use + filesystem paths anymore to encode reference names this problem goes away. + ** Similarly, macOS normalizes path names that contain unicode characters, + which has the consequence that you cannot store two names with unicode + characters that are encoded differently with the "files" backend. Again, + this is not an issue with the "reftable" backend. + ** Deleting references with the "files" backend requires Git to rewrite the + complete "packed-refs" file. In large repositories with many references + this file can easily be dozens of megabytes in size, in extreme cases it + may be gigabytes. The "reftable" backend uses tombstone markers for + deleted references and thus does not have to rewrite all of its data. + ** Repository housekeeping with the "files" backend typically performs + all-into-one repacks of references. This can be quite expensive, and + consequently housekeeping is a tradeoff between the number of loose + references that accumulate and slow down operations that read references, + and compressing those loose references into the "packed-refs" file. The + "reftable" backend uses geometric compaction after every write, which + amortizes costs and ensures that the backend is always in a + well-maintained state. + ** Operations that write multiple references at once are not atomic with the + "files" backend. Consequently, Git may see in-between states when it reads + references while a reference transaction is in the process of being + committed to disk. 
+ ** Writing many references at once is slow with the "files" backend because + every reference is created as a separate file. The "reftable" backend + significantly outperforms the "files" backend by multiple orders of + magnitude. ++ +Users that get immediate benefit from the "reftable" backend could continue to +opt-in to the "reftable" format manually by setting the "init.defaultRefFormat" +config. But defaults matter, and we think that overall users will have a better +experience with less platform-specific quirks when they use the new backend by +default. ++ +A prerequisite for this change is that the ecosystem is ready to support the +"reftable" format. Most importantly, alternative implementations of Git like +JGit, libgit2 and Gitoxide need to support it. + === Removals * Support for grafting commits has long been superseded by git-replace(1). diff --git a/help.c b/help.c index 21b778707a6..89cd47e3b86 100644 --- a/help.c +++ b/help.c @@ -810,6 +810,8 @@ void get_version_info(struct strbuf *buf, int show_build_options) SHA1_UNSAFE_BACKEND); #endif strbuf_addf(buf, "SHA-256: %s\n", SHA256_BACKEND); + strbuf_addf(buf, "default-ref-format: %s\n", + ref_storage_format_to_name(REF_STORAGE_FORMAT_DEFAULT)); } } diff --git a/repository.h b/repository.h index c4c92b2ab9c..77c4189d5dc 100644 --- a/repository.h +++ b/repository.h @@ -20,6 +20,12 @@ enum ref_storage_format { REF_STORAGE_FORMAT_REFTABLE, }; +#ifdef WITH_BREAKING_CHANGES /* Git 3.0 */ +# define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_REFTABLE +#else +# define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_FILES +#endif + struct repo_path_cache { char *squash_msg; char *merge_msg; diff --git a/setup.c b/setup.c index f93bd6a24a5..f0c06c655a9 100644 --- a/setup.c +++ b/setup.c @@ -2541,6 +2541,8 @@ static void repository_format_configure(struct repository_format *repo_fmt, repo_fmt->ref_storage_format = ref_format; } else if (cfg.ref_format != REF_STORAGE_FORMAT_UNKNOWN) { 
repo_fmt->ref_storage_format = cfg.ref_format; + } else { + repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_DEFAULT; } repo_set_ref_storage_format(the_repository, repo_fmt->ref_storage_format); } diff --git a/t/t0001-init.sh b/t/t0001-init.sh index f11a40811f2..186664162fc 100755 --- a/t/t0001-init.sh +++ b/t/t0001-init.sh @@ -658,6 +658,17 @@ test_expect_success 'init warns about invalid init.defaultRefFormat' ' test_cmp expected actual ' +test_expect_success 'default ref format' ' + test_when_finished "rm -rf refformat" && + ( + sane_unset GIT_DEFAULT_REF_FORMAT && + git init refformat + ) && + git version --build-options | sed -ne "s/^default-ref-format: //p" >expect && + git -C refformat rev-parse --show-ref-format >actual && + test_cmp expect actual +' + backends="files reftable" for format in $backends do -- 2.50.0.195.g74e6fc65d0.dirty ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] BreakingChanges: announce switch to "reftable" format 2025-07-03 6:15 ` [PATCH v2 1/2] BreakingChanges: announce switch to "reftable" format Patrick Steinhardt @ 2025-07-03 10:54 ` Karthik Nayak 2025-07-03 11:42 ` Patrick Steinhardt 0 siblings, 1 reply; 21+ messages in thread From: Karthik Nayak @ 2025-07-03 10:54 UTC (permalink / raw) To: Patrick Steinhardt, git Cc: brian m. carlson, K Jayatheerth, ryenus, Junio C Hamano, Justin Tobler [-- Attachment #1: Type: text/plain, Size: 7510 bytes --] Patrick Steinhardt <ps@pks.im> writes: > The "reftable" format has come a long way and has matured nicely since > it has been merged into git via 57db2a094d5 (refs: introduce reftable > backend, 2024-02-07). It fixes longstanding issues that cannot be fixed > with the "files" format in a backwards-compatible way and performs > significantly better in many use cases. > > Announce that we will switch to the "reftable" format in Git 3.0 for > newly created repositories. > Nit: This commit does more than announce the switch. It also adds in the changes to use reftable when WITH_BREAKING_CHANGES is set. Would be nice to call that out here. > This switch is dependent on support in the larger Git ecosystem. Most > importantly, libraries like JGit, libgit2 and Gitoxide should support > the reftable backend so that we don't break all applications and tools > built on top of those libraries. > > Signed-off-by: Patrick Steinhardt <ps@pks.im> > --- > Documentation/BreakingChanges.adoc | 44 ++++++++++++++++++++++++++++++++++++++ > help.c | 2 ++ > repository.h | 6 ++++++ > setup.c | 2 ++ > t/t0001-init.sh | 11 ++++++++++ > 5 files changed, 65 insertions(+) > > diff --git a/Documentation/BreakingChanges.adoc b/Documentation/BreakingChanges.adoc > index c6bd94986c5..614debcd740 100644 > --- a/Documentation/BreakingChanges.adoc > +++ b/Documentation/BreakingChanges.adoc > @@ -118,6 +118,50 @@ Cf. 
<2f5de416-04ba-c23d-1e0b-83bb655829a7@zombino.com>, > <20170223155046.e7nxivfwqqoprsqj@LykOS.localdomain>, > <CA+EOSBncr=4a4d8n9xS4FNehyebpmX8JiUwCsXD47EQDE+DiUQ@mail.gmail.com>. > > +* The default storage format for references in newly created repositories will > + be changed from "files" to "reftable". The "reftable" format provides > + multiple advantages over the "files" format: > ++ > + ** It is impossible to store two references that only differ in casing on > + case-insensitive filesystems with the "files" format. This issue is common > + on Windows and macOS platforms. As the "reftable" backend does not use > + filesystem paths anymore to encode reference names this problem goes away. Nit: s/anymore// makes it clearer, since reftable never used filesystem path. > + ** Similarly, macOS normalizes path names that contain unicode characters, > + which has the consequence that you cannot store two names with unicode > + characters that are encoded differently with the "files" backend. Again, > + this is not an issue with the "reftable" backend. > + ** Deleting references with the "files" backend requires Git to rewrite the > + complete "packed-refs" file. In large repositories with many references > + this file can easily be dozens of megabytes in size, in extreme cases it > + may be gigabytes. The "reftable" backend uses tombstone markers for > + deleted references and thus does not have to rewrite all of its data. > + ** Repository housekeeping with the "files" backend typically performs > + all-into-one repacks of references. This can be quite expensive, and > + consequently housekeeping is a tradeoff between the number of loose > + references that accumulate and slow down operations that read references, > + and compressing those loose references into the "packed-refs" file. The > + "reftable" backend uses geometric compaction after every write, which > + amortizes costs and ensures that the backend is always in a > + well-maintained state. 
> + ** Operations that write multiple references at once are not atomic with the > + "files" backend. Consequently, Git may see in-between states when it reads > + references while a reference transaction is in the process of being > + committed to disk. > + ** Writing many references at once is slow with the "files" backend because > + every reference is created as a separate file. The "reftable" backend > + significantly outperforms the "files" backend by multiple orders of > + magnitude. Perhaps something about how reftable uses a binary format and could save storage space. > ++ > +Users that get immediate benefit from the "reftable" backend could continue to > +opt-in to the "reftable" format manually by setting the "init.defaultRefFormat" > +config. But defaults matter, and we think that overall users will have a better > +experience with less platform-specific quirks when they use the new backend by > +default. > ++ > +A prerequisite for this change is that the ecosystem is ready to support the > +"reftable" format. Most importantly, alternative implementations of Git like > +JGit, libgit2 and Gitoxide need to support it. > + > === Removals > > * Support for grafting commits has long been superseded by git-replace(1). 
> diff --git a/help.c b/help.c > index 21b778707a6..89cd47e3b86 100644 > --- a/help.c > +++ b/help.c > @@ -810,6 +810,8 @@ void get_version_info(struct strbuf *buf, int show_build_options) > SHA1_UNSAFE_BACKEND); > #endif > strbuf_addf(buf, "SHA-256: %s\n", SHA256_BACKEND); > + strbuf_addf(buf, "default-ref-format: %s\n", > + ref_storage_format_to_name(REF_STORAGE_FORMAT_DEFAULT)); > } > } > > diff --git a/repository.h b/repository.h > index c4c92b2ab9c..77c4189d5dc 100644 > --- a/repository.h > +++ b/repository.h > @@ -20,6 +20,12 @@ enum ref_storage_format { > REF_STORAGE_FORMAT_REFTABLE, > }; > > +#ifdef WITH_BREAKING_CHANGES /* Git 3.0 */ > +# define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_REFTABLE > +#else > +# define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_FILES > +#endif > + Okay this makes sense. > struct repo_path_cache { > char *squash_msg; > char *merge_msg; > diff --git a/setup.c b/setup.c > index f93bd6a24a5..f0c06c655a9 100644 > --- a/setup.c > +++ b/setup.c > @@ -2541,6 +2541,8 @@ static void repository_format_configure(struct repository_format *repo_fmt, > repo_fmt->ref_storage_format = ref_format; > } else if (cfg.ref_format != REF_STORAGE_FORMAT_UNKNOWN) { > repo_fmt->ref_storage_format = cfg.ref_format; > + } else { > + repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_DEFAULT; > } > repo_set_ref_storage_format(the_repository, repo_fmt->ref_storage_format); > } Shouldn't this change be instead made to REPOSITORY_FORMAT_INIT? 
diff --git a/setup.h b/setup.h index 18dc3b7368..c1b765043f 100644 --- a/setup.h +++ b/setup.h @@ -150,7 +150,7 @@ struct repository_format { .version = -1, \ .is_bare = -1, \ .hash_algo = GIT_HASH_SHA1, \ - .ref_storage_format = REF_STORAGE_FORMAT_FILES, \ + .ref_storage_format = REF_STORAGE_FORMAT_DEFAULT, \ .unknown_extensions = STRING_LIST_INIT_DUP, \ .v1_only_extensions = STRING_LIST_INIT_DUP, \ } > diff --git a/t/t0001-init.sh b/t/t0001-init.sh > index f11a40811f2..186664162fc 100755 > --- a/t/t0001-init.sh > +++ b/t/t0001-init.sh > @@ -658,6 +658,17 @@ test_expect_success 'init warns about invalid init.defaultRefFormat' ' > test_cmp expected actual > ' > > +test_expect_success 'default ref format' ' > + test_when_finished "rm -rf refformat" && > + ( > + sane_unset GIT_DEFAULT_REF_FORMAT && > + git init refformat > + ) && > + git version --build-options | sed -ne "s/^default-ref-format: //p" >expect && > + git -C refformat rev-parse --show-ref-format >actual && > + test_cmp expect actual > +' > + > backends="files reftable" > for format in $backends > do > > -- > 2.50.0.195.g74e6fc65d0.dirty [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 690 bytes --] ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] BreakingChanges: announce switch to "reftable" format
2025-07-03 10:54 ` Karthik Nayak
@ 2025-07-03 11:42 ` Patrick Steinhardt
2025-07-03 12:24 ` Karthik Nayak
0 siblings, 1 reply; 21+ messages in thread
From: Patrick Steinhardt @ 2025-07-03 11:42 UTC (permalink / raw)
To: Karthik Nayak
Cc: git, brian m. carlson, K Jayatheerth, ryenus, Junio C Hamano,
Justin Tobler

On Thu, Jul 03, 2025 at 12:54:24PM +0200, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > diff --git a/setup.c b/setup.c
> > index f93bd6a24a5..f0c06c655a9 100644
> > --- a/setup.c
> > +++ b/setup.c
> > @@ -2541,6 +2541,8 @@ static void repository_format_configure(struct repository_format *repo_fmt,
> >  		repo_fmt->ref_storage_format = ref_format;
> >  	} else if (cfg.ref_format != REF_STORAGE_FORMAT_UNKNOWN) {
> >  		repo_fmt->ref_storage_format = cfg.ref_format;
> > +	} else {
> > +		repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_DEFAULT;
> >  	}
> >  	repo_set_ref_storage_format(the_repository, repo_fmt->ref_storage_format);
> >  }
>
> Shouldn't this change be instead made to REPOSITORY_FORMAT_INIT?

It made me a bit uneasy to change `REPOSITORY_FORMAT_INIT` as it is used
in several places. So I opted for the more contained change.

In any case, I found the logic to be hard to follow anyway as it is not
immediately clear where the default value actually comes from without
the `else` branch. So I consider it a good change regardless. In fact, I
would argue we could go even further and change `REPOSITORY_FORMAT_INIT`
to be set to `_UNKNOWN`. Same for the hash.

Patrick
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] BreakingChanges: announce switch to "reftable" format
2025-07-03 11:42 ` Patrick Steinhardt
@ 2025-07-03 12:24 ` Karthik Nayak
2025-07-03 13:08 ` Patrick Steinhardt
0 siblings, 1 reply; 21+ messages in thread
From: Karthik Nayak @ 2025-07-03 12:24 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: git, brian m. carlson, K Jayatheerth, ryenus, Junio C Hamano,
Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 1677 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> On Thu, Jul 03, 2025 at 12:54:24PM +0200, Karthik Nayak wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>> > diff --git a/setup.c b/setup.c
>> > index f93bd6a24a5..f0c06c655a9 100644
>> > --- a/setup.c
>> > +++ b/setup.c
>> > @@ -2541,6 +2541,8 @@ static void repository_format_configure(struct repository_format *repo_fmt,
>> >  		repo_fmt->ref_storage_format = ref_format;
>> >  	} else if (cfg.ref_format != REF_STORAGE_FORMAT_UNKNOWN) {
>> >  		repo_fmt->ref_storage_format = cfg.ref_format;
>> > +	} else {
>> > +		repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_DEFAULT;
>> >  	}
>> >  	repo_set_ref_storage_format(the_repository, repo_fmt->ref_storage_format);
>> >  }
>>
>> Shouldn't this change be instead made to REPOSITORY_FORMAT_INIT?
>
> It made me a bit uneasy to change `REPOSITORY_FORMAT_INIT` as it is used
> in several places. So I opted for the more contained change.
>
> In any case, I found the logic to be hard to follow anyway as it is not
> immediately clear where the default value actually comes from without
> the `else` branch. So I consider it a good change regardless. In fact, I
> would argue we could go even further and change `REPOSITORY_FORMAT_INIT`
> to be set to `_UNKNOWN`. Same for the hash.
>

Exactly, I just read your patch and the existing code around it and was
a bit confused because I couldn't pinpoint where we set the default to
'_FILES' when neither the environment variable nor the config is set.

I think changing `REPOSITORY_FORMAT_INIT` to be set to `_UNKNOWN` makes
a lot of sense combined with your change. I'll leave it to you whether
to include that in this series or not.

> Patrick

Thanks!

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] BreakingChanges: announce switch to "reftable" format
2025-07-03 12:24 ` Karthik Nayak
@ 2025-07-03 13:08 ` Patrick Steinhardt
0 siblings, 0 replies; 21+ messages in thread
From: Patrick Steinhardt @ 2025-07-03 13:08 UTC (permalink / raw)
To: Karthik Nayak
Cc: git, brian m. carlson, K Jayatheerth, ryenus, Junio C Hamano,
Justin Tobler

On Thu, Jul 03, 2025 at 08:24:01AM -0400, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > On Thu, Jul 03, 2025 at 12:54:24PM +0200, Karthik Nayak wrote:
> >> Patrick Steinhardt <ps@pks.im> writes:
> >> > diff --git a/setup.c b/setup.c
> >> > index f93bd6a24a5..f0c06c655a9 100644
> >> > --- a/setup.c
> >> > +++ b/setup.c
> >> > @@ -2541,6 +2541,8 @@ static void repository_format_configure(struct repository_format *repo_fmt,
> >> >  		repo_fmt->ref_storage_format = ref_format;
> >> >  	} else if (cfg.ref_format != REF_STORAGE_FORMAT_UNKNOWN) {
> >> >  		repo_fmt->ref_storage_format = cfg.ref_format;
> >> > +	} else {
> >> > +		repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_DEFAULT;
> >> >  	}
> >> >  	repo_set_ref_storage_format(the_repository, repo_fmt->ref_storage_format);
> >> >  }
> >>
> >> Shouldn't this change be instead made to REPOSITORY_FORMAT_INIT?
> >
> > It made me a bit uneasy to change `REPOSITORY_FORMAT_INIT` as it is used
> > in several places. So I opted for the more contained change.
> >
> > In any case, I found the logic to be hard to follow anyway as it is not
> > immediately clear where the default value actually comes from without
> > the `else` branch. So I consider it a good change regardless. In fact, I
> > would argue we could go even further and change `REPOSITORY_FORMAT_INIT`
> > to be set to `_UNKNOWN`. Same for the hash.
> >
>
> Exactly, I just read your patch and the existing code around it and was
> a bit confused because I couldn't pinpoint where we set the default to
> '_FILES' when there is no ENV or config setup.
>
> I think changing `REPOSITORY_FORMAT_INIT` to be set to `_UNKNOWN` makes
> a lot of sense combined with your change. I'll leave it to you if you
> want to include that in this series or not.

I'd prefer to leave it out of this patch series. It's going to be a bit
more involved than just switching out the values and adding the `else`
branch for the hash, as well. The repository format code (or rather all
of "setup.c") is a can of worms that I don't really want to open right
now.

Patrick

^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v2 2/2] setup: use "reftable" format when experimental features are enabled
2025-07-03 6:15 ` [PATCH v2 0/2] Add reftable by default as a breaking change Patrick Steinhardt
2025-07-03 6:15 ` [PATCH v2 1/2] BreakingChanges: announce switch to "reftable" format Patrick Steinhardt
@ 2025-07-03 6:15 ` Patrick Steinhardt
2025-07-07 5:37 ` [PATCH v2 0/2] Add reftable by default as a breaking change Junio C Hamano
2 siblings, 0 replies; 21+ messages in thread
From: Patrick Steinhardt @ 2025-07-03 6:15 UTC (permalink / raw)
To: git
Cc: brian m. carlson, Karthik Nayak, K Jayatheerth, ryenus,
Junio C Hamano, Justin Tobler

With the preceding commit we have announced the switch to the "reftable"
format in Git 3.0 for newly created repositories. The format is being
battle tested by GitLab and a couple of other developers, and except for
a small handful of issues exposed early after it has been merged it has
been rock solid. Regardless of that though the test user base is still
comparatively small, which increases the risk that we miss critical
bugs.

Address this by enabling the reftable format when experimental features
are enabled. This should increase the test user base by some margin and
thus give us more input before making the format the default.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/feature.adoc |  6 ++++++
 setup.c                           | 12 ++++++++++++
 t/t0001-init.sh                   | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 52 insertions(+)

diff --git a/Documentation/config/feature.adoc b/Documentation/config/feature.adoc
index cb49ff2604a..924f5ff4e3c 100644
--- a/Documentation/config/feature.adoc
+++ b/Documentation/config/feature.adoc
@@ -24,6 +24,12 @@ reusing objects from multiple packs instead of just one.
 * `pack.usePathWalk` may speed up packfile creation and make the packfiles be
 significantly smaller in the presence of certain filename collisions with Git's
 default name-hash.
++
+* `init.defaultRefFormat=reftable` causes newly initialized repositories to use
+the reftable format for storing references. This new format solves issues with
+case-insensitive filesystems, compresses better and performs significantly
+better with many use cases. Refer to Documentation/technical/reftable.adoc for
+more information on this new storage format.
 
 feature.manyFiles::
 	Enable config options that optimize for repos with many files in the
diff --git a/setup.c b/setup.c
index f0c06c655a9..97d7824d07a 100644
--- a/setup.c
+++ b/setup.c
@@ -2481,6 +2481,18 @@ static int read_default_format_config(const char *key, const char *value,
 		goto out;
 	}
 
+	/*
+	 * Enable the reftable format when "feature.experimental" is enabled.
+	 * "init.defaultRefFormat" takes precedence over this setting.
+	 */
+	if (!strcmp(key, "feature.experimental") &&
+	    cfg->ref_format == REF_STORAGE_FORMAT_UNKNOWN &&
+	    git_config_bool(key, value)) {
+		cfg->ref_format = REF_STORAGE_FORMAT_REFTABLE;
+		ret = 0;
+		goto out;
+	}
+
 	ret = 0;
 out:
 	free(str);
diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index 186664162fc..f593c536874 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -749,6 +749,40 @@ test_expect_success "GIT_DEFAULT_REF_FORMAT= overrides init.defaultRefFormat" '
 	test_cmp expect actual
 '
 
+test_expect_success "init with feature.experimental=true" '
+	test_when_finished "rm -rf refformat" &&
+	test_config_global feature.experimental true &&
+	(
+		sane_unset GIT_DEFAULT_REF_FORMAT &&
+		git init refformat
+	) &&
+	echo reftable >expect &&
+	git -C refformat rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success "init.defaultRefFormat overrides feature.experimental=true" '
+	test_when_finished "rm -rf refformat" &&
+	test_config_global feature.experimental true &&
+	test_config_global init.defaultRefFormat files &&
+	(
+		sane_unset GIT_DEFAULT_REF_FORMAT &&
+		git init refformat
+	) &&
+	echo files >expect &&
+	git -C refformat rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success "GIT_DEFAULT_REF_FORMAT= overrides feature.experimental=true" '
+	test_when_finished "rm -rf refformat" &&
+	test_config_global feature.experimental true &&
+	GIT_DEFAULT_REF_FORMAT=files git init refformat &&
+	echo files >expect &&
+	git -C refformat rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
 for from_format in $backends
 do
 	test_expect_success "re-init with same format ($from_format)" '
-- 
2.50.0.195.g74e6fc65d0.dirty

^ permalink raw reply related [flat|nested] 21+ messages in thread
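The precedence that the setup.c hunk and the three new tests describe can be modelled with a small standalone sketch. It is a simplified illustration under stated assumptions, not Git's implementation: `apply_config_entry()` and `struct default_format_config` are hypothetical names, config keys are assumed to arrive lowercased, and only the literal string "true" is treated as a true boolean, whereas the patch uses `git_config_bool()`.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for the enum in repository.h (assumed layout). */
enum ref_storage_format {
	REF_STORAGE_FORMAT_UNKNOWN,
	REF_STORAGE_FORMAT_FILES,
	REF_STORAGE_FORMAT_REFTABLE,
};

/* Hypothetical state carried across config entries. */
struct default_format_config {
	enum ref_storage_format ref_format;
};

/*
 * Hypothetical callback, invoked once per config entry in file order.
 * "init.defaultrefformat" always overwrites the stored value, while
 * "feature.experimental" only fills it in when nothing is set yet. The
 * explicit setting therefore wins regardless of the order of the entries.
 */
void apply_config_entry(struct default_format_config *cfg,
			const char *key, const char *value)
{
	assert(cfg && key && value);

	if (!strcmp(key, "init.defaultrefformat")) {
		if (!strcmp(value, "files"))
			cfg->ref_format = REF_STORAGE_FORMAT_FILES;
		else if (!strcmp(value, "reftable"))
			cfg->ref_format = REF_STORAGE_FORMAT_REFTABLE;
		return;
	}

	/* Simplification: only the literal "true" counts as enabled here. */
	if (!strcmp(key, "feature.experimental") &&
	    cfg->ref_format == REF_STORAGE_FORMAT_UNKNOWN &&
	    !strcmp(value, "true"))
		cfg->ref_format = REF_STORAGE_FORMAT_REFTABLE;
}
```

The `REF_STORAGE_FORMAT_UNKNOWN` guard covers the case where the explicit setting was read first; the unconditional overwrite covers the case where it is read last.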
* Re: [PATCH v2 0/2] Add reftable by default as a breaking change
2025-07-03 6:15 ` [PATCH v2 0/2] Add reftable by default as a breaking change Patrick Steinhardt
2025-07-03 6:15 ` [PATCH v2 1/2] BreakingChanges: announce switch to "reftable" format Patrick Steinhardt
2025-07-03 6:15 ` [PATCH v2 2/2] setup: use "reftable" format when experimental features are enabled Patrick Steinhardt
@ 2025-07-07 5:37 ` Junio C Hamano
2 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2025-07-07 5:37 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: git, brian m. carlson, Karthik Nayak, K Jayatheerth, ryenus,
Justin Tobler

Patrick Steinhardt <ps@pks.im> writes:

> Changes in v2:
> - Improve the breaking changes announcement a bit based on feedback.
> - Introduce a `REF_STORAGE_FORMAT_DEFAULT` define.
> - Print the default ref format as part of `git version --build-options`.

All changes relative to the previous round look excellent.

Thanks.

^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v3 0/2] Add reftable by default as a breaking change
2025-07-02 10:14 [PATCH 0/2] Add reftable by default as a breaking change Patrick Steinhardt
` (2 preceding siblings ...)
2025-07-03 6:15 ` [PATCH v2 0/2] Add reftable by default as a breaking change Patrick Steinhardt
@ 2025-07-04 9:42 ` Patrick Steinhardt
2025-07-04 9:42 ` [PATCH v3 1/2] BreakingChanges: announce switch to "reftable" format Patrick Steinhardt
` (2 more replies)
3 siblings, 3 replies; 21+ messages in thread
From: Patrick Steinhardt @ 2025-07-04 9:42 UTC (permalink / raw)
To: git
Cc: brian m. carlson, Karthik Nayak, K Jayatheerth, ryenus,
Junio C Hamano, Justin Tobler

Hi,

the recent thread at [1] motivated me to hack together this tiny patch
series that paves our path towards making the reftable backend the
default backend. It does two things:

- It announces the breaking change for Git 3.0.

- It makes it the default now already when "feature.experimental" is
  enabled.

The first item is subject to ecosystem support, most notably in
libraries like Gitoxide, libgit2 and JGit. The second item is intended
to extend the user base to power users so that we get more test exposure
out in the wild before we make it the default in Git 3.0.

Changes in v2:
- Improve the breaking changes announcement a bit based on feedback.
- Introduce a `REF_STORAGE_FORMAT_DEFAULT` define.
- Print the default ref format as part of `git version --build-options`.
- Link to v1: https://lore.kernel.org/r/20250702-pks-reftable-default-backend-v1-0-84dbaddafb50@pks.im

Changes in v3:
- Small tweaks to the commit messages.
- Mention better data compression as another benefit.
- Link to v2: https://lore.kernel.org/r/20250703-pks-reftable-default-backend-v2-0-5a27e72a8c5e@pks.im

Thanks!

Patrick

[1]: <xmqqtt3vkhwk.fsf@gitster.g>

---
Patrick Steinhardt (2):
      BreakingChanges: announce switch to "reftable" format
      setup: use "reftable" format when experimental features are enabled

 Documentation/BreakingChanges.adoc | 47 ++++++++++++++++++++++++++++++++++++++
 Documentation/config/feature.adoc  |  6 +++++
 help.c                             |  2 ++
 repository.h                       |  6 +++++
 setup.c                            | 14 ++++++++++++
 t/t0001-init.sh                    | 45 ++++++++++++++++++++++++++++++++++++
 6 files changed, 120 insertions(+)

Range-diff versus v2:

1:  efbc0ba7338 ! 1:  ecf018b81ff BreakingChanges: announce switch to "reftable" format
    @@ Commit message
         significantly better in many use cases.
     
         Announce that we will switch to the "reftable" format in Git 3.0 for
    -    newly created repositories.
    +    newly created repositories and wire up the change, hidden behind the
    +    WITH_BREAKING_CHANGES preprocessor define.
     
         This switch is dependent on support in the larger Git ecosystem. Most
         importantly, libraries like JGit, libgit2 and Gitoxide should support
    @@ Documentation/BreakingChanges.adoc: Cf. <2f5de416-04ba-c23d-1e0b-83bb655829a7@zo
     +  ** It is impossible to store two references that only differ in casing on
     +     case-insensitive filesystems with the "files" format. This issue is common
     +     on Windows and macOS platforms. As the "reftable" backend does not use
    -+     filesystem paths anymore to encode reference names this problem goes away.
    ++     filesystem paths to encode reference names this problem goes away.
     +  ** Similarly, macOS normalizes path names that contain unicode characters,
     +     which has the consequence that you cannot store two names with unicode
     +     characters that are encoded differently with the "files" backend. Again,
    @@ Documentation/BreakingChanges.adoc: Cf. <2f5de416-04ba-c23d-1e0b-83bb655829a7@zo
     +     every reference is created as a separate file. The "reftable" backend
     +     significantly outperforms the "files" backend by multiple orders of
     +     magnitude.
    ++  ** The reftable backend uses a binary format with prefix compression for
    ++     reference names. As a result, the format uses less space compared to the
    ++     "packed-refs" file.
     ++
     +Users that get immediate benefit from the "reftable" backend could continue to
     +opt-in to the "reftable" format manually by setting the "init.defaultRefFormat"
2:  812cc75dfd8 = 2:  642f774d743 setup: use "reftable" format when experimental features are enabled

---
base-commit: 83014dc05f6fc9275c0a02886cb428805abaf9e5
change-id: 20250702-pks-reftable-default-backend-6c30f330250a

^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v3 1/2] BreakingChanges: announce switch to "reftable" format
2025-07-04 9:42 ` [PATCH v3 " Patrick Steinhardt
@ 2025-07-04 9:42 ` Patrick Steinhardt
2025-07-04 9:42 ` [PATCH v3 2/2] setup: use "reftable" format when experimental features are enabled Patrick Steinhardt
2025-07-04 13:14 ` [PATCH v3 0/2] Add reftable by default as a breaking change Karthik Nayak
2 siblings, 0 replies; 21+ messages in thread
From: Patrick Steinhardt @ 2025-07-04 9:42 UTC (permalink / raw)
To: git
Cc: brian m. carlson, Karthik Nayak, K Jayatheerth, ryenus,
Junio C Hamano, Justin Tobler

The "reftable" format has come a long way and has matured nicely since
it has been merged into git via 57db2a094d5 (refs: introduce reftable
backend, 2024-02-07). It fixes longstanding issues that cannot be fixed
with the "files" format in a backwards-compatible way and performs
significantly better in many use cases.

Announce that we will switch to the "reftable" format in Git 3.0 for
newly created repositories and wire up the change, hidden behind the
WITH_BREAKING_CHANGES preprocessor define.

This switch is dependent on support in the larger Git ecosystem. Most
importantly, libraries like JGit, libgit2 and Gitoxide should support
the reftable backend so that we don't break all applications and tools
built on top of those libraries.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/BreakingChanges.adoc | 47 ++++++++++++++++++++++++++++++++++++++
 help.c                             |  2 ++
 repository.h                       |  6 +++++
 setup.c                            |  2 ++
 t/t0001-init.sh                    | 11 +++++++++
 5 files changed, 68 insertions(+)

diff --git a/Documentation/BreakingChanges.adoc b/Documentation/BreakingChanges.adoc
index c6bd94986c5..f8d2eba061c 100644
--- a/Documentation/BreakingChanges.adoc
+++ b/Documentation/BreakingChanges.adoc
@@ -118,6 +118,53 @@ Cf. <2f5de416-04ba-c23d-1e0b-83bb655829a7@zombino.com>,
 <20170223155046.e7nxivfwqqoprsqj@LykOS.localdomain>,
 <CA+EOSBncr=4a4d8n9xS4FNehyebpmX8JiUwCsXD47EQDE+DiUQ@mail.gmail.com>.
 
+* The default storage format for references in newly created repositories will
+  be changed from "files" to "reftable". The "reftable" format provides
+  multiple advantages over the "files" format:
++
+  ** It is impossible to store two references that only differ in casing on
+     case-insensitive filesystems with the "files" format. This issue is common
+     on Windows and macOS platforms. As the "reftable" backend does not use
+     filesystem paths to encode reference names this problem goes away.
+  ** Similarly, macOS normalizes path names that contain unicode characters,
+     which has the consequence that you cannot store two names with unicode
+     characters that are encoded differently with the "files" backend. Again,
+     this is not an issue with the "reftable" backend.
+  ** Deleting references with the "files" backend requires Git to rewrite the
+     complete "packed-refs" file. In large repositories with many references
+     this file can easily be dozens of megabytes in size, in extreme cases it
+     may be gigabytes. The "reftable" backend uses tombstone markers for
+     deleted references and thus does not have to rewrite all of its data.
+  ** Repository housekeeping with the "files" backend typically performs
+     all-into-one repacks of references. This can be quite expensive, and
+     consequently housekeeping is a tradeoff between the number of loose
+     references that accumulate and slow down operations that read references,
+     and compressing those loose references into the "packed-refs" file. The
+     "reftable" backend uses geometric compaction after every write, which
+     amortizes costs and ensures that the backend is always in a
+     well-maintained state.
+  ** Operations that write multiple references at once are not atomic with the
+     "files" backend. Consequently, Git may see in-between states when it reads
+     references while a reference transaction is in the process of being
+     committed to disk.
+  ** Writing many references at once is slow with the "files" backend because
+     every reference is created as a separate file. The "reftable" backend
+     significantly outperforms the "files" backend by multiple orders of
+     magnitude.
+  ** The reftable backend uses a binary format with prefix compression for
+     reference names. As a result, the format uses less space compared to the
+     "packed-refs" file.
++
+Users that get immediate benefit from the "reftable" backend could continue to
+opt-in to the "reftable" format manually by setting the "init.defaultRefFormat"
+config. But defaults matter, and we think that overall users will have a better
+experience with less platform-specific quirks when they use the new backend by
+default.
++
+A prerequisite for this change is that the ecosystem is ready to support the
+"reftable" format. Most importantly, alternative implementations of Git like
+JGit, libgit2 and Gitoxide need to support it.
+
 === Removals
 
 * Support for grafting commits has long been superseded by git-replace(1).
diff --git a/help.c b/help.c
index 21b778707a6..89cd47e3b86 100644
--- a/help.c
+++ b/help.c
@@ -810,6 +810,8 @@ void get_version_info(struct strbuf *buf, int show_build_options)
 			    SHA1_UNSAFE_BACKEND);
 #endif
 		strbuf_addf(buf, "SHA-256: %s\n", SHA256_BACKEND);
+		strbuf_addf(buf, "default-ref-format: %s\n",
+			    ref_storage_format_to_name(REF_STORAGE_FORMAT_DEFAULT));
 	}
 }
 
diff --git a/repository.h b/repository.h
index c4c92b2ab9c..77c4189d5dc 100644
--- a/repository.h
+++ b/repository.h
@@ -20,6 +20,12 @@ enum ref_storage_format {
 	REF_STORAGE_FORMAT_REFTABLE,
 };
 
+#ifdef WITH_BREAKING_CHANGES /* Git 3.0 */
+# define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_REFTABLE
+#else
+# define REF_STORAGE_FORMAT_DEFAULT REF_STORAGE_FORMAT_FILES
+#endif
+
 struct repo_path_cache {
 	char *squash_msg;
 	char *merge_msg;
diff --git a/setup.c b/setup.c
index f93bd6a24a5..f0c06c655a9 100644
--- a/setup.c
+++ b/setup.c
@@ -2541,6 +2541,8 @@ static void repository_format_configure(struct repository_format *repo_fmt,
 		repo_fmt->ref_storage_format = ref_format;
 	} else if (cfg.ref_format != REF_STORAGE_FORMAT_UNKNOWN) {
 		repo_fmt->ref_storage_format = cfg.ref_format;
+	} else {
+		repo_fmt->ref_storage_format = REF_STORAGE_FORMAT_DEFAULT;
 	}
 	repo_set_ref_storage_format(the_repository, repo_fmt->ref_storage_format);
 }
diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index f11a40811f2..186664162fc 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -658,6 +658,17 @@ test_expect_success 'init warns about invalid init.defaultRefFormat' '
 	test_cmp expected actual
 '
 
+test_expect_success 'default ref format' '
+	test_when_finished "rm -rf refformat" &&
+	(
+		sane_unset GIT_DEFAULT_REF_FORMAT &&
+		git init refformat
+	) &&
+	git version --build-options | sed -ne "s/^default-ref-format: //p" >expect &&
+	git -C refformat rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
 backends="files reftable"
 for format in $backends
 do
-- 
2.50.0.195.g74e6fc65d0.dirty

^ permalink raw reply related [flat|nested] 21+ messages in thread
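The announcement above names prefix compression of reference names as one reason the format is compact. The idea can be illustrated with a tiny helper that measures how many leading bytes a name shares with the name sorted just before it, since only the remainder would need to be stored. This is a conceptual sketch only: `shared_prefix_len()` is a hypothetical function, and nothing here reflects the actual reftable on-disk encoding.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of Git or the reftable library. Returns
 * the number of leading bytes that "cur" shares with "prev". A
 * prefix-compressing writer would store that count plus the remaining
 * suffix instead of the full name.
 */
size_t shared_prefix_len(const char *prev, const char *cur)
{
	size_t n = 0;

	assert(prev && cur);

	/* Stop at the first differing byte or at the end of "prev". */
	while (prev[n] != '\0' && prev[n] == cur[n])
		n++;
	return n;
}
```

For two adjacent names such as "refs/heads/main" and "refs/heads/maint", every byte of the first name is shared, so only the trailing "t" is new information. Sorted reference names tend to share long prefixes like "refs/heads/", which is why this technique pays off for them.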
* [PATCH v3 2/2] setup: use "reftable" format when experimental features are enabled
2025-07-04 9:42 ` [PATCH v3 " Patrick Steinhardt
2025-07-04 9:42 ` [PATCH v3 1/2] BreakingChanges: announce switch to "reftable" format Patrick Steinhardt
@ 2025-07-04 9:42 ` Patrick Steinhardt
2025-07-04 13:14 ` [PATCH v3 0/2] Add reftable by default as a breaking change Karthik Nayak
2 siblings, 0 replies; 21+ messages in thread
From: Patrick Steinhardt @ 2025-07-04 9:42 UTC (permalink / raw)
To: git
Cc: brian m. carlson, Karthik Nayak, K Jayatheerth, ryenus,
Junio C Hamano, Justin Tobler

With the preceding commit we have announced the switch to the "reftable"
format in Git 3.0 for newly created repositories. The format is being
battle tested by GitLab and a couple of other developers, and except for
a small handful of issues exposed early after it has been merged it has
been rock solid. Regardless of that though the test user base is still
comparatively small, which increases the risk that we miss critical
bugs.

Address this by enabling the reftable format when experimental features
are enabled. This should increase the test user base by some margin and
thus give us more input before making the format the default.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/config/feature.adoc |  6 ++++++
 setup.c                           | 12 ++++++++++++
 t/t0001-init.sh                   | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 52 insertions(+)

diff --git a/Documentation/config/feature.adoc b/Documentation/config/feature.adoc
index cb49ff2604a..924f5ff4e3c 100644
--- a/Documentation/config/feature.adoc
+++ b/Documentation/config/feature.adoc
@@ -24,6 +24,12 @@ reusing objects from multiple packs instead of just one.
 * `pack.usePathWalk` may speed up packfile creation and make the packfiles be
 significantly smaller in the presence of certain filename collisions with Git's
 default name-hash.
++
+* `init.defaultRefFormat=reftable` causes newly initialized repositories to use
+the reftable format for storing references. This new format solves issues with
+case-insensitive filesystems, compresses better and performs significantly
+better with many use cases. Refer to Documentation/technical/reftable.adoc for
+more information on this new storage format.
 
 feature.manyFiles::
 	Enable config options that optimize for repos with many files in the
diff --git a/setup.c b/setup.c
index f0c06c655a9..97d7824d07a 100644
--- a/setup.c
+++ b/setup.c
@@ -2481,6 +2481,18 @@ static int read_default_format_config(const char *key, const char *value,
 		goto out;
 	}
 
+	/*
+	 * Enable the reftable format when "feature.experimental" is enabled.
+	 * "init.defaultRefFormat" takes precedence over this setting.
+	 */
+	if (!strcmp(key, "feature.experimental") &&
+	    cfg->ref_format == REF_STORAGE_FORMAT_UNKNOWN &&
+	    git_config_bool(key, value)) {
+		cfg->ref_format = REF_STORAGE_FORMAT_REFTABLE;
+		ret = 0;
+		goto out;
+	}
+
 	ret = 0;
 out:
 	free(str);
diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index 186664162fc..f593c536874 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -749,6 +749,40 @@ test_expect_success "GIT_DEFAULT_REF_FORMAT= overrides init.defaultRefFormat" '
 	test_cmp expect actual
 '
 
+test_expect_success "init with feature.experimental=true" '
+	test_when_finished "rm -rf refformat" &&
+	test_config_global feature.experimental true &&
+	(
+		sane_unset GIT_DEFAULT_REF_FORMAT &&
+		git init refformat
+	) &&
+	echo reftable >expect &&
+	git -C refformat rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success "init.defaultRefFormat overrides feature.experimental=true" '
+	test_when_finished "rm -rf refformat" &&
+	test_config_global feature.experimental true &&
+	test_config_global init.defaultRefFormat files &&
+	(
+		sane_unset GIT_DEFAULT_REF_FORMAT &&
+		git init refformat
+	) &&
+	echo files >expect &&
+	git -C refformat rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success "GIT_DEFAULT_REF_FORMAT= overrides feature.experimental=true" '
+	test_when_finished "rm -rf refformat" &&
+	test_config_global feature.experimental true &&
+	GIT_DEFAULT_REF_FORMAT=files git init refformat &&
+	echo files >expect &&
+	git -C refformat rev-parse --show-ref-format >actual &&
+	test_cmp expect actual
+'
+
 for from_format in $backends
 do
 	test_expect_success "re-init with same format ($from_format)" '
-- 
2.50.0.195.g74e6fc65d0.dirty

^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v3 0/2] Add reftable by default as a breaking change
2025-07-04 9:42 ` [PATCH v3 " Patrick Steinhardt
2025-07-04 9:42 ` [PATCH v3 1/2] BreakingChanges: announce switch to "reftable" format Patrick Steinhardt
2025-07-04 9:42 ` [PATCH v3 2/2] setup: use "reftable" format when experimental features are enabled Patrick Steinhardt
@ 2025-07-04 13:14 ` Karthik Nayak
2 siblings, 0 replies; 21+ messages in thread
From: Karthik Nayak @ 2025-07-04 13:14 UTC (permalink / raw)
To: Patrick Steinhardt, git
Cc: brian m. carlson, K Jayatheerth, ryenus, Junio C Hamano,
Justin Tobler

Patrick Steinhardt <ps@pks.im> writes:

> Hi,
>
> the recent thread at [1] motivated me to hack together this tiny patch
> series that paves our path towards making the reftable backend the
> default backend. It does two things:
>
> - It announces the breaking change for Git 3.0.
>
> - It makes it the default now already when "feature.experimental" is
>   enabled.
>
> The first item is subject to ecosystem support, most notably in
> libraries like Gitoxide, libgit2 and JGit. The second item is intended
> to extend the user base to power users so that we get more test exposure
> out in the wild before we make it the default in Git 3.0.
>
> Changes in v2:
> - Improve the breaking changes announcement a bit based on feedback.
> - Introduce a `REF_STORAGE_FORMAT_DEFAULT` define.
> - Print the default ref format as part of `git version --build-options`.
> - Link to v1: https://lore.kernel.org/r/20250702-pks-reftable-default-backend-v1-0-84dbaddafb50@pks.im
>
> Changes in v3:
> - Small tweaks to the commit messages.
> - Mention better data compression as another benefit.
> - Link to v2: https://lore.kernel.org/r/20250703-pks-reftable-default-backend-v2-0-5a27e72a8c5e@pks.im
>
> Thanks!
>
> Patrick

The changes in this version look good and as expected. This looks good
to me.

Thanks

[snip]

^ permalink raw reply [flat|nested] 21+ messages in thread