* [PATCH] setup: recognize bare repositories with packed-refs
@ 2023-11-17 20:25 Adam Majer
2023-11-17 20:32 ` Adam Majer
0 siblings, 1 reply; 23+ messages in thread
From: Adam Majer @ 2023-11-17 20:25 UTC (permalink / raw)
To: git; +Cc: Adam Majer
When a garbage collected bare git repository is added to another git
repository, the refs/ subdirectory is empty. In case-cases when such a
repository is added into another repository directly, it no longer is
detected as a valid. Git doesn't preserve empty paths so refs/
subdirectory is not present in parent git. Simply creating an empty
refs/ subdirectory fixes this problem.
Looking more carefully, there are two backends to handle various refs in
git -- the files backend that uses refs/ subdirectory and the
packed-refs backend that uses packed-refs file. If references are not
found in refs/ subdirectory (or directory doesn't exist), the
packed-refs directory will be consulted. Garbage collected repository
will have all its references in packed-refs file.
To allow the use-case when packed-refs is the only source of refs and
refs/ subdirectory is simply not present, augment 'is_git_directory()'
setup function to look for packed-refs file as an alternative to refs/
subdirectory.
Signed-off-by: Adam Majer <adamm@zombino.com>
---
setup.c | 10 +++++++---
t/t6500-gc.sh | 8 ++++++++
2 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/setup.c b/setup.c
index fc592dc6dd..2a6dda6ae9 100644
--- a/setup.c
+++ b/setup.c
@@ -348,7 +348,7 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
*
* - either an objects/ directory _or_ the proper
* GIT_OBJECT_DIRECTORY environment variable
- * - a refs/ directory
+ * - a refs/ directory or packed-refs file
* - either a HEAD symlink or a HEAD file that is formatted as
* a proper "ref:", or a regular file HEAD that has a properly
* formatted sha1 object name.
@@ -384,8 +384,12 @@ int is_git_directory(const char *suspect)
strbuf_setlen(&path, len);
strbuf_addstr(&path, "/refs");
- if (access(path.buf, X_OK))
- goto done;
+ if (access(path.buf, X_OK)) {
+ strbuf_setlen(&path, len);
+ strbuf_addstr(&path, "/packed-refs");
+ if (access(path.buf, R_OK))
+ goto done;
+ }
ret = 1;
done:
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 18fe1c25e6..e81eb7d2ab 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -214,6 +214,14 @@ test_expect_success 'gc.repackFilter launches repack with a filter' '
grep -E "^trace: (built-in|exec|run_command): git repack .* --filter=blob:none ?.*" trace.out
'
+test_expect_success 'remove empty refs/ subdirectory' '
+ git -C bare.git cat-file -e 60dd8ad7d8ed49005c18b7ce9f5b7a45c28df814 &&
+ test_dir_is_empty bare.git/refs/heads &&
+ test_dir_is_empty bare.git/refs/tags &&
+ rm -r bare.git/refs &&
+ GIT_DIR="$PWD"/bare.git git -C bare.git cat-file -e 60dd8ad7d8ed49005c18b7ce9f5b7a45c28df814
+'
+
test_expect_success 'gc.repackFilterTo store filtered out objects' '
test_when_finished "rm -rf bare.git filtered.git" &&
--
2.42.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH] setup: recognize bare repositories with packed-refs
2023-11-17 20:25 [PATCH] setup: recognize bare repositories with packed-refs Adam Majer
@ 2023-11-17 20:32 ` Adam Majer
2023-11-17 20:44 ` Adam Majer
` (3 more replies)
0 siblings, 4 replies; 23+ messages in thread
From: Adam Majer @ 2023-11-17 20:32 UTC (permalink / raw)
To: git; +Cc: Adam Majer
In a garbage collected bare git repository, the refs/ subdirectory is
empty. In use-cases when such a repository is directly added into
another repository, it no longer is detected as valid. Git doesn't
preserve empty paths so refs/ subdirectory is not present. Simply
creating an empty refs/ subdirectory fixes this problem.
Looking more carefully, there are two backends to handle various refs in
git -- the files backend that uses refs/ subdirectory and the
packed-refs backend that uses packed-refs file. If references are not
found in refs/ subdirectory (or directory doesn't exist), the
packed-refs directory will be consulted. Garbage collected repository
will have all its references in packed-refs file.
To allow the use-case when packed-refs is the only source of refs and
refs/ subdirectory is simply not present, augment 'is_git_directory()'
setup function to look for packed-refs file as an alternative to refs/
subdirectory.
Signed-off-by: Adam Majer <adamm@zombino.com>
---
setup.c | 10 +++++++---
t/t6500-gc.sh | 8 ++++++++
2 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/setup.c b/setup.c
index fc592dc6dd..2a6dda6ae9 100644
--- a/setup.c
+++ b/setup.c
@@ -348,7 +348,7 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
*
* - either an objects/ directory _or_ the proper
* GIT_OBJECT_DIRECTORY environment variable
- * - a refs/ directory
+ * - a refs/ directory or packed-refs file
* - either a HEAD symlink or a HEAD file that is formatted as
* a proper "ref:", or a regular file HEAD that has a properly
* formatted sha1 object name.
@@ -384,8 +384,12 @@ int is_git_directory(const char *suspect)
strbuf_setlen(&path, len);
strbuf_addstr(&path, "/refs");
- if (access(path.buf, X_OK))
- goto done;
+ if (access(path.buf, X_OK)) {
+ strbuf_setlen(&path, len);
+ strbuf_addstr(&path, "/packed-refs");
+ if (access(path.buf, R_OK))
+ goto done;
+ }
ret = 1;
done:
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 18fe1c25e6..e81eb7d2ab 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -214,6 +214,14 @@ test_expect_success 'gc.repackFilter launches repack with a filter' '
grep -E "^trace: (built-in|exec|run_command): git repack .* --filter=blob:none ?.*" trace.out
'
+test_expect_success 'remove empty refs/ subdirectory' '
+ git -C bare.git cat-file -e 60dd8ad7d8ed49005c18b7ce9f5b7a45c28df814 &&
+ test_dir_is_empty bare.git/refs/heads &&
+ test_dir_is_empty bare.git/refs/tags &&
+ rm -r bare.git/refs &&
+ GIT_DIR="$PWD"/bare.git git -C bare.git cat-file -e 60dd8ad7d8ed49005c18b7ce9f5b7a45c28df814
+'
+
test_expect_success 'gc.repackFilterTo store filtered out objects' '
test_when_finished "rm -rf bare.git filtered.git" &&
--
2.42.0
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-11-17 20:32 ` Adam Majer
@ 2023-11-17 20:44 ` Adam Majer
2023-11-19 23:24 ` Junio C Hamano
` (2 subsequent siblings)
3 siblings, 0 replies; 23+ messages in thread
From: Adam Majer @ 2023-11-17 20:44 UTC (permalink / raw)
To: git
On 2023-11-17 21:32, Adam Majer wrote:
> +test_expect_success 'remove empty refs/ subdirectory' '
> + git -C bare.git cat-file -e 60dd8ad7d8ed49005c18b7ce9f5b7a45c28df814 &&
> + test_dir_is_empty bare.git/refs/heads &&
> + test_dir_is_empty bare.git/refs/tags &&
> + rm -r bare.git/refs &&
> + GIT_DIR="$PWD"/bare.git git -C bare.git cat-file -e 60dd8ad7d8ed49005c18b7ce9f5b7a45c28df814
In the test, I've first tried to use GIT_CEILING_DIRECTORIES="$PWD" here
instead, but git went up into the parent directory anyway. I'm not sure
if this is a bug, or feature, or my misunderstanding.
- Adam
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-11-17 20:32 ` Adam Majer
2023-11-17 20:44 ` Adam Majer
@ 2023-11-19 23:24 ` Junio C Hamano
2023-11-20 9:31 ` Glen Choo
2023-11-20 9:43 ` Adam Majer
2023-11-27 19:44 ` Josh Steadmon
2023-11-28 14:28 ` Adam Majer
3 siblings, 2 replies; 23+ messages in thread
From: Junio C Hamano @ 2023-11-19 23:24 UTC (permalink / raw)
To: Glen Choo, Josh Steadmon; +Cc: git, Adam Majer
Adam Majer <adamm@zombino.com> writes:
> In a garbage collected bare git repository, the refs/ subdirectory is
> empty. In use-cases when such a repository is directly added into
> another repository, it no longer is detected as valid.
Josh & Glen [*], isn't this a layout that we explicitly discourage and
eventually plan to forbid anyway?
*1* who worked on e35f202b (setup: trace bare repository setups, 2023-05-01)
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-11-19 23:24 ` Junio C Hamano
@ 2023-11-20 9:31 ` Glen Choo
2023-11-27 19:42 ` Josh Steadmon
2023-11-20 9:43 ` Adam Majer
1 sibling, 1 reply; 23+ messages in thread
From: Glen Choo @ 2023-11-20 9:31 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Josh Steadmon, git, Adam Majer
On Mon, Nov 20, 2023 at 7:24 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Adam Majer <adamm@zombino.com> writes:
>
> > In a garbage collected bare git repository, the refs/ subdirectory is
> > empty. In use-cases when such a repository is directly added into
> > another repository, it no longer is detected as valid.
>
> Josh & Glen [*], isn't this a layout that we explicitly discourage and
> eventually plan to forbid anyway?
If my recollection of [1] serves me correctly, we didn't come to a
strong conclusion on whether or not to forbid bare repositories in the
working tree, particularly because it would leave existing repos (like
Git LFS) high and dry. Though personally, I'd be happy to see a
version of Git that forbade bare repositories in the working tree.
I don't really recall the bare repo tracing bits, so I'll leave that to Josh.
[1] https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com/
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-11-19 23:24 ` Junio C Hamano
2023-11-20 9:31 ` Glen Choo
@ 2023-11-20 9:43 ` Adam Majer
1 sibling, 0 replies; 23+ messages in thread
From: Adam Majer @ 2023-11-20 9:43 UTC (permalink / raw)
To: Junio C Hamano, Glen Choo, Josh Steadmon; +Cc: git
On 11/20/23 00:24, Junio C Hamano wrote:
> Adam Majer <adamm@zombino.com> writes:
>
>> In a garbage collected bare git repository, the refs/ subdirectory is
>> empty. In use-cases when such a repository is directly added into
>> another repository, it no longer is detected as valid.
>
> Josh & Glen [*], isn't this a layout that we explicitly discourage and
> eventually plan to forbid anyway?
>
> *1* who worked on e35f202b (setup: trace bare repository setups, 2023-05-01)
This is fair enough. Completely removing embedded git repos would cause
some pain in test suites as it was discussed in the thread for that
commit[1]. Gitea (for example) has a few dozen embedded bare
repositories for tests.
In either case, running `git gc` on a bare repository makes it no longer
detectable as a git repository after checkout, GIT_DIR or not. This
seems to be unintentional and not linked to the other discussion.
- Adam
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-11-20 9:31 ` Glen Choo
@ 2023-11-27 19:42 ` Josh Steadmon
0 siblings, 0 replies; 23+ messages in thread
From: Josh Steadmon @ 2023-11-27 19:42 UTC (permalink / raw)
To: Glen Choo; +Cc: Junio C Hamano, git, Adam Majer
On 2023.11.20 17:31, Glen Choo wrote:
> On Mon, Nov 20, 2023 at 7:24 AM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Adam Majer <adamm@zombino.com> writes:
> >
> > > In a garbage collected bare git repository, the refs/ subdirectory is
> > > empty. In use-cases when such a repository is directly added into
> > > another repository, it no longer is detected as valid.
> >
> > Josh & Glen [*], isn't this a layout that we explicitly discourage and
> > eventually plan to forbid anyway?
>
> If my recollection of [1] serves me correctly, we didn't come to a
> strong conclusion on whether or not to forbid bare repositories in the
> working tree, particularly because it would leave existing repos (like
> Git LFS) high and dry. Though personally, I'd be happy to see a
> version of Git that forbade bare repositories in the working tree.
>
> I don't really recall the bare repo tracing bits, so I'll leave that to Josh.
>
> [1] https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com/
Yeah, my understanding was that we don't want to forbid bare
repositories outright, which is why we have the config option to let
end-users choose what to do with them.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-11-17 20:32 ` Adam Majer
2023-11-17 20:44 ` Adam Majer
2023-11-19 23:24 ` Junio C Hamano
@ 2023-11-27 19:44 ` Josh Steadmon
2023-11-28 14:14 ` Adam Majer
2023-11-28 14:28 ` Adam Majer
3 siblings, 1 reply; 23+ messages in thread
From: Josh Steadmon @ 2023-11-27 19:44 UTC (permalink / raw)
To: Adam Majer; +Cc: git
On 2023.11.17 21:32, Adam Majer wrote:
> In a garbage collected bare git repository, the refs/ subdirectory is
> empty. In use-cases when such a repository is directly added into
> another repository, it no longer is detected as valid. Git doesn't
> preserve empty paths so refs/ subdirectory is not present. Simply
> creating an empty refs/ subdirectory fixes this problem.
>
> Looking more carefully, there are two backends to handle various refs in
> git -- the files backend that uses refs/ subdirectory and the
> packed-refs backend that uses packed-refs file. If references are not
> found in refs/ subdirectory (or directory doesn't exist), the
> packed-refs directory will be consulted. Garbage collected repository
> will have all its references in packed-refs file.
>
> To allow the use-case when packed-refs is the only source of refs and
> refs/ subdirectory is simply not present, augment 'is_git_directory()'
> setup function to look for packed-refs file as an alternative to refs/
> subdirectory.
>
> Signed-off-by: Adam Majer <adamm@zombino.com>
> ---
> setup.c | 10 +++++++---
> t/t6500-gc.sh | 8 ++++++++
> 2 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/setup.c b/setup.c
> index fc592dc6dd..2a6dda6ae9 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -348,7 +348,7 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
> *
> * - either an objects/ directory _or_ the proper
> * GIT_OBJECT_DIRECTORY environment variable
> - * - a refs/ directory
> + * - a refs/ directory or packed-refs file
> * - either a HEAD symlink or a HEAD file that is formatted as
> * a proper "ref:", or a regular file HEAD that has a properly
> * formatted sha1 object name.
> @@ -384,8 +384,12 @@ int is_git_directory(const char *suspect)
>
> strbuf_setlen(&path, len);
> strbuf_addstr(&path, "/refs");
> - if (access(path.buf, X_OK))
> - goto done;
> + if (access(path.buf, X_OK)) {
> + strbuf_setlen(&path, len);
> + strbuf_addstr(&path, "/packed-refs");
> + if (access(path.buf, R_OK))
> + goto done;
> + }
>
> ret = 1;
> done:
> diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
> index 18fe1c25e6..e81eb7d2ab 100755
> --- a/t/t6500-gc.sh
> +++ b/t/t6500-gc.sh
> @@ -214,6 +214,14 @@ test_expect_success 'gc.repackFilter launches repack with a filter' '
> grep -E "^trace: (built-in|exec|run_command): git repack .* --filter=blob:none ?.*" trace.out
> '
>
> +test_expect_success 'remove empty refs/ subdirectory' '
> + git -C bare.git cat-file -e 60dd8ad7d8ed49005c18b7ce9f5b7a45c28df814 &&
> + test_dir_is_empty bare.git/refs/heads &&
> + test_dir_is_empty bare.git/refs/tags &&
> + rm -r bare.git/refs &&
> + GIT_DIR="$PWD"/bare.git git -C bare.git cat-file -e 60dd8ad7d8ed49005c18b7ce9f5b7a45c28df814
> +'
> +
Two suggestions for the test here:
1) Can you give the test a more descriptive name, such as "GCed bare
repos still recognized"?
2) Can you add a check that bare.git/packed-refs exists?
Other than that, looks good to me.
Reviewed-by: Josh Steadmon <steadmon@google.com>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-11-27 19:44 ` Josh Steadmon
@ 2023-11-28 14:14 ` Adam Majer
0 siblings, 0 replies; 23+ messages in thread
From: Adam Majer @ 2023-11-28 14:14 UTC (permalink / raw)
To: Josh Steadmon, git
On 11/27/23 20:44, Josh Steadmon wrote:> Two suggestions for the test here:
> 1) Can you give the test a more descriptive name, such as "GCed bare
> repos still recognized"?
Thanks, adjusted. I've also added that empty refs/ directory is not there.
> 2) Can you add a check that bare.git/packed-refs exists?
Done.
I've also removed the -C parameter since we actually need GIT_DIR= in
all cases to prevent git from going up directory tree. -C is then
superflous. In addition, I've changed the hardcoded object id to master
branch to make it less magical looking.
- Adam
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] setup: recognize bare repositories with packed-refs
2023-11-17 20:32 ` Adam Majer
` (2 preceding siblings ...)
2023-11-27 19:44 ` Josh Steadmon
@ 2023-11-28 14:28 ` Adam Majer
2023-11-28 18:45 ` Josh Steadmon
2023-11-28 19:04 ` Jeff King
3 siblings, 2 replies; 23+ messages in thread
From: Adam Majer @ 2023-11-28 14:28 UTC (permalink / raw)
To: git; +Cc: Adam Majer
In a garbage collected bare git repository, the refs/ subdirectory is
empty. In use-cases when such a repository is directly added into
another repository, it no longer is detected as valid. Git doesn't
preserve empty paths so refs/ subdirectory is not present. Simply
creating an empty refs/ subdirectory fixes this problem.
Looking more carefully, there are two backends to handle various refs in
git -- the files backend that uses refs/ subdirectory and the
packed-refs backend that uses packed-refs file. If references are not
found in refs/ subdirectory (or directory doesn't exist), the
packed-refs directory will be consulted. Garbage collected repository
will have all its references in packed-refs file.
To allow the use-case when packed-refs is the only source of refs and
refs/ subdirectory is simply not present, augment 'is_git_directory()'
setup function to look for packed-refs file as an alternative to refs/
subdirectory.
Signed-off-by: Adam Majer <adamm@zombino.com>
---
setup.c | 10 +++++++---
t/t6500-gc.sh | 9 +++++++++
2 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/setup.c b/setup.c
index fc592dc6dd..2a6dda6ae9 100644
--- a/setup.c
+++ b/setup.c
@@ -348,7 +348,7 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
*
* - either an objects/ directory _or_ the proper
* GIT_OBJECT_DIRECTORY environment variable
- * - a refs/ directory
+ * - a refs/ directory or packed-refs file
* - either a HEAD symlink or a HEAD file that is formatted as
* a proper "ref:", or a regular file HEAD that has a properly
* formatted sha1 object name.
@@ -384,8 +384,12 @@ int is_git_directory(const char *suspect)
strbuf_setlen(&path, len);
strbuf_addstr(&path, "/refs");
- if (access(path.buf, X_OK))
- goto done;
+ if (access(path.buf, X_OK)) {
+ strbuf_setlen(&path, len);
+ strbuf_addstr(&path, "/packed-refs");
+ if (access(path.buf, R_OK))
+ goto done;
+ }
ret = 1;
done:
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 18fe1c25e6..4ad1690817 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -214,6 +214,15 @@ test_expect_success 'gc.repackFilter launches repack with a filter' '
grep -E "^trace: (built-in|exec|run_command): git repack .* --filter=blob:none ?.*" trace.out
'
+test_expect_success 'GCed bare repos without empty refs/ still recognized' '
+ GIT_DIR="$PWD"/bare.git git cat-file -e master &&
+ test_dir_is_empty bare.git/refs/heads &&
+ test_dir_is_empty bare.git/refs/tags &&
+ test_file_not_empty bare.git/packed-refs &&
+ rm -r bare.git/refs &&
+ GIT_DIR="$PWD"/bare.git git cat-file -e master
+'
+
test_expect_success 'gc.repackFilterTo store filtered out objects' '
test_when_finished "rm -rf bare.git filtered.git" &&
--
2.43.0.1.g67290e5b65
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-11-28 14:28 ` Adam Majer
@ 2023-11-28 18:45 ` Josh Steadmon
2023-11-28 19:04 ` Jeff King
1 sibling, 0 replies; 23+ messages in thread
From: Josh Steadmon @ 2023-11-28 18:45 UTC (permalink / raw)
To: Adam Majer; +Cc: git
On 2023.11.28 15:28, Adam Majer wrote:
> In a garbage collected bare git repository, the refs/ subdirectory is
> empty. In use-cases when such a repository is directly added into
> another repository, it no longer is detected as valid. Git doesn't
> preserve empty paths so refs/ subdirectory is not present. Simply
> creating an empty refs/ subdirectory fixes this problem.
>
> Looking more carefully, there are two backends to handle various refs in
> git -- the files backend that uses refs/ subdirectory and the
> packed-refs backend that uses packed-refs file. If references are not
> found in refs/ subdirectory (or directory doesn't exist), the
> packed-refs directory will be consulted. Garbage collected repository
> will have all its references in packed-refs file.
>
> To allow the use-case when packed-refs is the only source of refs and
> refs/ subdirectory is simply not present, augment 'is_git_directory()'
> setup function to look for packed-refs file as an alternative to refs/
> subdirectory.
>
> Signed-off-by: Adam Majer <adamm@zombino.com>
> ---
> setup.c | 10 +++++++---
> t/t6500-gc.sh | 9 +++++++++
> 2 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/setup.c b/setup.c
> index fc592dc6dd..2a6dda6ae9 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -348,7 +348,7 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
> *
> * - either an objects/ directory _or_ the proper
> * GIT_OBJECT_DIRECTORY environment variable
> - * - a refs/ directory
> + * - a refs/ directory or packed-refs file
> * - either a HEAD symlink or a HEAD file that is formatted as
> * a proper "ref:", or a regular file HEAD that has a properly
> * formatted sha1 object name.
> @@ -384,8 +384,12 @@ int is_git_directory(const char *suspect)
>
> strbuf_setlen(&path, len);
> strbuf_addstr(&path, "/refs");
> - if (access(path.buf, X_OK))
> - goto done;
> + if (access(path.buf, X_OK)) {
> + strbuf_setlen(&path, len);
> + strbuf_addstr(&path, "/packed-refs");
> + if (access(path.buf, R_OK))
> + goto done;
> + }
>
> ret = 1;
> done:
> diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
> index 18fe1c25e6..4ad1690817 100755
> --- a/t/t6500-gc.sh
> +++ b/t/t6500-gc.sh
> @@ -214,6 +214,15 @@ test_expect_success 'gc.repackFilter launches repack with a filter' '
> grep -E "^trace: (built-in|exec|run_command): git repack .* --filter=blob:none ?.*" trace.out
> '
>
> +test_expect_success 'GCed bare repos without empty refs/ still recognized' '
> + GIT_DIR="$PWD"/bare.git git cat-file -e master &&
> + test_dir_is_empty bare.git/refs/heads &&
> + test_dir_is_empty bare.git/refs/tags &&
> + test_file_not_empty bare.git/packed-refs &&
> + rm -r bare.git/refs &&
> + GIT_DIR="$PWD"/bare.git git cat-file -e master
> +'
> +
> test_expect_success 'gc.repackFilterTo store filtered out objects' '
> test_when_finished "rm -rf bare.git filtered.git" &&
>
> --
> 2.43.0.1.g67290e5b65
Thanks for the fixes. This looks good to me. BTW, in the future please
add a version number when you send updated patches (e.g. add "-v 2" to
your command-line if you're using git-format-patch).
Reviewed-by: Josh Steadmon <steadmon@google.com>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-11-28 14:28 ` Adam Majer
2023-11-28 18:45 ` Josh Steadmon
@ 2023-11-28 19:04 ` Jeff King
2023-11-29 10:13 ` Patrick Steinhardt
` (2 more replies)
1 sibling, 3 replies; 23+ messages in thread
From: Jeff King @ 2023-11-28 19:04 UTC (permalink / raw)
To: Adam Majer; +Cc: git
On Tue, Nov 28, 2023 at 03:28:45PM +0100, Adam Majer wrote:
> In a garbage collected bare git repository, the refs/ subdirectory is
> empty. In use-cases when such a repository is directly added into
> another repository, it no longer is detected as valid. Git doesn't
> preserve empty paths so refs/ subdirectory is not present. Simply
> creating an empty refs/ subdirectory fixes this problem.
I understand your use case, but I still have a vague feeling that this
is bending some assumptions in a way that may create problems or
confusion later. In particular:
> Looking more carefully, there are two backends to handle various refs in
> git -- the files backend that uses refs/ subdirectory and the
> packed-refs backend that uses packed-refs file. If references are not
> found in refs/ subdirectory (or directory doesn't exist), the
> packed-refs directory will be consulted. Garbage collected repository
> will have all its references in packed-refs file.
This second paragraph doesn't seem totally accurate to me. There are not
really two backends that Git can use. For production use, there is just
one, the "files" backend, which happens to also use packed-refs under
the hood (and a convenient way for the code to structure this was a
subordinate backend). But it has never been possible to have a repo that
just uses packed-refs.
There is also the experimental reftable, of course. And there we have
not yet loosened is_git_directory(), and it has to create an unused
"refs/" directory (there has been some discussion about allowing it to
be an empty file, though no patches have been merged).
So with regards to the loosening in your patch, my questions would be:
- if we are going to change the rules for repository detection, is
this where we want to end up? We haven't changed them (yet) for
reftables. If we are going to do so, should we have a scheme that
will work for that transition, too? The "refs is an empty file"
scheme would fix your use case, too (though see below).
- is the rest of Git ready to handle a missing "refs/" directory? It
looks like making a ref will auto-create it (since we may have to
make refs/foo/bar/... anyway).
- what about other implementations? Your embedded repos will
presumably not work with libgit2, jgit, etc, until they also get
similar patches.
- what about empty repositories? In that case there will be no "refs/"
file and no "packed-refs" file (such a repository is less likely, of
course, but it may contain objects but no refs, or the point may be
to have an empty repo as a test vector). Likewise, it is possible
for a repository to have an empty "objects" directory (even with a
non-empty refs directory, if there are only symrefs), and your patch
doesn't address that.
> To allow the use-case when packed-refs is the only source of refs and
> refs/ subdirectory is simply not present, augment 'is_git_directory()'
> setup function to look for packed-refs file as an alternative to refs/
> subdirectory.
Getting back to your use case, I'd suggest one of:
- do the usual "touch refs/.gitignore" trick to explicitly track the
empty directory. It looks like the ref code will ignore this (we
don't allow ref names to start with "." in a path component)
- whatever is consuming the embedded repos could "mkdir -p refs
objects" as needed. This is a minor pain, but I think in the long
term we are moving to a world where you have to explicitly do
"GIT_DIR=$PWD/embedded.git" to access an embedded bare repo. So
they're already special and require some setup; adding an extra step
may not be so bad.
Now it may be that neither of those solutions is acceptable for various
reasons. But it is probably worth detailing those reasons in your commit
message.
-Peff
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-11-28 19:04 ` Jeff King
@ 2023-11-29 10:13 ` Patrick Steinhardt
2023-12-06 20:10 ` Jeff King
2023-11-29 21:30 ` Taylor Blau
2023-12-08 18:17 ` Junio C Hamano
2 siblings, 1 reply; 23+ messages in thread
From: Patrick Steinhardt @ 2023-11-29 10:13 UTC (permalink / raw)
To: Jeff King; +Cc: Adam Majer, git
[-- Attachment #1: Type: text/plain, Size: 5736 bytes --]
On Tue, Nov 28, 2023 at 02:04:46PM -0500, Jeff King wrote:
> On Tue, Nov 28, 2023 at 03:28:45PM +0100, Adam Majer wrote:
>
> > In a garbage collected bare git repository, the refs/ subdirectory is
> > empty. In use-cases when such a repository is directly added into
> > another repository, it no longer is detected as valid. Git doesn't
> > preserve empty paths so refs/ subdirectory is not present. Simply
> > creating an empty refs/ subdirectory fixes this problem.
>
> I understand your use case, but I still have a vague feeling that this
> is bending some assumptions in a way that may create problems or
> confusion later. In particular:
>
> > Looking more carefully, there are two backends to handle various refs in
> > git -- the files backend that uses refs/ subdirectory and the
> > packed-refs backend that uses packed-refs file. If references are not
> > found in refs/ subdirectory (or directory doesn't exist), the
> > packed-refs directory will be consulted. Garbage collected repository
> > will have all its references in packed-refs file.
>
> This second paragraph doesn't seem totally accurate to me. There are not
> really two backends that Git can use. For production use, there is just
> one, the "files" backend, which happens to also use packed-refs under
> the hood (and a convenient way for the code to structure this was a
> subordinate backend). But it has never been possible to have a repo that
> just uses packed-refs.
>
> There is also the experimental reftable, of course. And there we have
> not yet loosened is_git_directory(), and it has to create an unused
> "refs/" directory (there has been some discussion about allowing it to
> be an empty file, though no patches have been merged).
As I'm currently working on the reftable backend this thought has also
crossed my mind. The reftable backend doesn't only create "refs/", but
it also creates "HEAD" with contents "ref: refs/heads/.invalid" so that
Git commands recognize the Git directory properly. Longer-term I would
really love to see us doing a better job of detecting Git repositories
so that we don't have to carry this legacy baggage around.
I can see different ways for how to do this:
- Either we iterate through all known reference backends, asking
each of them whether they recognize the directory as something
they understand.
- Or we start parsing the gitconfig of the repository so that we can
learn about which reference backend to expect, and then ask that
specific backend whether it thinks that the directory indeed looks
like something it can handle.
I'd personally prefer the latter, but I'm not sure whether we really
want to try and parse any file that happens to be called "config".
> So with regards to the loosening in your patch, my questions would be:
>
> - if we are going to change the rules for repository detection, is
> this where we want to end up? We haven't changed them (yet) for
> reftables. If we are going to do so, should we have a scheme that
> will work for that transition, too? The "refs is an empty file"
> scheme would fix your use case, too (though see below).
>
> - is the rest of Git ready to handle a missing "refs/" directory? It
> looks like making a ref will auto-create it (since we may have to
> make refs/foo/bar/... anyway).
>
> - what about other implementations? Your embedded repos will
> presumably not work with libgit2, jgit, etc, until they also get
> similar patches.
>
> - what about empty repositories? In that case there will be no "refs/"
> file and no "packed-refs" file (such a repository is less likely, of
> course, but it may contain objects but no refs, or the point may be
> to have an empty repo as a test vector). Likewise, it is possible
> for a repository to have an empty "objects" directory (even with a
> non-empty refs directory, if there are only symrefs), and your patch
> doesn't address that.
Just throwing this out there, but we could use this as an excuse to
introduce "extensions.refFormat". If it's explicitly configured to be
"reffiles" then we accept repositories even if they don't have the
"refs/" directory or a "packed-refs" file. This would still require work
in alternative implementations of Git, but this work will need to happen
anyway when the reftable backend lands.
I'd personally love for this extension to be introduced before I'm
sending the reftable backend upstream so that we can have discussions
around it beforehand.
Patrick
> > To allow the use-case when packed-refs is the only source of refs and
> > refs/ subdirectory is simply not present, augment 'is_git_directory()'
> > setup function to look for packed-refs file as an alternative to refs/
> > subdirectory.
>
> Getting back to your use case, I'd suggest one of:
>
> - do the usual "touch refs/.gitignore" trick to explicitly track the
> empty directory. It looks like the ref code will ignore this (we
> don't allow ref names to start with "." in a path component)
>
> - whatever is consuming the embedded repos could "mkdir -p refs
> objects" as needed. This is a minor pain, but I think in the long
> term we are moving to a world where you have to explicitly do
> "GIT_DIR=$PWD/embedded.git" to access an embedded bare repo. So
> they're already special and require some setup; adding an extra step
> may not be so bad.
>
> Now it may be that neither of those solutions is acceptable for various
> reasons. But it is probably worth detailing those reasons in your commit
> message.
>
> -Peff
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-11-28 19:04 ` Jeff King
2023-11-29 10:13 ` Patrick Steinhardt
@ 2023-11-29 21:30 ` Taylor Blau
2023-12-06 21:08 ` Jeff King
2023-12-08 18:17 ` Junio C Hamano
2 siblings, 1 reply; 23+ messages in thread
From: Taylor Blau @ 2023-11-29 21:30 UTC (permalink / raw)
To: Jeff King; +Cc: Adam Majer, git
On Tue, Nov 28, 2023 at 02:04:46PM -0500, Jeff King wrote:
> - whatever is consuming the embedded repos could "mkdir -p refs
> objects" as needed. This is a minor pain, but I think in the long
> term we are moving to a world where you have to explicitly do
> "GIT_DIR=$PWD/embedded.git" to access an embedded bare repo. So
> they're already special and require some setup; adding an extra step
> may not be so bad.
I hope not. I suppose that using embedded bare repositories in a test
requires additional setup at least to "cd" into the directory (if they
are not using `$GIT_DIR` or `--git-dir` already). But I fear that
imposing even a small change like this is too tall an order for how many
millions of these exist in the wild across all sorts of projects.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-11-29 10:13 ` Patrick Steinhardt
@ 2023-12-06 20:10 ` Jeff King
2023-12-07 7:01 ` Patrick Steinhardt
0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2023-12-06 20:10 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Adam Majer, git
On Wed, Nov 29, 2023 at 11:13:18AM +0100, Patrick Steinhardt wrote:
> As I'm currently working on the reftable backend this thought has also
> crossed my mind. The reftable backend doesn't only create "refs/", but
> it also creates "HEAD" with contents "ref: refs/heads/.invalid" so that
> Git commands recognize the Git directory properly. Longer-term I would
> really love to see us doing a better job of detecting Git repositories
> so that we don't have to carry this legacy baggage around.
>
> I can see different ways for how to do this:
>
> - Either we iterate through all known reference backends, asking
> each of them whether they recognize the directory as something
> they understand.
>
> - Or we start parsing the gitconfig of the repository so that we can
> learn about which reference backend to expect, and then ask that
> specific backend whether it thinks that the directory indeed looks
> like something it can handle.
>
> I'd personally prefer the latter, but I'm not sure whether we really
> want to try and parse any file that happens to be called "config".
We do eventually parse the config file to pick up repositoryFormatVersion.
But there's sort of a chicken-and-egg here where we only do so after
gaining some confidence that it's a repo directory. :)
I actually think the "ask each backend if it looks plausible" is
reasonable, at least for an implementation that knows about all
backends. Though what gives me pause is how older versions of Git will
behave with a new-format repository that does not have a "refs"
directory.
There are really two compatibility checks. In is_git_directory(), we
want to say "is this a repo or not". And then later we parse the config,
make sure the repository format is OK, and that we support all
extensions. So right now, an older version of Git that encounters a
reftable-formatted repo (that has a vestigial "refs/" directory) says
"ah, that is a repo, but I don't understand it" (the latter because
presumably the repo version/extensions in .git/config are values it
doesn't know about). But if we get rid of "refs/", then older versions
of Git will stop even considering it as a repo at all, and will keep
searching up to the ceiling directory. So either:
1. They'll find nothing, and you'll get "you're not in a git repo",
rather than "you're in a git repo, but I don't understand it".
Which is slightly worse.
2. They'll find some _other_ containing repo. Which could be quite
confusing.
So forgetting at all about how we structure the code, it seems to me
that the problem is not new code, but all of the existing code which
looks for access("refs", X_OK).
I dunno. Maybe that is being too paranoid about backwards compatibility.
People will have to turn on reftable manually, at least for a while, and
would hopefully know what they are signing up for, and that old versions
might not work as well. And by the time a new format becomes the
default, it's possible that those older versions would have become quite
rare.
> Just throwing this out there, but we could use this as an excuse to
> introduce "extensions.refFormat". If it's explicitly configured to be
> "reffiles" then we accept repositories even if they don't have the
> "refs/" directory or a "packed-refs" file. This would still require work
> in alternative implementations of Git, but this work will need to happen
> anyway when the reftable backend lands.
>
> I'd personally love for this extension to be introduced before I'm
> sending the reftable backend upstream so that we can have discussions
> around it beforehand.
We already have an extension config option to specify that we're using
reftable, don't we? But anything in config has the same chicken-and-egg
problems as above, I think.
-Peff
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-11-29 21:30 ` Taylor Blau
@ 2023-12-06 21:08 ` Jeff King
2023-12-07 8:20 ` Adam Majer
2023-12-08 21:09 ` Taylor Blau
0 siblings, 2 replies; 23+ messages in thread
From: Jeff King @ 2023-12-06 21:08 UTC (permalink / raw)
To: Taylor Blau; +Cc: Adam Majer, git
On Wed, Nov 29, 2023 at 04:30:46PM -0500, Taylor Blau wrote:
> On Tue, Nov 28, 2023 at 02:04:46PM -0500, Jeff King wrote:
> > - whatever is consuming the embedded repos could "mkdir -p refs
> > objects" as needed. This is a minor pain, but I think in the long
> > term we are moving to a world where you have to explicitly do
> > "GIT_DIR=$PWD/embedded.git" to access an embedded bare repo. So
> > they're already special and require some setup; adding an extra step
> > may not be so bad.
>
> I hope not. I suppose that using embedded bare repositories in a test
> requires additional setup at least to "cd" into the directory (if they
> are not using `$GIT_DIR` or `--git-dir` already). But I fear that
> imposing even a small change like this is too tall an order for how many
> millions of these exist in the wild across all sorts of projects.
I dunno. I am skeptical that there are millions of these. Who really
wants to embed bare git repos except for projects related to Git itself,
which want test vectors? Is there a use case I'm missing?
-Peff
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-12-06 20:10 ` Jeff King
@ 2023-12-07 7:01 ` Patrick Steinhardt
2023-12-07 7:34 ` Jeff King
0 siblings, 1 reply; 23+ messages in thread
From: Patrick Steinhardt @ 2023-12-07 7:01 UTC (permalink / raw)
To: Jeff King; +Cc: Adam Majer, git
[-- Attachment #1: Type: text/plain, Size: 5150 bytes --]
On Wed, Dec 06, 2023 at 03:10:48PM -0500, Jeff King wrote:
> On Wed, Nov 29, 2023 at 11:13:18AM +0100, Patrick Steinhardt wrote:
>
> > As I'm currently working on the reftable backend this thought has also
> > crossed my mind. The reftable backend doesn't only create "refs/", but
> > it also creates "HEAD" with contents "ref: refs/heads/.invalid" so that
> > Git commands recognize the Git directory properly. Longer-term I would
> > really love to see us doing a better job of detecting Git repositories
> > so that we don't have to carry this legacy baggage around.
> >
> > I can see different ways for how to do this:
> >
> > - Either we iterate through all known reference backends, asking
> > each of them whether they recognize the directory as something
> > they understand.
> >
> > - Or we start parsing the gitconfig of the repository so that we can
> > learn about which reference backend to expect, and then ask that
> > specific backend whether it thinks that the directory indeed looks
> > like something it can handle.
> >
> > I'd personally prefer the latter, but I'm not sure whether we really
> > want to try and parse any file that happens to be called "config".
>
> We do eventually parse the config file to pick up repositoryFormatVersion.
> But there's sort of a chicken-and-egg here where we only do so after
> gaining some confidence that it's a repo directory. :)
>
> I actually think the "ask each backend if it looks plausible" is
> reasonable, at least for an implementation that knows about all
> backends. Though what gives me pause is how older versions of Git will
> behave with a new-format repository that does not have a "refs"
> directory.
>
> There are really two compatibility checks. In is_git_directory(), we
> want to say "is this a repo or not". And then later we parse the config,
> make sure the repository format is OK, and that we support all
> extensions. So right now, an older version of Git that encounters a
> reftable-formatted repo (that has a vestigial "refs/" directory) says
> "ah, that is a repo, but I don't understand it" (the latter because
> presumably the repo version/extensions in .git/config are values it
> doesn't know about). But if we get rid of "refs/", then older versions
> of Git will stop even considering it as a repo at all, and will keep
> searching up to the ceiling directory. So either:
>
> 1. They'll find nothing, and you'll get "you're not in a git repo",
> rather than "you're in a git repo, but I don't understand it".
> Which is slightly worse.
>
> 2. They'll find some _other_ containing repo. Which could be quite
> confusing.
>
> So forgetting at all about how we structure the code, it seems to me
> that the problem is not new code, but all of the existing code which
> looks for access("refs", X_OK).
True. The question is of course how much value there is in an old tool
to be able to discover a new repository that it wouldn't be able to read
in the first place due to it not understanding the reference format. So
I'd very much like to see that eventually, we're able to get rid of
"legacy" cruft that doesn't serve any purpose anymore.
The question is whether we can do a better job of this going forward so
that at least we don't have to pose the same question in the future.
Right now, we'll face the same problem whenever any part of the current
on-disk repository data structures changes.
I wonder whether it would make sense to introduce something like a
filesystem-level hint, e.g. in the form of a new ".is-git-repository"
file. If Git discovers that file then it assumes the directory to be a
Git repository -- and everything else is set up by parsing the config
and thus the repository's configured format.
> I dunno. Maybe that is being too paranoid about backwards compatibility.
> People will have to turn on reftable manually, at least for a while, and
> would hopefully know what they are signing up for, and that old versions
> might not work as well. And by the time a new format becomes the
> default, it's possible that those older versions would have become quite
> rare.
>
> > Just throwing this out there, but we could use this as an excuse to
> > introduce "extensions.refFormat". If it's explicitly configured to be
> > "reffiles" then we accept repositories even if they don't have the
> > "refs/" directory or a "packed-refs" file. This would still require work
> > in alternative implementations of Git, but this work will need to happen
> > anyway when the reftable backend lands.
> >
> > I'd personally love for this extension to be introduced before I'm
> > sending the reftable backend upstream so that we can have discussions
> > around it beforehand.
>
> We already have an extension config option to specify that we're using
> reftable, don't we? But anything in config has the same chicken-and-egg
> problems as above, I think.
Not yet, no. I plan to submit the new "extensions.refFormat" extension
soonish though, probably next week.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-12-07 7:01 ` Patrick Steinhardt
@ 2023-12-07 7:34 ` Jeff King
2023-12-07 8:37 ` Patrick Steinhardt
0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2023-12-07 7:34 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Adam Majer, git
On Thu, Dec 07, 2023 at 08:01:41AM +0100, Patrick Steinhardt wrote:
> > So forgetting at all about how we structure the code, it seems to me
> > that the problem is not new code, but all of the existing code which
> > looks for access("refs", X_OK).
>
> True. The question is of course how much value there is in an old tool
> to be able to discover a new repository that it wouldn't be able to read
> in the first place due to it not understanding the reference format. So
> I'd very much like to see that eventually, we're able to get rid of
> "legacy" cruft that doesn't serve any purpose anymore.
Right, nobody is going to be mad about not being able to use the
repository with old code. My concern is that by skipping it in the
discovery phase, though, the user may see unexpected behavior (like
continuing and finding some _other_ repo). I admit it's a pretty narrow
case, though.
> The question is whether we can do a better job of this going forward so
> that at least we don't have to pose the same question in the future.
> Right now, we'll face the same problem whenever any part of the current
> on-disk repository data structures changes.
>
> I wonder whether it would make sense to introduce something like a
> filesystem-level hint, e.g. in the form of a new ".is-git-repository"
> file. If Git discovers that file then it assumes the directory to be a
> Git repository -- and everything else is set up by parsing the config
> and thus the repository's configured format.
I kind of think the presence of a well-formed HEAD is a good indicator
that it is a Git directory. IOW, I don't have any real problem with
simply loosening is_git_directory() to remove the "refs/" check
entirely. But again, that can only help us going forward; older versions
will still get confused if we truly ditch "refs/" for other formats.
Of course some ref formats may want to avoid the "HEAD" file itself, so
your .is-git-repository would be cleaner. I'm just not sure if it's
worth the headache to try to switch things now.
> > We already have an extension config option to specify that we're using
> > reftable, don't we? But anything in config has the same chicken-and-egg
> > problems as above, I think.
>
> Not yet, no. I plan to submit the new "extensions.refFormat" extension
> soonish though, probably next week.
Ah, OK. I remember talking about it with Han-Wen long ago, but I admit I
have not paid much attention to reftable work recently. :) So I am happy
you are picking it up.
-Peff
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-12-06 21:08 ` Jeff King
@ 2023-12-07 8:20 ` Adam Majer
2023-12-08 21:09 ` Taylor Blau
1 sibling, 0 replies; 23+ messages in thread
From: Adam Majer @ 2023-12-07 8:20 UTC (permalink / raw)
To: Jeff King, Taylor Blau; +Cc: git
On 12/6/23 22:08, Jeff King wrote:
> On Wed, Nov 29, 2023 at 04:30:46PM -0500, Taylor Blau wrote:
>
>> On Tue, Nov 28, 2023 at 02:04:46PM -0500, Jeff King wrote:
>>> - whatever is consuming the embedded repos could "mkdir -p refs
>>> objects" as needed. This is a minor pain, but I think in the long
>>> term we are moving to a world where you have to explicitly do
>>> "GIT_DIR=$PWD/embedded.git" to access an embedded bare repo. So
>>> they're already special and require some setup; adding an extra step
>>> may not be so bad.
>>
>> I hope not. I suppose that using embedded bare repositories in a test
>> requires additional setup at least to "cd" into the directory (if they
>> are not using `$GIT_DIR` or `--git-dir` already). But I fear that
>> imposing even a small change like this is too tall an order for how many
>> millions of these exist in the wild across all sorts of projects.
>
> I dunno. I am skeptical that there are millions of these. Who really
> wants to embed bare git repos except for projects related to Git itself,
> which want test vectors? Is there a use case I'm missing?
Well, it's an "easy" thing to do, instead of recreating these test cases
from sources like it's done here. It seems this is what happens in
projects like Gitea.
As to the original questions you've raised earlier in the thread, I
thought about it, and I don't really have a compelling reason to try to
force this patch into Git. At least, I do not feel it necessary to try
to argue the points you've raised. If that means the patch is ignored,
I'm ok with that.
The reasons I put it here is simply I found that it fixes an issue I
came across and that "everything else" worked. I don't know the
intricacies of current or future git plans and I would rather delegate
such discussion to the experts.
Best regards,
Adam
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-12-07 7:34 ` Jeff King
@ 2023-12-07 8:37 ` Patrick Steinhardt
0 siblings, 0 replies; 23+ messages in thread
From: Patrick Steinhardt @ 2023-12-07 8:37 UTC (permalink / raw)
To: Jeff King; +Cc: Adam Majer, git
[-- Attachment #1: Type: text/plain, Size: 3353 bytes --]
On Thu, Dec 07, 2023 at 02:34:51AM -0500, Jeff King wrote:
> On Thu, Dec 07, 2023 at 08:01:41AM +0100, Patrick Steinhardt wrote:
>
> > > So forgetting at all about how we structure the code, it seems to me
> > > that the problem is not new code, but all of the existing code which
> > > looks for access("refs", X_OK).
> >
> > True. The question is of course how much value there is in an old tool
> > to be able to discover a new repository that it wouldn't be able to read
> > in the first place due to it not understanding the reference format. So
> > I'd very much like to see that eventually, we're able to get rid of
> > "legacy" cruft that doesn't serve any purpose anymore.
>
> Right, nobody is going to be mad about not being able to use the
> repository with old code. My concern is that by skipping it in the
> discovery phase, though, the user may see unexpected behavior (like
> continuing and finding some _other_ repo). I admit it's a pretty narrow
> case, though.
Agreed, that's also an angle I brought up in a separate thread [1]. The
second benefit is that the user would get a proper error message stating
that the "extensions.refFormat" is not understood compared to Git just
skipping over the repository completely.
> > The question is whether we can do a better job of this going forward so
> > that at least we don't have to pose the same question in the future.
> > Right now, we'll face the same problem whenever any part of the current
> > on-disk repository data structures changes.
> >
> > I wonder whether it would make sense to introduce something like a
> > filesystem-level hint, e.g. in the form of a new ".is-git-repository"
> > file. If Git discovers that file then it assumes the directory to be a
> > Git repository -- and everything else is set up by parsing the config
> > and thus the repository's configured format.
>
> I kind of think the presence of a well-formed HEAD is a good indicator
> that it is a Git directory. IOW, I don't have any real problem with
> simply loosening is_git_directory() to remove the "refs/" check
> entirely. But again, that can only help us going forward; older versions
> will still get confused if we truly ditch "refs/" for other formats.
>
> Of course some ref formats may want to avoid the "HEAD" file itself, so
> your .is-git-repository would be cleaner. I'm just not sure if it's
> worth the headache to try to switch things now.
I think that both "HEAD" and "refs/" are in the same spirit and consider
both to be legacy cruft that ideally wouldn't exist with the reftable
backend. I think dropping just one of these requirements ("refs/") is
the least favorable option though:
- We'd still have unneeded files that only exist to aid old clients.
- At the same time, the old clients wouldn't be able to detect the
repository anymore and need an update. So we could just as well drop
both files and would have the same outcome.
- This is not a long-term solution in case anything else in the
on-disk format will ever change.
Whether it's worth getting rid of them now... probably not, at least not
for the time being. But if we want to address this issue I'd rather want
to aim for a proper solution that also works for future changes.
Patrick
[1]: <ZXFy0_T1AZLh058g@tanuki>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-11-28 19:04 ` Jeff King
2023-11-29 10:13 ` Patrick Steinhardt
2023-11-29 21:30 ` Taylor Blau
@ 2023-12-08 18:17 ` Junio C Hamano
2 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2023-12-08 18:17 UTC (permalink / raw)
To: Jeff King; +Cc: Adam Majer, git
Jeff King <peff@peff.net> writes:
> So with regards to the loosening in your patch, my questions would be:
>
> - if we are going to change the rules for repository detection, is
> this where we want to end up? We haven't changed them (yet) for
> reftables. If we are going to do so, should we have a scheme that
> will work for that transition, too? The "refs is an empty file"
> scheme would fix your use case, too (though see below).
>
> - is the rest of Git ready to handle a missing "refs/" directory? It
> looks like making a ref will auto-create it (since we may have to
> make refs/foo/bar/... anyway).
>
> - what about other implementations? Your embedded repos will
> presumably not work with libgit2, jgit, etc, until they also get
> similar patches.
>
> - what about empty repositories? In that case there will be no "refs/"
> file and no "packed-refs" file (such a repository is less likely, of
> course, but it may contain objects but no refs, or the point may be
> to have an empty repo as a test vector). Likewise, it is possible
> for a repository to have an empty "objects" directory (even with a
> non-empty refs directory, if there are only symrefs), and your patch
> doesn't address that.
All good points.
> Getting back to your use case, I'd suggest one of:
>
> - do the usual "touch refs/.gitignore" trick to explicitly track the
> empty directory. It looks like the ref code will ignore this (we
> don't allow ref names to start with "." in a path component)
>
> - whatever is consuming the embedded repos could "mkdir -p refs
> objects" as needed. This is a minor pain, but I think in the long
> term we are moving to a world where you have to explicitly do
> "GIT_DIR=$PWD/embedded.git" to access an embedded bare repo. So
> they're already special and require some setup; adding an extra step
> may not be so bad.
Yeah, it truly is caused by the combination of the fact that we do
not "track" empty directories and that skeleton Git repository
structure does rely on possibly empty directories. The above two
are reasonable workarounds when you are dealing with any medium that
does not allow empty directories, not just working tree managed by
Git.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-12-06 21:08 ` Jeff King
2023-12-07 8:20 ` Adam Majer
@ 2023-12-08 21:09 ` Taylor Blau
2023-12-12 1:22 ` Jeff King
1 sibling, 1 reply; 23+ messages in thread
From: Taylor Blau @ 2023-12-08 21:09 UTC (permalink / raw)
To: Jeff King; +Cc: Adam Majer, git
On Wed, Dec 06, 2023 at 04:08:36PM -0500, Jeff King wrote:
> On Wed, Nov 29, 2023 at 04:30:46PM -0500, Taylor Blau wrote:
>
> > On Tue, Nov 28, 2023 at 02:04:46PM -0500, Jeff King wrote:
> > > - whatever is consuming the embedded repos could "mkdir -p refs
> > > objects" as needed. This is a minor pain, but I think in the long
> > > term we are moving to a world where you have to explicitly do
> > > "GIT_DIR=$PWD/embedded.git" to access an embedded bare repo. So
> > > they're already special and require some setup; adding an extra step
> > > may not be so bad.
> >
> > I hope not. I suppose that using embedded bare repositories in a test
> > requires additional setup at least to "cd" into the directory (if they
> > are not using `$GIT_DIR` or `--git-dir` already). But I fear that
> > imposing even a small change like this is too tall an order for how many
> > millions of these exist in the wild across all sorts of projects.
>
> I dunno. I am skeptical that there are millions of these. Who really
> wants to embed bare git repos except for projects related to Git itself,
> which want test vectors? Is there a use case I'm missing?
Just picking on GitHub as an example, my copy has a fair number of
embedded bare repositories:
$ find . -mindepth 2 -type d -name '*.git' | wc -l
279
That might be an unfair example in general, since GitHub probably has a
greater need to embed bare repositories than most other projects. But I
think that we shouldn't make our decision here based on volume of
embedded bare repositories, but rather on the number of projects which
have >1 embedded bare repository.
In other words, the cost of migrating a single project's embedded bare
repositories is roughly the same whether there are 1 or 279 of them. So
the effort scales with the number of projects, not repositories.
Perhaps I'm over-estimating how difficult this transition would be to
impose on users. But it does make me very leery to make this kind of a
change without having a better sense of how many of them exist in the
wild.
Searching just on GitHub for `path:**/*.git/config` [^1], it looks like
there are ~1,400 results. That provides us an upper-bound on the number
of projects which have embedded bare repositories, so perhaps I really
am overestimating the burden we'd be imposing on other projects.
I dunno :-).
Thanks,
Taylor
[^1]: Searching for "path:**/*.git" doesn't quite work, since GitHub's
search doesn't match directories here. So the search I actually used
isn't perfect, but it should give us a rough approximation.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] setup: recognize bare repositories with packed-refs
2023-12-08 21:09 ` Taylor Blau
@ 2023-12-12 1:22 ` Jeff King
0 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2023-12-12 1:22 UTC (permalink / raw)
To: Taylor Blau; +Cc: Adam Majer, git
On Fri, Dec 08, 2023 at 04:09:03PM -0500, Taylor Blau wrote:
> > I dunno. I am skeptical that there are millions of these. Who really
> > wants to embed bare git repos except for projects related to Git itself,
> > which want test vectors? Is there a use case I'm missing?
>
> Just picking on GitHub as an example, my copy has a fair number of
> embedded bare repositories:
>
> $ find . -mindepth 2 -type d -name '*.git' | wc -l
> 279
>
> That might be an unfair example in general, since GitHub probably has a
> greater need to embed bare repositories than most other projects. But I
> think that we shouldn't make our decision here based on volume of
> embedded bare repositories, but rather on the number of projects which
> have >1 embedded bare repository.
Right, I meant "I am skeptical there are a lot of projects that have
embedded repositories". It is useful if your project is related to
working on Git itself and you store your test vectors that way. So
github.git is not alone there (there is libgit2, other forges, and so
on). But I don't think it is representative in general.
> Perhaps I'm over-estimating how difficult this transition would be to
> impose on users. But it does make me very leery to make this kind of a
> change without having a better sense of how many of them exist in the
> wild.
Just to be clear: I am not proposing any transition here. It is already
the case that your "refs/" directory is necessary for Git to recognize
the bare repo, and you risk committing a broken state if you have no
loose refs in it.
There's been a proposal elsewhere to require extra steps to recognize an
embedded bare repo. Which I agree will be a pain for folks who use them,
but may be worth it for the security benefit. But here I was only saying
that _if_ we do that other change, then adding extra steps might not be
too bad on top. :)
(BTW, I think libgit2 already faces this problem, because it wants
non-bare repos; so there is some magic where it stores ".gitted"
directories, and then renames them on the fly).
> Searching just on GitHub for `path:**/*.git/config` [^1], it looks like
> there are ~1,400 results. That provides us an upper-bound on the number
> of projects which have embedded bare repositories, so perhaps I really
> am overestimating the burden we'd be imposing on other projects.
Thanks, that's an interesting number, and matches my intuition.
Of course it's not a true upper bound anyway. It wouldn't count private
projects (though maybe it hits github.git in your case), not to mention
stuff that isn't hosted on GitHub.
-Peff
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2023-12-12 1:22 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-17 20:25 [PATCH] setup: recognize bare repositories with packed-refs Adam Majer
2023-11-17 20:32 ` Adam Majer
2023-11-17 20:44 ` Adam Majer
2023-11-19 23:24 ` Junio C Hamano
2023-11-20 9:31 ` Glen Choo
2023-11-27 19:42 ` Josh Steadmon
2023-11-20 9:43 ` Adam Majer
2023-11-27 19:44 ` Josh Steadmon
2023-11-28 14:14 ` Adam Majer
2023-11-28 14:28 ` Adam Majer
2023-11-28 18:45 ` Josh Steadmon
2023-11-28 19:04 ` Jeff King
2023-11-29 10:13 ` Patrick Steinhardt
2023-12-06 20:10 ` Jeff King
2023-12-07 7:01 ` Patrick Steinhardt
2023-12-07 7:34 ` Jeff King
2023-12-07 8:37 ` Patrick Steinhardt
2023-11-29 21:30 ` Taylor Blau
2023-12-06 21:08 ` Jeff King
2023-12-07 8:20 ` Adam Majer
2023-12-08 21:09 ` Taylor Blau
2023-12-12 1:22 ` Jeff King
2023-12-08 18:17 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).