* [PATCH] clone: support cloning of filtered bundles
@ 2024-01-14 11:16 Nikolay Edigaryev via GitGitGadget
2024-01-14 18:00 ` Phillip Wood
2024-01-15 1:13 ` Junio C Hamano
0 siblings, 2 replies; 9+ messages in thread
From: Nikolay Edigaryev via GitGitGadget @ 2024-01-14 11:16 UTC (permalink / raw)
To: git; +Cc: Derrick Stolee, Nikolay Edigaryev, Nikolay Edigaryev
From: Nikolay Edigaryev <edigaryev@gmail.com>
f18b512bbb (bundle: create filtered bundles, 2022-03-09) introduced an
incredibly useful ability to create filtered bundles, which advances
the partial clone/promisor support in Git and allows for archiving
large repositories to object storages like S3 in bundles that are:
* easy to manage
* bundle is just a single file, it's easier to guarantee atomic
replacements in object storages like S3 and they are faster to
fetch than a bare repository since there's only a single GET
request involved
* incredibly tiny
* no indexes (which may be more than 10 MB for some repositories)
and other fluff, compared to cloning a bare repository
* bundle can be filtered to only contain the tips of refs neccessary
for e.g. code-analysis purposes
However, in 86fdd94d72 (clone: fail gracefully when cloning filtered
bundle, 2022-03-09) the cloning of such bundles was disabled, with a
note that this behavior is not desired, and it the long-term this
should be possible.
The commit above states that it's not possible to have this at the
moment due to lack of remote and a repository-global config that
specifies an object filter, yet it's unclear why a remote-specific
config can't be used instead, which is what this change does.
Signed-off-by: Nikolay Edigaryev <edigaryev@gmail.com>
---
clone: support cloning of filtered bundles
f18b512bbb (bundle: create filtered bundles, 2022-03-09) introduced an
incredibly useful ability to create filtered bundles, which advances the
partial clone/promisor support in Git and allows for archiving large
repositories to object storages like S3 in bundles that are:
* easy to manage
* bundle is just a single file, it's easier to guarantee atomic
replacements in object storages like S3 and they are faster to
fetch than a bare repository since there's only a single GET
request involved
* incredibly tiny
* no indexes (which may be more than 10 MB for some repositories) and
other fluff, compared to cloning a bare repository
* bundle can be filtered to only contain the tips of refs neccessary
for e.g. code-analysis purposes
However, in 86fdd94d72 (clone: fail gracefully when cloning filtered
bundle, 2022-03-09) the cloning of such bundles was disabled, with a
note that this behavior is not desired, and it the long-term this should
be possible.
The commit above states that it's not possible to have this at the
moment due to lack of remote and a repository-global config that
specifies an object filter, yet it's unclear why a remote-specific
config can't be used instead, which is what this change does.
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1644%2Fedigaryev%2Fclone-filtered-bundles-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1644/edigaryev/clone-filtered-bundles-v1
Pull-Request: https://github.com/git/git/pull/1644
builtin/clone.c | 13 +++++++++++--
t/t6020-bundle-misc.sh | 13 +++----------
2 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/builtin/clone.c b/builtin/clone.c
index c6357af9498..4b3fedf78ed 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1227,9 +1227,18 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
if (fd > 0)
close(fd);
+
+ if (has_filter) {
+ strbuf_addf(&key, "remote.%s.promisor", remote_name);
+ git_config_set(key.buf, "true");
+ strbuf_reset(&key);
+
+ strbuf_addf(&key, "remote.%s.partialclonefilter", remote_name);
+ git_config_set(key.buf, expand_list_objects_filter_spec(&header.filter));
+ strbuf_reset(&key);
+ }
+
bundle_header_release(&header);
- if (has_filter)
- die(_("cannot clone from filtered bundle"));
}
transport_set_option(transport, TRANS_OPT_KEEP, "yes");
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 3e6bcbf30cd..f449df00642 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -555,16 +555,9 @@ do
'
done
-# NEEDSWORK: 'git clone --bare' should be able to clone from a filtered
-# bundle, but that requires a change to promisor/filter config options.
-# For now, we fail gracefully with a helpful error. This behavior can be
-# changed in the future to succeed as much as possible.
-test_expect_success 'cloning from filtered bundle has useful error' '
- git bundle create partial.bdl \
- --all \
- --filter=blob:none &&
- test_must_fail git clone --bare partial.bdl partial 2>err &&
- grep "cannot clone from filtered bundle" err
+test_expect_success 'cloning from filtered bundle works' '
+ git bundle create partial.bdl --all --filter=blob:none &&
+ git clone --bare partial.bdl partial 2>err
'
test_expect_success 'verify catches unreachable, broken prerequisites' '
base-commit: 564d0252ca632e0264ed670534a51d18a689ef5d
--
gitgitgadget
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] clone: support cloning of filtered bundles
2024-01-14 11:16 [PATCH] clone: support cloning of filtered bundles Nikolay Edigaryev via GitGitGadget
@ 2024-01-14 18:00 ` Phillip Wood
2024-01-14 19:39 ` Nikolay Edigaryev
2024-01-15 1:13 ` Junio C Hamano
1 sibling, 1 reply; 9+ messages in thread
From: Phillip Wood @ 2024-01-14 18:00 UTC (permalink / raw)
To: Nikolay Edigaryev via GitGitGadget, git; +Cc: Derrick Stolee, Nikolay Edigaryev
Hi Nikolay
On 14/01/2024 11:16, Nikolay Edigaryev via GitGitGadget wrote:
> From: Nikolay Edigaryev <edigaryev@gmail.com>
>
> f18b512bbb (bundle: create filtered bundles, 2022-03-09) introduced an
> incredibly useful ability to create filtered bundles, which advances
> the partial clone/promisor support in Git and allows for archiving
> large repositories to object storages like S3 in bundles that are:
>
> * easy to manage
> * bundle is just a single file, it's easier to guarantee atomic
> replacements in object storages like S3 and they are faster to
> fetch than a bare repository since there's only a single GET
> request involved
> * incredibly tiny
> * no indexes (which may be more than 10 MB for some repositories)
> and other fluff, compared to cloning a bare repository
> * bundle can be filtered to only contain the tips of refs neccessary
> for e.g. code-analysis purposes
>
> However, in 86fdd94d72 (clone: fail gracefully when cloning filtered
> bundle, 2022-03-09) the cloning of such bundles was disabled, with a
> note that this behavior is not desired, and it the long-term this
> should be possible.
>
> The commit above states that it's not possible to have this at the
> moment due to lack of remote and a repository-global config that
> specifies an object filter, yet it's unclear why a remote-specific
> config can't be used instead, which is what this change does.
As I understand it if you're cloning from a bundle file then then there
is no remote so how can we set a remote-specific config?
I'm surprised that the proposed change does not require the user to pass
"--filter" to "git clone" as I expected that we'd want to check that the
filter on the command line was compatible with the filter used to create
the bundle. Allowing "git clone" to create a partial clone without the
user asking for it by passing the "--filter" option feels like is going
to end up confusing users.
> +test_expect_success 'cloning from filtered bundle works' '
> + git bundle create partial.bdl --all --filter=blob:none &&
> + git clone --bare partial.bdl partial 2>err
The redirection hides any error message which will make debugging test
failures harder. It would be nice to see this test check any config set
when cloning and that git commands can run successfully in the repository.
Best Wishes
Phillip
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] clone: support cloning of filtered bundles
2024-01-14 18:00 ` Phillip Wood
@ 2024-01-14 19:39 ` Nikolay Edigaryev
2024-01-14 21:26 ` Nikolay Edigaryev
2024-01-15 10:18 ` phillip.wood123
0 siblings, 2 replies; 9+ messages in thread
From: Nikolay Edigaryev @ 2024-01-14 19:39 UTC (permalink / raw)
To: phillip.wood; +Cc: Nikolay Edigaryev via GitGitGadget, git, Derrick Stolee
Hello Phillip,
> As I understand it if you're cloning from a bundle file then then there
> is no remote so how can we set a remote-specific config?
There is a remote, for more details see df61c88979 (clone: also
configure url for bare clones, 2010-03-29), which has the following
code:
strbuf_addf(&key, "remote.%s.url", remote_name);
git_config_set(key.buf, repo);
strbuf_reset(&key);
You can verify this by creating a bundle on Git 2.43.0 with "git create
bundle bundle.bundle --all" and then cloning it with "git clone
--bare /path/to/bundle.bundle", in my case the following repo-wide
configuration file was created:
[core]
repositoryformatversion = 0
filemode = true
bare = true
ignorecase = true
precomposeunicode = true
[remote "origin"]
url = /Users/edi/src/cirrus-cli/cli.bundle
> I'm surprised that the proposed change does not require the user to pass
> "--filter" to "git clone" as I expected that we'd want to check that the
> filter on the command line was compatible with the filter used to create
> the bundle. Allowing "git clone" to create a partial clone without the
> user asking for it by passing the "--filter" option feels like is going
> to end up confusing users.
Note that currently, when you clone a normal non-filtered bundle with a
'--filter' argument specified, no filtering will take place and no error
will be thrown. "promisor = true" and "partialclonefilter = ..." options
will be set in the repo config, but no .promisor file will be created.
This is even more confusing IMO, but that's how it currently on
Git 2.43.0.
You have a good point, but I feel like completely preventing cloning of
filtered bundles and requiring a '--filter' argument is very taxing. If
you've already specified a '--filter' when creating a bundle (and thus
your intent to use partially cloned data), why do it multiple times?
What I propose as an alternative here is to act based on the user's
intent when cloning:
* when the user specifies no '--filter' argument, do nothing special,
allow cloning both types of bundles: normal and filtered (with the
logic from this patch)
* when the user does specify a '--filter' argument, either:
* throw an error explaining that filtering of filtered bundles is not
supported
* or compare the user's filter specification and the one that is
in the bundle and only throw an error if they mismatch
Let me know what you think about this (and perhaps you have a more
concrete example in mind where this will have negative consequences)
and I'll be happy to do a next iteration.
On Sun, Jan 14, 2024 at 10:00 PM Phillip Wood <phillip.wood123@gmail.com> wrote:
>
> Hi Nikolay
>
> On 14/01/2024 11:16, Nikolay Edigaryev via GitGitGadget wrote:
> > From: Nikolay Edigaryev <edigaryev@gmail.com>
> >
> > f18b512bbb (bundle: create filtered bundles, 2022-03-09) introduced an
> > incredibly useful ability to create filtered bundles, which advances
> > the partial clone/promisor support in Git and allows for archiving
> > large repositories to object storages like S3 in bundles that are:
> >
> > * easy to manage
> > * bundle is just a single file, it's easier to guarantee atomic
> > replacements in object storages like S3 and they are faster to
> > fetch than a bare repository since there's only a single GET
> > request involved
> > * incredibly tiny
> > * no indexes (which may be more than 10 MB for some repositories)
> > and other fluff, compared to cloning a bare repository
> > * bundle can be filtered to only contain the tips of refs neccessary
> > for e.g. code-analysis purposes
> >
> > However, in 86fdd94d72 (clone: fail gracefully when cloning filtered
> > bundle, 2022-03-09) the cloning of such bundles was disabled, with a
> > note that this behavior is not desired, and it the long-term this
> > should be possible.
> >
> > The commit above states that it's not possible to have this at the
> > moment due to lack of remote and a repository-global config that
> > specifies an object filter, yet it's unclear why a remote-specific
> > config can't be used instead, which is what this change does.
>
> As I understand it if you're cloning from a bundle file then then there
> is no remote so how can we set a remote-specific config?
>
> I'm surprised that the proposed change does not require the user to pass
> "--filter" to "git clone" as I expected that we'd want to check that the
> filter on the command line was compatible with the filter used to create
> the bundle. Allowing "git clone" to create a partial clone without the
> user asking for it by passing the "--filter" option feels like is going
> to end up confusing users.
>
> > +test_expect_success 'cloning from filtered bundle works' '
> > + git bundle create partial.bdl --all --filter=blob:none &&
> > + git clone --bare partial.bdl partial 2>err
>
> The redirection hides any error message which will make debugging test
> failures harder. It would be nice to see this test check any config set
> when cloning and that git commands can run successfully in the repository.
>
> Best Wishes
>
> Phillip
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] clone: support cloning of filtered bundles
2024-01-14 19:39 ` Nikolay Edigaryev
@ 2024-01-14 21:26 ` Nikolay Edigaryev
2024-01-15 10:35 ` phillip.wood123
2024-01-15 10:18 ` phillip.wood123
1 sibling, 1 reply; 9+ messages in thread
From: Nikolay Edigaryev @ 2024-01-14 21:26 UTC (permalink / raw)
To: phillip.wood; +Cc: Nikolay Edigaryev via GitGitGadget, git, Derrick Stolee
> Note that currently, when you clone a normal non-filtered bundle with a
> '--filter' argument specified, no filtering will take place and no error
> will be thrown. "promisor = true" and "partialclonefilter = ..." options
> will be set in the repo config, but no .promisor file will be created.
> This is even more confusing IMO, but that's how it currently on
> Git 2.43.0.
I was wrong about this one: the .promisor pack file actually gets created.
And the filtering is not being done because the "uploadpack.allowFilter"
global option is not enabled by default.
Unfortunately the only way to figure this out is to run Git with
GIT_TRACE=2, which shows:
warning: filtering not recognized by server, ignoring
So please feel free to skip this part from the consideration.
On Sun, Jan 14, 2024 at 11:39 PM Nikolay Edigaryev <edigaryev@gmail.com> wrote:
>
> Hello Phillip,
>
> > As I understand it if you're cloning from a bundle file then then there
> > is no remote so how can we set a remote-specific config?
>
> There is a remote, for more details see df61c88979 (clone: also
> configure url for bare clones, 2010-03-29), which has the following
> code:
>
> strbuf_addf(&key, "remote.%s.url", remote_name);
> git_config_set(key.buf, repo);
> strbuf_reset(&key);
>
> You can verify this by creating a bundle on Git 2.43.0 with "git create
> bundle bundle.bundle --all" and then cloning it with "git clone
> --bare /path/to/bundle.bundle", in my case the following repo-wide
> configuration file was created:
>
> [core]
> repositoryformatversion = 0
> filemode = true
> bare = true
> ignorecase = true
> precomposeunicode = true
> [remote "origin"]
> url = /Users/edi/src/cirrus-cli/cli.bundle
>
> > I'm surprised that the proposed change does not require the user to pass
> > "--filter" to "git clone" as I expected that we'd want to check that the
> > filter on the command line was compatible with the filter used to create
> > the bundle. Allowing "git clone" to create a partial clone without the
> > user asking for it by passing the "--filter" option feels like is going
> > to end up confusing users.
>
> Note that currently, when you clone a normal non-filtered bundle with a
> '--filter' argument specified, no filtering will take place and no error
> will be thrown. "promisor = true" and "partialclonefilter = ..." options
> will be set in the repo config, but no .promisor file will be created.
> This is even more confusing IMO, but that's how it currently on
> Git 2.43.0.
>
> You have a good point, but I feel like completely preventing cloning of
> filtered bundles and requiring a '--filter' argument is very taxing. If
> you've already specified a '--filter' when creating a bundle (and thus
> your intent to use partially cloned data), why do it multiple times?
>
> What I propose as an alternative here is to act based on the user's
> intent when cloning:
>
> * when the user specifies no '--filter' argument, do nothing special,
> allow cloning both types of bundles: normal and filtered (with the
> logic from this patch)
>
> * when the user does specify a '--filter' argument, either:
> * throw an error explaining that filtering of filtered bundles is not
> supported
> * or compare the user's filter specification and the one that is
> in the bundle and only throw an error if they mismatch
>
> Let me know what you think about this (and perhaps you have a more
> concrete example in mind where this will have negative consequences)
> and I'll be happy to do a next iteration.
>
>
> On Sun, Jan 14, 2024 at 10:00 PM Phillip Wood <phillip.wood123@gmail.com> wrote:
> >
> > Hi Nikolay
> >
> > On 14/01/2024 11:16, Nikolay Edigaryev via GitGitGadget wrote:
> > > From: Nikolay Edigaryev <edigaryev@gmail.com>
> > >
> > > f18b512bbb (bundle: create filtered bundles, 2022-03-09) introduced an
> > > incredibly useful ability to create filtered bundles, which advances
> > > the partial clone/promisor support in Git and allows for archiving
> > > large repositories to object storages like S3 in bundles that are:
> > >
> > > * easy to manage
> > > * bundle is just a single file, it's easier to guarantee atomic
> > > replacements in object storages like S3 and they are faster to
> > > fetch than a bare repository since there's only a single GET
> > > request involved
> > > * incredibly tiny
> > > * no indexes (which may be more than 10 MB for some repositories)
> > > and other fluff, compared to cloning a bare repository
> > > * bundle can be filtered to only contain the tips of refs neccessary
> > > for e.g. code-analysis purposes
> > >
> > > However, in 86fdd94d72 (clone: fail gracefully when cloning filtered
> > > bundle, 2022-03-09) the cloning of such bundles was disabled, with a
> > > note that this behavior is not desired, and it the long-term this
> > > should be possible.
> > >
> > > The commit above states that it's not possible to have this at the
> > > moment due to lack of remote and a repository-global config that
> > > specifies an object filter, yet it's unclear why a remote-specific
> > > config can't be used instead, which is what this change does.
> >
> > As I understand it if you're cloning from a bundle file then then there
> > is no remote so how can we set a remote-specific config?
> >
> > I'm surprised that the proposed change does not require the user to pass
> > "--filter" to "git clone" as I expected that we'd want to check that the
> > filter on the command line was compatible with the filter used to create
> > the bundle. Allowing "git clone" to create a partial clone without the
> > user asking for it by passing the "--filter" option feels like is going
> > to end up confusing users.
> >
> > > +test_expect_success 'cloning from filtered bundle works' '
> > > + git bundle create partial.bdl --all --filter=blob:none &&
> > > + git clone --bare partial.bdl partial 2>err
> >
> > The redirection hides any error message which will make debugging test
> > failures harder. It would be nice to see this test check any config set
> > when cloning and that git commands can run successfully in the repository.
> >
> > Best Wishes
> >
> > Phillip
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] clone: support cloning of filtered bundles
2024-01-14 11:16 [PATCH] clone: support cloning of filtered bundles Nikolay Edigaryev via GitGitGadget
2024-01-14 18:00 ` Phillip Wood
@ 2024-01-15 1:13 ` Junio C Hamano
2024-01-15 2:09 ` Junio C Hamano
1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2024-01-15 1:13 UTC (permalink / raw)
To: Nikolay Edigaryev via GitGitGadget; +Cc: git, Derrick Stolee, Nikolay Edigaryev
"Nikolay Edigaryev via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> diff --git a/builtin/clone.c b/builtin/clone.c
> index c6357af9498..4b3fedf78ed 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -1227,9 +1227,18 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>
> if (fd > 0)
> close(fd);
> +
> + if (has_filter) {
> + strbuf_addf(&key, "remote.%s.promisor", remote_name);
> + git_config_set(key.buf, "true");
> + strbuf_reset(&key);
> +
> + strbuf_addf(&key, "remote.%s.partialclonefilter", remote_name);
> + git_config_set(key.buf, expand_list_objects_filter_spec(&header.filter));
> + strbuf_reset(&key);
> + }
> +
> -# NEEDSWORK: 'git clone --bare' should be able to clone from a filtered
> -# bundle, but that requires a change to promisor/filter config options.
Sorry, but this "should be able to" does not make sense to me in the
first place.
I can understand how an operation to create a narrow clone of an
unfiltered bundle and then prepare the resulting repository for
future "fattening" by naming the unfiltered bundle file a
"promisor", and allow the user to lazily fetch objects that have
initially been filtered out of the bundle lazily.
But a bundle that were created with objects _omitted_ already? It
obviously cannot "promise" to deliber any objects that ought to be
reachable from the objects contained in it, so setting the bundle
file as promisor in the configuration does not help the resulting
repository. Those missing objects must be obtained from somewhere
other than that original "filtered" bundle, and if that source of
objects that can promise all the objects that are ought to be
reachable were specified as the promisor, it would make sense. But
the source of this clone operation, i.e. the bundle file that is
pointed at by "remote.$remote_name.url", cannot be that promisor.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] clone: support cloning of filtered bundles
2024-01-15 1:13 ` Junio C Hamano
@ 2024-01-15 2:09 ` Junio C Hamano
2024-01-16 20:06 ` Nikolay Edigaryev
0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2024-01-15 2:09 UTC (permalink / raw)
To: Nikolay Edigaryev via GitGitGadget; +Cc: git, Derrick Stolee, Nikolay Edigaryev
Junio C Hamano <gitster@pobox.com> writes:
> "Nikolay Edigaryev via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
>> diff --git a/builtin/clone.c b/builtin/clone.c
>> index c6357af9498..4b3fedf78ed 100644
>> --- a/builtin/clone.c
>> +++ b/builtin/clone.c
>> @@ -1227,9 +1227,18 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>>
>> if (fd > 0)
>> close(fd);
>> +
>> + if (has_filter) {
>> + strbuf_addf(&key, "remote.%s.promisor", remote_name);
>> + git_config_set(key.buf, "true");
>> + strbuf_reset(&key);
>> +
>> + strbuf_addf(&key, "remote.%s.partialclonefilter", remote_name);
>> + git_config_set(key.buf, expand_list_objects_filter_spec(&header.filter));
>> + strbuf_reset(&key);
>> + }
>> +
>
>> -# NEEDSWORK: 'git clone --bare' should be able to clone from a filtered
>> -# bundle, but that requires a change to promisor/filter config options.
> ...
> But a bundle that were created with objects _omitted_ already?
> ... the source of this clone operation, i.e. the bundle file that is
> pointed at by "remote.$remote_name.url", cannot be that promisor.
Extending the above a bit, one important way a bundle is used is as
a medium for sneaker-net. Instead of making a full clone over the
network, if you can create a bundle that records all objects and all
refs out of the source repository and then unbundle it in a
different place to create a repository, you can tweak the resulting
repository by either adding a separete remote or changing the
remote.origin.url so that your subsequent fetch goes over the
network to the repository you took the initial bundle from.
The "tweak the resulting repository" part however MUST be done
manually with the current system. If we can optionally record the
publically reachable URL of the source repository when we create a
bundle file, and "git clone" on the receiving side can read the URL
out of the bundle and act on it (e.g., show it to the user and offer
to record it as remote.origin.url in the resulting repository---I do
not think it is wise to do this silently without letting the user
know from security's point of view), then the use of bundle files as
a medium for sneaker-netting will become even easier.
And once that is done, perhaps allowing a filtered bundle to act as
a sneaker-net medium to simulate an initial filtered clone would
make sense. The promisor as well as the origin will be the network
reachable URL and subsequent fetches (both deliberate ones via "git
fetch" as well as lazy on-demand ones that backfills missing objects
via the "promisor" access) would become possible.
But without such a change to the bundle file format, allowing
"clone" to finish and pretend the resulting repository is usable is
somewhat irresponsible to the users. The on-demand lazy fetch would
fail after this code cloned from such a filtered bundle, no?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] clone: support cloning of filtered bundles
2024-01-14 19:39 ` Nikolay Edigaryev
2024-01-14 21:26 ` Nikolay Edigaryev
@ 2024-01-15 10:18 ` phillip.wood123
1 sibling, 0 replies; 9+ messages in thread
From: phillip.wood123 @ 2024-01-15 10:18 UTC (permalink / raw)
To: Nikolay Edigaryev, phillip.wood
Cc: Nikolay Edigaryev via GitGitGadget, git, Derrick Stolee
Hi Nikolay
On 14/01/2024 19:39, Nikolay Edigaryev wrote:
> Hello Phillip,
>
>> As I understand it if you're cloning from a bundle file then then there
>> is no remote so how can we set a remote-specific config?
>
> There is a remote, for more details see df61c88979 (clone: also
> configure url for bare clones, 2010-03-29), which has the following
> code:
>
> strbuf_addf(&key, "remote.%s.url", remote_name);
> git_config_set(key.buf, repo);
> strbuf_reset(&key);
>
> You can verify this by creating a bundle on Git 2.43.0 with "git create
> bundle bundle.bundle --all" and then cloning it with "git clone
> --bare /path/to/bundle.bundle", in my case the following repo-wide
> configuration file was created:
>
> [core]
> repositoryformatversion = 0
> filemode = true
> bare = true
> ignorecase = true
> precomposeunicode = true
> [remote "origin"]
> url = /Users/edi/src/cirrus-cli/cli.bundle
Oh, thanks for clarifying that I didn't realize we set "origin" to point
to the bundle. That means this patch creates a promisor remote config
pointing to a bundle that does not contain the missing objects. As Junio
said that doesn't make much sense to me as the point of the promisor
config is to allow git to lazily fetch the missing objects.
Best Wishes
Phillip
>> I'm surprised that the proposed change does not require the user to pass
>> "--filter" to "git clone" as I expected that we'd want to check that the
>> filter on the command line was compatible with the filter used to create
>> the bundle. Allowing "git clone" to create a partial clone without the
>> user asking for it by passing the "--filter" option feels like is going
>> to end up confusing users.
>
> Note that currently, when you clone a normal non-filtered bundle with a
> '--filter' argument specified, no filtering will take place and no error
> will be thrown. "promisor = true" and "partialclonefilter = ..." options
> will be set in the repo config, but no .promisor file will be created.
> This is even more confusing IMO, but that's how it currently on
> Git 2.43.0.
>
> You have a good point, but I feel like completely preventing cloning of
> filtered bundles and requiring a '--filter' argument is very taxing. If
> you've already specified a '--filter' when creating a bundle (and thus
> your intent to use partially cloned data), why do it multiple times?
>
> What I propose as an alternative here is to act based on the user's
> intent when cloning:
>
> * when the user specifies no '--filter' argument, do nothing special,
> allow cloning both types of bundles: normal and filtered (with the
> logic from this patch)
>
> * when the user does specify a '--filter' argument, either:
> * throw an error explaining that filtering of filtered bundles is not
> supported
> * or compare the user's filter specification and the one that is
> in the bundle and only throw an error if they mismatch
>
> Let me know what you think about this (and perhaps you have a more
> concrete example in mind where this will have negative consequences)
> and I'll be happy to do a next iteration.
>
>
> On Sun, Jan 14, 2024 at 10:00 PM Phillip Wood <phillip.wood123@gmail.com> wrote:
>>
>> Hi Nikolay
>>
>> On 14/01/2024 11:16, Nikolay Edigaryev via GitGitGadget wrote:
>>> From: Nikolay Edigaryev <edigaryev@gmail.com>
>>>
>>> f18b512bbb (bundle: create filtered bundles, 2022-03-09) introduced an
>>> incredibly useful ability to create filtered bundles, which advances
>>> the partial clone/promisor support in Git and allows for archiving
>>> large repositories to object storages like S3 in bundles that are:
>>>
>>> * easy to manage
>>> * bundle is just a single file, it's easier to guarantee atomic
>>> replacements in object storages like S3 and they are faster to
>>> fetch than a bare repository since there's only a single GET
>>> request involved
>>> * incredibly tiny
>>> * no indexes (which may be more than 10 MB for some repositories)
>>> and other fluff, compared to cloning a bare repository
>>> * bundle can be filtered to only contain the tips of refs neccessary
>>> for e.g. code-analysis purposes
>>>
>>> However, in 86fdd94d72 (clone: fail gracefully when cloning filtered
>>> bundle, 2022-03-09) the cloning of such bundles was disabled, with a
>>> note that this behavior is not desired, and it the long-term this
>>> should be possible.
>>>
>>> The commit above states that it's not possible to have this at the
>>> moment due to lack of remote and a repository-global config that
>>> specifies an object filter, yet it's unclear why a remote-specific
>>> config can't be used instead, which is what this change does.
>>
>> As I understand it if you're cloning from a bundle file then then there
>> is no remote so how can we set a remote-specific config?
>>
>> I'm surprised that the proposed change does not require the user to pass
>> "--filter" to "git clone" as I expected that we'd want to check that the
>> filter on the command line was compatible with the filter used to create
>> the bundle. Allowing "git clone" to create a partial clone without the
>> user asking for it by passing the "--filter" option feels like is going
>> to end up confusing users.
>>
>>> +test_expect_success 'cloning from filtered bundle works' '
>>> + git bundle create partial.bdl --all --filter=blob:none &&
>>> + git clone --bare partial.bdl partial 2>err
>>
>> The redirection hides any error message which will make debugging test
>> failures harder. It would be nice to see this test check any config set
>> when cloning and that git commands can run successfully in the repository.
>>
>> Best Wishes
>>
>> Phillip
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] clone: support cloning of filtered bundles
2024-01-14 21:26 ` Nikolay Edigaryev
@ 2024-01-15 10:35 ` phillip.wood123
0 siblings, 0 replies; 9+ messages in thread
From: phillip.wood123 @ 2024-01-15 10:35 UTC (permalink / raw)
To: Nikolay Edigaryev, phillip.wood
Cc: Nikolay Edigaryev via GitGitGadget, git, Derrick Stolee
Hi Nikolay
On 14/01/2024 21:26, Nikolay Edigaryev wrote:
>> Note that currently, when you clone a normal non-filtered bundle with a
>> '--filter' argument specified, no filtering will take place and no error
>> will be thrown. "promisor = true" and "partialclonefilter = ..." options
>> will be set in the repo config, but no .promisor file will be created.
>> This is even more confusing IMO, but that's how it currently on
>> Git 2.43.0.
>
> I was wrong about this one: the .promisor pack file actually gets created.
>
> And the filtering is not being done because the "uploadpack.allowFilter"
> global option is not enabled by default.
That makes sense - if you try to make a partial clone from a remote that
does not support object filtering we fallback to a full clone and print
the warning you mention below.
> Unfortunately the only way to figure this out is to run Git with
> GIT_TRACE=2, which shows:
That seems strange, you should see the warning without having to set
GIT_TRACE. I've certainly seen the warning in the past when trying to
create a partial clone from a remote did not support them without me
setting any special environment variables.
Best Wishes
Phillip
> warning: filtering not recognized by server, ignoring
>
> So please feel free to skip this part from the consideration.
>
>
> On Sun, Jan 14, 2024 at 11:39 PM Nikolay Edigaryev <edigaryev@gmail.com> wrote:
>>
>> Hello Phillip,
>>
>>> As I understand it if you're cloning from a bundle file then then there
>>> is no remote so how can we set a remote-specific config?
>>
>> There is a remote, for more details see df61c88979 (clone: also
>> configure url for bare clones, 2010-03-29), which has the following
>> code:
>>
>> strbuf_addf(&key, "remote.%s.url", remote_name);
>> git_config_set(key.buf, repo);
>> strbuf_reset(&key);
>>
>> You can verify this by creating a bundle on Git 2.43.0 with "git create
>> bundle bundle.bundle --all" and then cloning it with "git clone
>> --bare /path/to/bundle.bundle", in my case the following repo-wide
>> configuration file was created:
>>
>> [core]
>> repositoryformatversion = 0
>> filemode = true
>> bare = true
>> ignorecase = true
>> precomposeunicode = true
>> [remote "origin"]
>> url = /Users/edi/src/cirrus-cli/cli.bundle
>>
>>> I'm surprised that the proposed change does not require the user to pass
>>> "--filter" to "git clone" as I expected that we'd want to check that the
>>> filter on the command line was compatible with the filter used to create
>>> the bundle. Allowing "git clone" to create a partial clone without the
>>> user asking for it by passing the "--filter" option feels like is going
>>> to end up confusing users.
>>
>> Note that currently, when you clone a normal non-filtered bundle with a
>> '--filter' argument specified, no filtering will take place and no error
>> will be thrown. "promisor = true" and "partialclonefilter = ..." options
>> will be set in the repo config, but no .promisor file will be created.
>> This is even more confusing IMO, but that's how it currently on
>> Git 2.43.0.
>>
>> You have a good point, but I feel like completely preventing cloning of
>> filtered bundles and requiring a '--filter' argument is very taxing. If
>> you've already specified a '--filter' when creating a bundle (and thus
>> your intent to use partially cloned data), why do it multiple times?
>>
>> What I propose as an alternative here is to act based on the user's
>> intent when cloning:
>>
>> * when the user specifies no '--filter' argument, do nothing special,
>> allow cloning both types of bundles: normal and filtered (with the
>> logic from this patch)
>>
>> * when the user does specify a '--filter' argument, either:
>> * throw an error explaining that filtering of filtered bundles is not
>> supported
>> * or compare the user's filter specification and the one that is
>> in the bundle and only throw an error if they mismatch
>>
>> Let me know what you think about this (and perhaps you have a more
>> concrete example in mind where this will have negative consequences)
>> and I'll be happy to do a next iteration.
>>
>>
>> On Sun, Jan 14, 2024 at 10:00 PM Phillip Wood <phillip.wood123@gmail.com> wrote:
>>>
>>> Hi Nikolay
>>>
>>> On 14/01/2024 11:16, Nikolay Edigaryev via GitGitGadget wrote:
>>>> From: Nikolay Edigaryev <edigaryev@gmail.com>
>>>>
>>>> f18b512bbb (bundle: create filtered bundles, 2022-03-09) introduced an
>>>> incredibly useful ability to create filtered bundles, which advances
>>>> the partial clone/promisor support in Git and allows for archiving
>>>> large repositories to object storages like S3 in bundles that are:
>>>>
>>>> * easy to manage
>>>> * bundle is just a single file, it's easier to guarantee atomic
>>>> replacements in object storages like S3 and they are faster to
>>>> fetch than a bare repository since there's only a single GET
>>>> request involved
>>>> * incredibly tiny
>>>> * no indexes (which may be more than 10 MB for some repositories)
>>>> and other fluff, compared to cloning a bare repository
>>>> * bundle can be filtered to only contain the tips of refs neccessary
>>>> for e.g. code-analysis purposes
>>>>
>>>> However, in 86fdd94d72 (clone: fail gracefully when cloning filtered
>>>> bundle, 2022-03-09) the cloning of such bundles was disabled, with a
>>>> note that this behavior is not desired, and it the long-term this
>>>> should be possible.
>>>>
>>>> The commit above states that it's not possible to have this at the
>>>> moment due to lack of remote and a repository-global config that
>>>> specifies an object filter, yet it's unclear why a remote-specific
>>>> config can't be used instead, which is what this change does.
>>>
>>> As I understand it if you're cloning from a bundle file then then there
>>> is no remote so how can we set a remote-specific config?
>>>
>>> I'm surprised that the proposed change does not require the user to pass
>>> "--filter" to "git clone" as I expected that we'd want to check that the
>>> filter on the command line was compatible with the filter used to create
>>> the bundle. Allowing "git clone" to create a partial clone without the
>>> user asking for it by passing the "--filter" option feels like is going
>>> to end up confusing users.
>>>
>>>> +test_expect_success 'cloning from filtered bundle works' '
>>>> + git bundle create partial.bdl --all --filter=blob:none &&
>>>> + git clone --bare partial.bdl partial 2>err
>>>
>>> The redirection hides any error message which will make debugging test
>>> failures harder. It would be nice to see this test check any config set
>>> when cloning and that git commands can run successfully in the repository.
>>>
>>> Best Wishes
>>>
>>> Phillip
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] clone: support cloning of filtered bundles
2024-01-15 2:09 ` Junio C Hamano
@ 2024-01-16 20:06 ` Nikolay Edigaryev
0 siblings, 0 replies; 9+ messages in thread
From: Nikolay Edigaryev @ 2024-01-16 20:06 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Nikolay Edigaryev via GitGitGadget, git, Derrick Stolee
Hello Junio and Phillip,
Thanks a lot for the explanations of how this is supposed to work. It
seems that to make this work properly, we'd need to:
(1) add an argument (or an option) to 'git bundle create', so that
the user will be able to explicitly request the inclusion of a
desired remote's URL
Without such mechanism in place data leak is possible, e.g. remote with
credentials hardcoded in it.
(2) extend the 'gitformat-bundle' to include 'url'
However, a remote can have multiple URLs and other remote-specific
options might be necessary to properly work with it.
(3) add an argument (or an option) to 'git clone', so that the user
will be able to explicitly request the write of the URL contained
in the bundle to the repository's config
Otherwise, it's insecure, e.g. someone might craft a bundle with a URL
that collects data from the user.
I don't want waste anyone's time on this anymore because I've toyed with
'git bundle' a bit more and realized that what I'm trying to accomplish
can be done the other way:
1. git init
2. git bundle unbundle <PATH> | <script that swaps hashes and refs in
'git bundle unbundle output' and feeds them to 'git update-ref'>
Hopefully this discussion will be useful for people looking to
accomplish something similar to what I've described in the initial
message.
On Mon, Jan 15, 2024 at 6:09 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > "Nikolay Edigaryev via GitGitGadget" <gitgitgadget@gmail.com>
> > writes:
> >
> >> diff --git a/builtin/clone.c b/builtin/clone.c
> >> index c6357af9498..4b3fedf78ed 100644
> >> --- a/builtin/clone.c
> >> +++ b/builtin/clone.c
> >> @@ -1227,9 +1227,18 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
> >>
> >> if (fd > 0)
> >> close(fd);
> >> +
> >> + if (has_filter) {
> >> + strbuf_addf(&key, "remote.%s.promisor", remote_name);
> >> + git_config_set(key.buf, "true");
> >> + strbuf_reset(&key);
> >> +
> >> + strbuf_addf(&key, "remote.%s.partialclonefilter", remote_name);
> >> + git_config_set(key.buf, expand_list_objects_filter_spec(&header.filter));
> >> + strbuf_reset(&key);
> >> + }
> >> +
> >
> >> -# NEEDSWORK: 'git clone --bare' should be able to clone from a filtered
> >> -# bundle, but that requires a change to promisor/filter config options.
> > ...
> > But a bundle that were created with objects _omitted_ already?
> > ... the source of this clone operation, i.e. the bundle file that is
> > pointed at by "remote.$remote_name.url", cannot be that promisor.
>
> Extending the above a bit, one important way a bundle is used is as
> a medium for sneaker-net. Instead of making a full clone over the
> network, if you can create a bundle that records all objects and all
> refs out of the source repository and then unbundle it in a
> different place to create a repository, you can tweak the resulting
> repository by either adding a separete remote or changing the
> remote.origin.url so that your subsequent fetch goes over the
> network to the repository you took the initial bundle from.
>
> The "tweak the resulting repository" part however MUST be done
> manually with the current system. If we can optionally record the
> publically reachable URL of the source repository when we create a
> bundle file, and "git clone" on the receiving side can read the URL
> out of the bundle and act on it (e.g., show it to the user and offer
> to record it as remote.origin.url in the resulting repository---I do
> not think it is wise to do this silently without letting the user
> know from security's point of view), then the use of bundle files as
> a medium for sneaker-netting will become even easier.
>
> And once that is done, perhaps allowing a filtered bundle to act as
> a sneaker-net medium to simulate an initial filtered clone would
> make sense. The promisor as well as the origin will be the network
> reachable URL and subsequent fetches (both deliberate ones via "git
> fetch" as well as lazy on-demand ones that backfills missing objects
> via the "promisor" access) would become possible.
>
> But without such a change to the bundle file format, allowing
> "clone" to finish and pretend the resulting repository is usable is
> somewhat irresponsible to the users. The on-demand lazy fetch would
> fail after this code cloned from such a filtered bundle, no?
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-01-16 20:06 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-14 11:16 [PATCH] clone: support cloning of filtered bundles Nikolay Edigaryev via GitGitGadget
2024-01-14 18:00 ` Phillip Wood
2024-01-14 19:39 ` Nikolay Edigaryev
2024-01-14 21:26 ` Nikolay Edigaryev
2024-01-15 10:35 ` phillip.wood123
2024-01-15 10:18 ` phillip.wood123
2024-01-15 1:13 ` Junio C Hamano
2024-01-15 2:09 ` Junio C Hamano
2024-01-16 20:06 ` Nikolay Edigaryev
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).