* [PATCH] fetch: use bundle URIs when having creationToken heuristic
@ 2024-07-22 8:07 Toon Claes
2024-07-22 18:40 ` Junio C Hamano
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Toon Claes @ 2024-07-22 8:07 UTC (permalink / raw)
To: git; +Cc: Toon Claes
At the moment, bundle URIs are only used on git-clone(1). For a clone
the use of bundle URI is trivial, because repository is empty so
downloading bundles will never result in downloading objects that are in
the repository already.
For git-fetch(1), this more complicated to use bundle URI. We want to
avoid downloading bundles that only contains objects that are in the
local repository already.
One way to achieve this is possible when the "creationToken" heuristic
is used for bundle URIs. With this heuristic, the server sends along a
creationToken for each bundle. When a client downloads bundles with such
creationToken, the highest known value is written to
`fetch.bundleCreationToken` in the git-config. The next time bundles are
advertised by the server, bundles with a lower creationToken value are
ignored. This was already implemented by 7903efb717 (bundle-uri:
download in creationToken order, 2023-01-31) in
fetch_bundles_by_token().
Using the creationToken heuristic is optional, but without it the
client has no idea which bundles are new and which only have objects the
client already has.
With this knowledge, make git-fetch(1) get bundle URI information from
the server, but only use bundle URIs if the creationToken heuristic is
used.
The code added in builtin/fetch.c is very similar to the code in
builtin/clone.c, so while at it make sure we always clean up the bundles
list advertised by the server.
Signed-off-by: Toon Claes <toon@iotcl.com>
---
builtin/clone.c | 13 +++++-----
builtin/fetch.c | 17 +++++++++++++
t/t5584-fetch-bundle-uri.sh | 49 +++++++++++++++++++++++++++++++++++++
3 files changed, 73 insertions(+), 6 deletions(-)
create mode 100755 t/t5584-fetch-bundle-uri.sh
diff --git a/builtin/clone.c b/builtin/clone.c
index af6017d41a..29f0470aea 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1406,9 +1406,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
git_config_set_gently("fetch.bundleuri", bundle_uri);
} else {
/*
- * Populate transport->got_remote_bundle_uri and
- * transport->bundle_uri. We might get nothing.
- */
+ * Populate transport->bundles. We might get nothing or fail
+ * trying, but clone can continue without bundles as well.
+ */
transport_get_remote_bundle_uri(transport);
if (transport->bundles &&
@@ -1419,10 +1419,11 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
else if (fetch_bundle_list(the_repository,
transport->bundles))
warning(_("failed to fetch advertised bundles"));
- } else {
- clear_bundle_list(transport->bundles);
- FREE_AND_NULL(transport->bundles);
}
+
+ clear_bundle_list(transport->bundles);
+ if (transport->bundles)
+ FREE_AND_NULL(transport->bundles);
}
if (refs)
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 693f02b958..1e944f81af 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1694,6 +1694,23 @@ static int do_fetch(struct transport *transport,
retcode = 1;
}
+ /*
+ * Populate transport->bundles. We might get nothing or fail
+ * trying, but fetch can continue without bundles as well.
+ */
+ transport_get_remote_bundle_uri(transport);
+
+ if (transport->bundles &&
+ hashmap_get_size(&transport->bundles->bundles) &&
+ (transport->bundles->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)) {
+ if (fetch_bundle_list(the_repository, transport->bundles))
+ warning(_("failed to fetch advertised bundles"));
+ }
+
+ clear_bundle_list(transport->bundles);
+ if (transport->bundles)
+ FREE_AND_NULL(transport->bundles);
+
if (fetch_and_consume_refs(&display_state, transport, transaction, ref_map,
&fetch_head, config)) {
retcode = 1;
diff --git a/t/t5584-fetch-bundle-uri.sh b/t/t5584-fetch-bundle-uri.sh
new file mode 100755
index 0000000000..6c2383646e
--- /dev/null
+++ b/t/t5584-fetch-bundle-uri.sh
@@ -0,0 +1,49 @@
+#!/bin/sh
+
+test_description='test use of bundle URI in "git fetch"'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+test_expect_success 'set up repos and bundles' '
+ git init source &&
+ test_commit -C source A &&
+ git clone --no-local source go-A-to-C &&
+ test_commit -C source B &&
+ git clone --no-local source go-B-to-C &&
+ git clone --no-local source go-B-to-D &&
+ git -C source bundle create B.bundle main &&
+ test_commit -C source C &&
+ git -C source bundle create B-to-C.bundle B..main &&
+ git -C source config uploadpack.advertiseBundleURIs true &&
+ git -C source config bundle.version 1 &&
+ git -C source config bundle.mode all &&
+ git -C source config bundle.heuristic creationToken &&
+ git -C source config bundle.bundle-B.uri "file://$(pwd)/source/B.bundle" &&
+ git -C source config bundle.bundle-B.creationToken 1 &&
+ git -C source config bundle.bundle-B-to-C.uri "file://$(pwd)/source/B-to-C.bundle" &&
+ git -C source config bundle.bundle-B-to-C.creationToken 2
+'
+
+test_expect_success 'fetches one bundle URI to get up-to-date' '
+ git -C go-B-to-C -c transfer.bundleURI=true fetch origin &&
+ test 1 = $(ls go-B-to-C/.git/objects/bundles | wc -l) &&
+ test 2 = $(git -C go-B-to-C config fetch.bundleCreationToken)
+'
+
+test_expect_success 'fetches two bundle URIs to get up-to-date' '
+ git -C go-A-to-C -c transfer.bundleURI=true fetch origin &&
+ test 2 = $(ls go-A-to-C/.git/objects/bundles | wc -l) &&
+ test 2 = $(git -C go-A-to-C config fetch.bundleCreationToken)
+'
+
+test_expect_success 'fetches one bundle URI and objects from remote' '
+ test_commit -C source D &&
+ git -C go-B-to-D -c transfer.bundleURI=true fetch origin &&
+ test 1 = $(ls go-B-to-D/.git/objects/bundles | wc -l) &&
+ test 2 = $(git -C go-B-to-D config fetch.bundleCreationToken)
+'
+
+test_done
--
2.45.2
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] fetch: use bundle URIs when having creationToken heuristic
2024-07-22 8:07 [PATCH] fetch: use bundle URIs when having creationToken heuristic Toon Claes
@ 2024-07-22 18:40 ` Junio C Hamano
2024-07-24 14:49 ` [PATCH v2 0/3] " Toon Claes
2024-09-27 9:04 ` [PATCH] builtin/clone: teach git-clone(1) the --ref= argument Toon Claes
2 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2024-07-22 18:40 UTC (permalink / raw)
To: Toon Claes; +Cc: git
Toon Claes <toon@iotcl.com> writes:
> At the moment, bundle URIs are only used on git-clone(1). For a clone
> the use of bundle URI is trivial, because repository is empty so
> downloading bundles will never result in downloading objects that are in
> the repository already.
>
> For git-fetch(1), this more complicated to use bundle URI. We want to
> avoid downloading bundles that only contains objects that are in the
> local repository already.
>
> One way to achieve this is possible when the "creationToken" heuristic
> is used for bundle URIs. With this heuristic, the server sends along a
> creationToken for each bundle. When a client downloads bundles with such
> creationToken, the highest known value is written to
> `fetch.bundleCreationToken` in the git-config. The next time bundles are
> advertised by the server, bundles with a lower creationToken value are
> ignored. This was already implemented by 7903efb717 (bundle-uri:
> download in creationToken order, 2023-01-31) in
> fetch_bundles_by_token().
>
> Using the creationToken heuristic is optional, but without it the
> client has no idea which bundles are new and which only have objects the
> client already has.
The above explains that you can see which bundle is newer than which
other ones, but ...
> With this knowledge, make git-fetch(1) get bundle URI information from
> the server, but only use bundle URIs if the creationToken heuristic is
> used.
... it does not explain how it helps for the repository that runs
"fetch" to tell which bundles do not give any new objects it does
not have. "clone" starts from an emptyness, so for it it is enough
if it can figure out among N bundles the other end has, which ones
are sufficient to cover the entire set of objects the other end has,
but "fetch" needs to be able to exclude a lot more---the objects
this end has need to be excluded". The logic to do so may need to
be explained a bit better than the above to make it more convincing.
On the other hand...
> The code added in builtin/fetch.c is very similar to the code in
> builtin/clone.c, so while at it make sure we always clean up the bundles
> list advertised by the server.
... I do understand that clean-up code. Please make it a separate
preliminary clean-up patch before the main thing.
If the "clone" and "fetch" codepath uses very similar logic, can the
code in "clone" the preliminary clean-up patch would fix be
refactored in another separate patch on top of the clean-up patch to
a helper function that can be called from outside? And then can
"fetch" be taught, on top of those two preliminary patches, to call
that helper function to implement what this single patch wanted to
do?
Thanks.
> Signed-off-by: Toon Claes <toon@iotcl.com>
> ---
> builtin/clone.c | 13 +++++-----
> builtin/fetch.c | 17 +++++++++++++
> t/t5584-fetch-bundle-uri.sh | 49 +++++++++++++++++++++++++++++++++++++
> 3 files changed, 73 insertions(+), 6 deletions(-)
> create mode 100755 t/t5584-fetch-bundle-uri.sh
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index af6017d41a..29f0470aea 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -1406,9 +1406,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
> git_config_set_gently("fetch.bundleuri", bundle_uri);
> } else {
> /*
> - * Populate transport->got_remote_bundle_uri and
> - * transport->bundle_uri. We might get nothing.
> - */
> + * Populate transport->bundles. We might get nothing or fail
> + * trying, but clone can continue without bundles as well.
> + */
> transport_get_remote_bundle_uri(transport);
>
> if (transport->bundles &&
> @@ -1419,10 +1419,11 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
> else if (fetch_bundle_list(the_repository,
> transport->bundles))
> warning(_("failed to fetch advertised bundles"));
> - } else {
> - clear_bundle_list(transport->bundles);
> - FREE_AND_NULL(transport->bundles);
> }
> +
> + clear_bundle_list(transport->bundles);
> + if (transport->bundles)
> + FREE_AND_NULL(transport->bundles);
> }
>
> if (refs)
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 693f02b958..1e944f81af 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1694,6 +1694,23 @@ static int do_fetch(struct transport *transport,
> retcode = 1;
> }
>
> + /*
> + * Populate transport->bundles. We might get nothing or fail
> + * trying, but fetch can continue without bundles as well.
> + */
> + transport_get_remote_bundle_uri(transport);
> +
> + if (transport->bundles &&
> + hashmap_get_size(&transport->bundles->bundles) &&
> + (transport->bundles->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)) {
> + if (fetch_bundle_list(the_repository, transport->bundles))
> + warning(_("failed to fetch advertised bundles"));
> + }
> +
> + clear_bundle_list(transport->bundles);
> + if (transport->bundles)
> + FREE_AND_NULL(transport->bundles);
> +
> if (fetch_and_consume_refs(&display_state, transport, transaction, ref_map,
> &fetch_head, config)) {
> retcode = 1;
> diff --git a/t/t5584-fetch-bundle-uri.sh b/t/t5584-fetch-bundle-uri.sh
> new file mode 100755
> index 0000000000..6c2383646e
> --- /dev/null
> +++ b/t/t5584-fetch-bundle-uri.sh
> @@ -0,0 +1,49 @@
> +#!/bin/sh
> +
> +test_description='test use of bundle URI in "git fetch"'
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'set up repos and bundles' '
> + git init source &&
> + test_commit -C source A &&
> + git clone --no-local source go-A-to-C &&
> + test_commit -C source B &&
> + git clone --no-local source go-B-to-C &&
> + git clone --no-local source go-B-to-D &&
> + git -C source bundle create B.bundle main &&
> + test_commit -C source C &&
> + git -C source bundle create B-to-C.bundle B..main &&
> + git -C source config uploadpack.advertiseBundleURIs true &&
> + git -C source config bundle.version 1 &&
> + git -C source config bundle.mode all &&
> + git -C source config bundle.heuristic creationToken &&
> + git -C source config bundle.bundle-B.uri "file://$(pwd)/source/B.bundle" &&
> + git -C source config bundle.bundle-B.creationToken 1 &&
> + git -C source config bundle.bundle-B-to-C.uri "file://$(pwd)/source/B-to-C.bundle" &&
> + git -C source config bundle.bundle-B-to-C.creationToken 2
> +'
> +
> +test_expect_success 'fetches one bundle URI to get up-to-date' '
> + git -C go-B-to-C -c transfer.bundleURI=true fetch origin &&
> + test 1 = $(ls go-B-to-C/.git/objects/bundles | wc -l) &&
> + test 2 = $(git -C go-B-to-C config fetch.bundleCreationToken)
> +'
> +
> +test_expect_success 'fetches two bundle URIs to get up-to-date' '
> + git -C go-A-to-C -c transfer.bundleURI=true fetch origin &&
> + test 2 = $(ls go-A-to-C/.git/objects/bundles | wc -l) &&
> + test 2 = $(git -C go-A-to-C config fetch.bundleCreationToken)
> +'
> +
> +test_expect_success 'fetches one bundle URI and objects from remote' '
> + test_commit -C source D &&
> + git -C go-B-to-D -c transfer.bundleURI=true fetch origin &&
> + test 1 = $(ls go-B-to-D/.git/objects/bundles | wc -l) &&
> + test 2 = $(git -C go-B-to-D config fetch.bundleCreationToken)
> +'
> +
> +test_done
> --
> 2.45.2
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 0/3] fetch: use bundle URIs when having creationToken heuristic
2024-07-22 8:07 [PATCH] fetch: use bundle URIs when having creationToken heuristic Toon Claes
2024-07-22 18:40 ` Junio C Hamano
@ 2024-07-24 14:49 ` Toon Claes
2024-07-24 14:49 ` [PATCH v2 1/3] clone: remove double bundle list clear code Toon Claes
` (2 more replies)
2024-09-27 9:04 ` [PATCH] builtin/clone: teach git-clone(1) the --ref= argument Toon Claes
2 siblings, 3 replies; 17+ messages in thread
From: Toon Claes @ 2024-07-24 14:49 UTC (permalink / raw)
To: git; +Cc: gitster, Toon Claes
This is version 2 of teaching git-fetch(1) to use bundle URIs. For this second
revision I've split up the changes in multliple commits as Junio suggested.
- The first patch cleans up some unneccessary code.
- The second patch refactor the code to make it possible to reuse.
- The third patch makes git-fetch(1) use bundle URIs when creationToken
heuristic is used. This time I've added some more information about how
bundle URIs are handled when they have a creationToken heuristic.
Toon Claes (3):
clone: remove double bundle list clear code
transport: introduce transport_has_remote_bundle_uri()
fetch: use bundle URIs when having creationToken heuristic
builtin/clone.c | 26 ++++++--------------
builtin/fetch.c | 13 ++++++++++
t/helper/test-bundle-uri.c | 2 +-
t/t5558-clone-bundle-uri.sh | 28 ++++++++++++++++++++-
t/t5584-fetch-bundle-uri.sh | 49 +++++++++++++++++++++++++++++++++++++
transport.c | 14 ++++++++++-
transport.h | 7 +++---
7 files changed, 114 insertions(+), 25 deletions(-)
create mode 100755 t/t5584-fetch-bundle-uri.sh
Range-diff against v1:
-: ---------- > 1: fb556a76d3 clone: remove double bundle list clear code
1: 00f6017ab0 ! 2: 0f7c53bbe1 fetch: use bundle URIs when having creationToken heuristic
@@ Metadata
Author: Toon Claes <toon@iotcl.com>
## Commit message ##
- fetch: use bundle URIs when having creationToken heuristic
+ transport: introduce transport_has_remote_bundle_uri()
- At the moment, bundle URIs are only used on git-clone(1). For a clone
- the use of bundle URI is trivial, because repository is empty so
- downloading bundles will never result in downloading objects that are in
- the repository already.
+ The public function transport_get_remote_bundle_uri() exists to fetch
+ the bundle URI(s) from the remote. This function is only called from
+ builtin/clone.c (not taking test-tool into account). There it ignores
+ the return value, because it doesn't matter whether the server didn't
+ return any bundles or if it failed trying to fetch them, clone can
+ continue without bundle URIs. After calling it, it checks if anything
+ is collected in the bundle list and starts fetching them.
- For git-fetch(1), this more complicated to use bundle URI. We want to
- avoid downloading bundles that only contains objects that are in the
- local repository already.
-
- One way to achieve this is possible when the "creationToken" heuristic
- is used for bundle URIs. With this heuristic, the server sends along a
- creationToken for each bundle. When a client downloads bundles with such
- creationToken, the highest known value is written to
- `fetch.bundleCreationToken` in the git-config. The next time bundles are
- advertised by the server, bundles with a lower creationToken value are
- ignored. This was already implemented by 7903efb717 (bundle-uri:
- download in creationToken order, 2023-01-31) in
- fetch_bundles_by_token().
-
- Using the creationToken heuristic is optional, but without it the
- client has no idea which bundles are new and which only have objects the
- client already has.
-
- With this knowledge, make git-fetch(1) get bundle URI information from
- the server, but only use bundle URIs if the creationToken heuristic is
- used.
-
- The code added in builtin/fetch.c is very similar to the code in
- builtin/clone.c, so while at it make sure we always clean up the bundles
- list advertised by the server.
+ Add public function transport_has_remote_bundle_uri() instead. This
+ calls the (now made private) transport_get_remote_bundle_uri() function
+ and returns whether any bundle URI is received. This makes reuse of the
+ code easier and avoids code duplication when we add bundle URI support
+ to git-fetch(1).
Signed-off-by: Toon Claes <toon@iotcl.com>
## builtin/clone.c ##
@@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
+ bundle_uri);
+ else if (has_heuristic)
git_config_set_gently("fetch.bundleuri", bundle_uri);
- } else {
- /*
+- } else {
+- /*
- * Populate transport->got_remote_bundle_uri and
- * transport->bundle_uri. We might get nothing.
- */
-+ * Populate transport->bundles. We might get nothing or fail
-+ * trying, but clone can continue without bundles as well.
-+ */
- transport_get_remote_bundle_uri(transport);
-
- if (transport->bundles &&
-@@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
- else if (fetch_bundle_list(the_repository,
- transport->bundles))
- warning(_("failed to fetch advertised bundles"));
-- } else {
-- clear_bundle_list(transport->bundles);
-- FREE_AND_NULL(transport->bundles);
- }
-+
-+ clear_bundle_list(transport->bundles);
-+ if (transport->bundles)
-+ FREE_AND_NULL(transport->bundles);
+- transport_get_remote_bundle_uri(transport);
+-
+- if (transport->bundles &&
+- hashmap_get_size(&transport->bundles->bundles)) {
+- /* At this point, we need the_repository to match the cloned repo. */
+- if (repo_init(the_repository, git_dir, work_tree))
+- warning(_("failed to initialize the repo, skipping bundle URI"));
+- else if (fetch_bundle_list(the_repository,
+- transport->bundles))
+- warning(_("failed to fetch advertised bundles"));
+- }
++ } else if (transport_has_remote_bundle_uri(transport)) {
++ /* At this point, we need the_repository to match the cloned repo. */
++ if (repo_init(the_repository, git_dir, work_tree))
++ warning(_("failed to initialize the repo, skipping bundle URI"));
++ else if (fetch_bundle_list(the_repository,
++ transport->bundles))
++ warning(_("failed to fetch advertised bundles"));
}
if (refs)
- ## builtin/fetch.c ##
-@@ builtin/fetch.c: static int do_fetch(struct transport *transport,
- retcode = 1;
+ ## t/helper/test-bundle-uri.c ##
+@@ t/helper/test-bundle-uri.c: static int cmd_ls_remote(int argc, const char **argv)
}
-+ /*
-+ * Populate transport->bundles. We might get nothing or fail
-+ * trying, but fetch can continue without bundles as well.
-+ */
-+ transport_get_remote_bundle_uri(transport);
-+
-+ if (transport->bundles &&
-+ hashmap_get_size(&transport->bundles->bundles) &&
-+ (transport->bundles->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)) {
-+ if (fetch_bundle_list(the_repository, transport->bundles))
-+ warning(_("failed to fetch advertised bundles"));
-+ }
-+
-+ clear_bundle_list(transport->bundles);
-+ if (transport->bundles)
-+ FREE_AND_NULL(transport->bundles);
-+
- if (fetch_and_consume_refs(&display_state, transport, transaction, ref_map,
- &fetch_head, config)) {
- retcode = 1;
+ transport = transport_get(remote, NULL);
+- if (transport_get_remote_bundle_uri(transport) < 0) {
++ if (!transport_has_remote_bundle_uri(transport)) {
+ error(_("could not get the bundle-uri list"));
+ status = 1;
+ goto cleanup;
- ## t/t5584-fetch-bundle-uri.sh (new) ##
-@@
-+#!/bin/sh
-+
-+test_description='test use of bundle URI in "git fetch"'
-+
-+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
-+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
-+
-+. ./test-lib.sh
-+
-+test_expect_success 'set up repos and bundles' '
-+ git init source &&
-+ test_commit -C source A &&
-+ git clone --no-local source go-A-to-C &&
-+ test_commit -C source B &&
-+ git clone --no-local source go-B-to-C &&
-+ git clone --no-local source go-B-to-D &&
-+ git -C source bundle create B.bundle main &&
-+ test_commit -C source C &&
-+ git -C source bundle create B-to-C.bundle B..main &&
-+ git -C source config uploadpack.advertiseBundleURIs true &&
-+ git -C source config bundle.version 1 &&
-+ git -C source config bundle.mode all &&
-+ git -C source config bundle.heuristic creationToken &&
-+ git -C source config bundle.bundle-B.uri "file://$(pwd)/source/B.bundle" &&
-+ git -C source config bundle.bundle-B.creationToken 1 &&
-+ git -C source config bundle.bundle-B-to-C.uri "file://$(pwd)/source/B-to-C.bundle" &&
-+ git -C source config bundle.bundle-B-to-C.creationToken 2
-+'
+ ## transport.c ##
+@@ transport.c: int transport_fetch_refs(struct transport *transport, struct ref *refs)
+ return rc;
+ }
+
+-int transport_get_remote_bundle_uri(struct transport *transport)
++static int transport_get_remote_bundle_uri(struct transport *transport)
+ {
+ int value = 0;
+ const struct transport_vtable *vtable = transport->vtable;
+@@ transport.c: int transport_get_remote_bundle_uri(struct transport *transport)
+
+ if (vtable->get_bundle_uri(transport) < 0)
+ return error(_("could not retrieve server-advertised bundle-uri list"));
+
-+test_expect_success 'fetches one bundle URI to get up-to-date' '
-+ git -C go-B-to-C -c transfer.bundleURI=true fetch origin &&
-+ test 1 = $(ls go-B-to-C/.git/objects/bundles | wc -l) &&
-+ test 2 = $(git -C go-B-to-C config fetch.bundleCreationToken)
-+'
++ return 0;
++}
+
-+test_expect_success 'fetches two bundle URIs to get up-to-date' '
-+ git -C go-A-to-C -c transfer.bundleURI=true fetch origin &&
-+ test 2 = $(ls go-A-to-C/.git/objects/bundles | wc -l) &&
-+ test 2 = $(git -C go-A-to-C config fetch.bundleCreationToken)
-+'
++int transport_has_remote_bundle_uri(struct transport *transport)
++{
++ transport_get_remote_bundle_uri(transport);
+
-+test_expect_success 'fetches one bundle URI and objects from remote' '
-+ test_commit -C source D &&
-+ git -C go-B-to-D -c transfer.bundleURI=true fetch origin &&
-+ test 1 = $(ls go-B-to-D/.git/objects/bundles | wc -l) &&
-+ test 2 = $(git -C go-B-to-D config fetch.bundleCreationToken)
-+'
++ if (transport->bundles &&
++ hashmap_get_size(&transport->bundles->bundles))
++ return 1;
+
-+test_done
+ return 0;
+ }
+
+
+ ## transport.h ##
+@@ transport.h: const struct ref *transport_get_remote_refs(struct transport *transport,
+ struct transport_ls_refs_options *transport_options);
+
+ /**
+- * Retrieve bundle URI(s) from a remote. Populates "struct
+- * transport"'s "bundle_uri" and "got_remote_bundle_uri".
++ * Try fetch bundle URI(s) from a remote and returns 1 if one or more
++ * bundle URI(s) are received from the server.
++ * Populates "struct transport"'s "bundles" and "got_remote_bundle_uri".
+ */
+-int transport_get_remote_bundle_uri(struct transport *transport);
++int transport_has_remote_bundle_uri(struct transport *transport);
+
+ /*
+ * Fetch the hash algorithm used by a remote.
-: ---------- > 3: db8f303dde fetch: use bundle URIs when having creationToken heuristic
--
2.45.2
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 1/3] clone: remove double bundle list clear code
2024-07-24 14:49 ` [PATCH v2 0/3] " Toon Claes
@ 2024-07-24 14:49 ` Toon Claes
2024-07-26 8:51 ` Karthik Nayak
2024-07-26 21:52 ` Justin Tobler
2024-07-24 14:49 ` [PATCH v2 2/3] transport: introduce transport_has_remote_bundle_uri() Toon Claes
2024-07-24 14:49 ` [PATCH v2 3/3] fetch: use bundle URIs when having creationToken heuristic Toon Claes
2 siblings, 2 replies; 17+ messages in thread
From: Toon Claes @ 2024-07-24 14:49 UTC (permalink / raw)
To: git; +Cc: gitster, Toon Claes
The bundle list transport->bundles is filled by
transport_get_remote_bundle_uri(). Only when the list is not used, it is
cleared right away by calling clear_bundle_list().
This looks like we leak memory allocated for the list when
transport->bundles *is* used. But in fact, transport->bundles is cleaned
up in transport_disconnect() near the end of cmd_clone().
Remove the double clean up of transport->bundles, and depend solely on
transport_disconnect() to take care of it.
Also add a test case that hits this code, but due to other leaks we
cannot mark it as leak-free.
Signed-off-by: Toon Claes <toon@iotcl.com>
---
builtin/clone.c | 3 ---
t/t5558-clone-bundle-uri.sh | 28 +++++++++++++++++++++++++++-
2 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/builtin/clone.c b/builtin/clone.c
index af6017d41a..aa507395a0 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1419,9 +1419,6 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
else if (fetch_bundle_list(the_repository,
transport->bundles))
warning(_("failed to fetch advertised bundles"));
- } else {
- clear_bundle_list(transport->bundles);
- FREE_AND_NULL(transport->bundles);
}
}
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index cd05321e17..2d6e690fbe 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -1,6 +1,6 @@
#!/bin/sh
-test_description='test fetching bundles with --bundle-uri'
+test_description='test clone with use of bundle-uri'
. ./test-lib.sh
. "$TEST_DIRECTORY"/lib-bundle.sh
@@ -438,6 +438,32 @@ test_expect_success 'negotiation: bundle list with all wanted commits' '
test_grep ! "clone> want " trace-packet.txt
'
+test_expect_success 'bundles advertised by the server' '
+ test_when_finished rm -f trace*.txt &&
+ git clone clone-from clone-advertiser &&
+ git -C clone-advertiser config uploadpack.advertiseBundleURIs true &&
+ git -C clone-advertiser config bundle.version 1 &&
+ git -C clone-advertiser config bundle.mode all &&
+ git -C clone-advertiser config bundle.bundle-1.uri "file://$(pwd)/clone-from/bundle-1.bundle" &&
+ git -C clone-advertiser config bundle.bundle-2.uri "file://$(pwd)/clone-from/bundle-2.bundle" &&
+ git -C clone-advertiser config bundle.bundle-3.uri "file://$(pwd)/clone-from/bundle-3.bundle" &&
+ git -C clone-advertiser config bundle.bundle-4.uri "file://$(pwd)/clone-from/bundle-4.bundle" &&
+
+ GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+ git -c transfer.bundleURI=true clone clone-advertiser clone-advertised &&
+ git -C clone-advertised for-each-ref --format="%(refname)" >refs &&
+ grep "refs/bundles/" refs >actual &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ refs/bundles/left
+ refs/bundles/merge
+ refs/bundles/right
+ EOF
+ test_cmp expect actual &&
+ # We already have all needed commits so no "want" needed.
+ test_grep ! "clone> want " trace-packet.txt
+'
+
#########################################################################
# HTTP tests begin here
--
2.45.2
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v2 2/3] transport: introduce transport_has_remote_bundle_uri()
2024-07-24 14:49 ` [PATCH v2 0/3] " Toon Claes
2024-07-24 14:49 ` [PATCH v2 1/3] clone: remove double bundle list clear code Toon Claes
@ 2024-07-24 14:49 ` Toon Claes
2024-07-26 8:58 ` Karthik Nayak
2024-07-24 14:49 ` [PATCH v2 3/3] fetch: use bundle URIs when having creationToken heuristic Toon Claes
2 siblings, 1 reply; 17+ messages in thread
From: Toon Claes @ 2024-07-24 14:49 UTC (permalink / raw)
To: git; +Cc: gitster, Toon Claes
The public function transport_get_remote_bundle_uri() exists to fetch
the bundle URI(s) from the remote. This function is only called from
builtin/clone.c (not taking test-tool into account). There it ignores
the return value, because it doesn't matter whether the server didn't
return any bundles or if it failed trying to fetch them, clone can
continue without bundle URIs. After calling it, it checks if anything
is collected in the bundle list and starts fetching them.
Add public function transport_has_remote_bundle_uri() instead. This
calls the (now made private) transport_get_remote_bundle_uri() function
and returns whether any bundle URI is received. This makes reuse of the
code easier and avoids code duplication when we add bundle URI support
to git-fetch(1).
Signed-off-by: Toon Claes <toon@iotcl.com>
---
builtin/clone.c | 23 +++++++----------------
t/helper/test-bundle-uri.c | 2 +-
transport.c | 14 +++++++++++++-
transport.h | 7 ++++---
4 files changed, 25 insertions(+), 21 deletions(-)
diff --git a/builtin/clone.c b/builtin/clone.c
index aa507395a0..25535c1814 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1404,22 +1404,13 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
bundle_uri);
else if (has_heuristic)
git_config_set_gently("fetch.bundleuri", bundle_uri);
- } else {
- /*
- * Populate transport->got_remote_bundle_uri and
- * transport->bundle_uri. We might get nothing.
- */
- transport_get_remote_bundle_uri(transport);
-
- if (transport->bundles &&
- hashmap_get_size(&transport->bundles->bundles)) {
- /* At this point, we need the_repository to match the cloned repo. */
- if (repo_init(the_repository, git_dir, work_tree))
- warning(_("failed to initialize the repo, skipping bundle URI"));
- else if (fetch_bundle_list(the_repository,
- transport->bundles))
- warning(_("failed to fetch advertised bundles"));
- }
+ } else if (transport_has_remote_bundle_uri(transport)) {
+ /* At this point, we need the_repository to match the cloned repo. */
+ if (repo_init(the_repository, git_dir, work_tree))
+ warning(_("failed to initialize the repo, skipping bundle URI"));
+ else if (fetch_bundle_list(the_repository,
+ transport->bundles))
+ warning(_("failed to fetch advertised bundles"));
}
if (refs)
diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
index 0c5fa723d8..bd558d5e57 100644
--- a/t/helper/test-bundle-uri.c
+++ b/t/helper/test-bundle-uri.c
@@ -90,7 +90,7 @@ static int cmd_ls_remote(int argc, const char **argv)
}
transport = transport_get(remote, NULL);
- if (transport_get_remote_bundle_uri(transport) < 0) {
+ if (!transport_has_remote_bundle_uri(transport)) {
error(_("could not get the bundle-uri list"));
status = 1;
goto cleanup;
diff --git a/transport.c b/transport.c
index 12cc5b4d96..1a7d86fa40 100644
--- a/transport.c
+++ b/transport.c
@@ -1536,7 +1536,7 @@ int transport_fetch_refs(struct transport *transport, struct ref *refs)
return rc;
}
-int transport_get_remote_bundle_uri(struct transport *transport)
+static int transport_get_remote_bundle_uri(struct transport *transport)
{
int value = 0;
const struct transport_vtable *vtable = transport->vtable;
@@ -1561,6 +1561,18 @@ int transport_get_remote_bundle_uri(struct transport *transport)
if (vtable->get_bundle_uri(transport) < 0)
return error(_("could not retrieve server-advertised bundle-uri list"));
+
+ return 0;
+}
+
+int transport_has_remote_bundle_uri(struct transport *transport)
+{
+ transport_get_remote_bundle_uri(transport);
+
+ if (transport->bundles &&
+ hashmap_get_size(&transport->bundles->bundles))
+ return 1;
+
return 0;
}
diff --git a/transport.h b/transport.h
index 6393cd9823..5ea9641558 100644
--- a/transport.h
+++ b/transport.h
@@ -294,10 +294,11 @@ const struct ref *transport_get_remote_refs(struct transport *transport,
struct transport_ls_refs_options *transport_options);
/**
- * Retrieve bundle URI(s) from a remote. Populates "struct
- * transport"'s "bundle_uri" and "got_remote_bundle_uri".
+ * Try fetch bundle URI(s) from a remote and returns 1 if one or more
+ * bundle URI(s) are received from the server.
+ * Populates "struct transport"'s "bundles" and "got_remote_bundle_uri".
*/
-int transport_get_remote_bundle_uri(struct transport *transport);
+int transport_has_remote_bundle_uri(struct transport *transport);
/*
* Fetch the hash algorithm used by a remote.
--
2.45.2
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v2 3/3] fetch: use bundle URIs when having creationToken heuristic
2024-07-24 14:49 ` [PATCH v2 0/3] " Toon Claes
2024-07-24 14:49 ` [PATCH v2 1/3] clone: remove double bundle list clear code Toon Claes
2024-07-24 14:49 ` [PATCH v2 2/3] transport: introduce transport_has_remote_bundle_uri() Toon Claes
@ 2024-07-24 14:49 ` Toon Claes
2024-07-26 9:06 ` Karthik Nayak
2024-07-26 12:50 ` Patrick Steinhardt
2 siblings, 2 replies; 17+ messages in thread
From: Toon Claes @ 2024-07-24 14:49 UTC (permalink / raw)
To: git; +Cc: gitster, Toon Claes
At the moment, bundle URIs are only used by git-clone(1). For a clone
the use of bundle URI is trivial, because the repository is empty so
downloading bundles will never result in downloading objects that are in
the repository already.
For git-fetch(1), this more complicated to use bundle URI. We want to
avoid downloading bundles that only contains objects that are in the
local repository already.
One way to achieve this is possible when the "creationToken" heuristic
is used for bundle URIs. We attempt to download and unbundle the minimum
number of bundles by creationToken in decreasing order. If we fail to
unbundle (after a successful download) then move to the next
non-downloaded bundle and attempt downloading. Once we succeed in
applying a bundle, move to the previous unapplied bundle and attempt to
unbundle it again. At the end the highest applied creationToken is
written to `fetch.bundleCreationToken` in the git-config. The next time
bundles are advertised by the server, bundles with a lower creationToken
value are ignored. This was already implemented by
7903efb717 (bundle-uri: download in creationToken order, 2023-01-31) in
fetch_bundles_by_token().
Using the creationToken heuristic is optional, but without it the client
has no idea which bundles are new, how to sort them, and which only have
objects the client already has.
With this knowledge, make git-fetch(1) use bundle URIs from the server,
but only when the creationToken heuristic is used.
Signed-off-by: Toon Claes <toon@iotcl.com>
---
builtin/fetch.c | 13 ++++++++++
t/t5584-fetch-bundle-uri.sh | 49 +++++++++++++++++++++++++++++++++++++
2 files changed, 62 insertions(+)
create mode 100755 t/t5584-fetch-bundle-uri.sh
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 693f02b958..98e811f438 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1694,6 +1694,19 @@ static int do_fetch(struct transport *transport,
retcode = 1;
}
+ if (transport_has_remote_bundle_uri(transport)) {
+ /*
+ * Only use bundle-URIs when they use the creationToken
+ * heuristic, this allows us to ensure not downloading bundles
+ * we don't need. You can read the comments in
+ * fetch_bundles_by_token() to understand how this works.
+ */
+ if (transport->bundles->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
+ if (fetch_bundle_list(the_repository, transport->bundles))
+ warning(_("failed to fetch advertised bundles"));
+ }
+ }
+
if (fetch_and_consume_refs(&display_state, transport, transaction, ref_map,
&fetch_head, config)) {
retcode = 1;
diff --git a/t/t5584-fetch-bundle-uri.sh b/t/t5584-fetch-bundle-uri.sh
new file mode 100755
index 0000000000..6c2383646e
--- /dev/null
+++ b/t/t5584-fetch-bundle-uri.sh
@@ -0,0 +1,49 @@
+#!/bin/sh
+
+test_description='test use of bundle URI in "git fetch"'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+test_expect_success 'set up repos and bundles' '
+ git init source &&
+ test_commit -C source A &&
+ git clone --no-local source go-A-to-C &&
+ test_commit -C source B &&
+ git clone --no-local source go-B-to-C &&
+ git clone --no-local source go-B-to-D &&
+ git -C source bundle create B.bundle main &&
+ test_commit -C source C &&
+ git -C source bundle create B-to-C.bundle B..main &&
+ git -C source config uploadpack.advertiseBundleURIs true &&
+ git -C source config bundle.version 1 &&
+ git -C source config bundle.mode all &&
+ git -C source config bundle.heuristic creationToken &&
+ git -C source config bundle.bundle-B.uri "file://$(pwd)/source/B.bundle" &&
+ git -C source config bundle.bundle-B.creationToken 1 &&
+ git -C source config bundle.bundle-B-to-C.uri "file://$(pwd)/source/B-to-C.bundle" &&
+ git -C source config bundle.bundle-B-to-C.creationToken 2
+'
+
+test_expect_success 'fetches one bundle URI to get up-to-date' '
+ git -C go-B-to-C -c transfer.bundleURI=true fetch origin &&
+ test 1 = $(ls go-B-to-C/.git/objects/bundles | wc -l) &&
+ test 2 = $(git -C go-B-to-C config fetch.bundleCreationToken)
+'
+
+test_expect_success 'fetches two bundle URIs to get up-to-date' '
+ git -C go-A-to-C -c transfer.bundleURI=true fetch origin &&
+ test 2 = $(ls go-A-to-C/.git/objects/bundles | wc -l) &&
+ test 2 = $(git -C go-A-to-C config fetch.bundleCreationToken)
+'
+
+test_expect_success 'fetches one bundle URI and objects from remote' '
+ test_commit -C source D &&
+ git -C go-B-to-D -c transfer.bundleURI=true fetch origin &&
+ test 1 = $(ls go-B-to-D/.git/objects/bundles | wc -l) &&
+ test 2 = $(git -C go-B-to-D config fetch.bundleCreationToken)
+'
+
+test_done
--
2.45.2
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/3] clone: remove double bundle list clear code
2024-07-24 14:49 ` [PATCH v2 1/3] clone: remove double bundle list clear code Toon Claes
@ 2024-07-26 8:51 ` Karthik Nayak
2024-07-26 21:52 ` Justin Tobler
1 sibling, 0 replies; 17+ messages in thread
From: Karthik Nayak @ 2024-07-26 8:51 UTC (permalink / raw)
To: Toon Claes, git; +Cc: gitster
[-- Attachment #1: Type: text/plain, Size: 1531 bytes --]
Toon Claes <toon@iotcl.com> writes:
> The bundle list transport->bundles is filled by
> transport_get_remote_bundle_uri(). Only when the list is not used, it is
> cleared right away by calling clear_bundle_list().
>
> This looks like we leak memory allocated for the list when
> transport->bundles *is* used. But in fact, transport->bundles is cleaned
> up in transport_disconnect() near the end of cmd_clone().
>
> Remove the double clean up of transport->bundles, and depend solely on
> transport_disconnect() to take care of it.
>
> Also add a test case that hits this code, but due to other leaks we
> cannot mark it as leak-free.
>
> Signed-off-by: Toon Claes <toon@iotcl.com>
> ---
> builtin/clone.c | 3 ---
> t/t5558-clone-bundle-uri.sh | 28 +++++++++++++++++++++++++++-
> 2 files changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index af6017d41a..aa507395a0 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -1419,9 +1419,6 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
> else if (fetch_bundle_list(the_repository,
> transport->bundles))
> warning(_("failed to fetch advertised bundles"));
> - } else {
> - clear_bundle_list(transport->bundles);
> - FREE_AND_NULL(transport->bundles);
>
There is a small difference that here we also set `transport->bundles`
to NULL, whereas in `transport_disconnect()` we only do `free(...)`. I
don't see any issues because of it though. This cleanup looks good.
[snip]
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/3] transport: introduce transport_has_remote_bundle_uri()
2024-07-24 14:49 ` [PATCH v2 2/3] transport: introduce transport_has_remote_bundle_uri() Toon Claes
@ 2024-07-26 8:58 ` Karthik Nayak
2024-07-26 15:25 ` Junio C Hamano
0 siblings, 1 reply; 17+ messages in thread
From: Karthik Nayak @ 2024-07-26 8:58 UTC (permalink / raw)
To: Toon Claes, git; +Cc: gitster
[-- Attachment #1: Type: text/plain, Size: 2586 bytes --]
Toon Claes <toon@iotcl.com> writes:
[snip]
> diff --git a/t/helper/test-bundle-uri.c b/t/helper/test-bundle-uri.c
> index 0c5fa723d8..bd558d5e57 100644
> --- a/t/helper/test-bundle-uri.c
> +++ b/t/helper/test-bundle-uri.c
> @@ -90,7 +90,7 @@ static int cmd_ls_remote(int argc, const char **argv)
> }
>
> transport = transport_get(remote, NULL);
> - if (transport_get_remote_bundle_uri(transport) < 0) {
> + if (!transport_has_remote_bundle_uri(transport)) {
> error(_("could not get the bundle-uri list"));
> status = 1;
> goto cleanup;
> diff --git a/transport.c b/transport.c
> index 12cc5b4d96..1a7d86fa40 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -1536,7 +1536,7 @@ int transport_fetch_refs(struct transport *transport, struct ref *refs)
> return rc;
> }
>
> -int transport_get_remote_bundle_uri(struct transport *transport)
> +static int transport_get_remote_bundle_uri(struct transport *transport)
>
Why make it static?
> {
> int value = 0;
> const struct transport_vtable *vtable = transport->vtable;
> @@ -1561,6 +1561,18 @@ int transport_get_remote_bundle_uri(struct transport *transport)
>
> if (vtable->get_bundle_uri(transport) < 0)
> return error(_("could not retrieve server-advertised bundle-uri list"));
> +
> + return 0;
> +}
> +
> +int transport_has_remote_bundle_uri(struct transport *transport)
> +{
> + transport_get_remote_bundle_uri(transport);
> +
> + if (transport->bundles &&
> + hashmap_get_size(&transport->bundles->bundles))
> + return 1;
> +
> return 0;
> }
>
> diff --git a/transport.h b/transport.h
> index 6393cd9823..5ea9641558 100644
> --- a/transport.h
> +++ b/transport.h
> @@ -294,10 +294,11 @@ const struct ref *transport_get_remote_refs(struct transport *transport,
> struct transport_ls_refs_options *transport_options);
>
> /**
> - * Retrieve bundle URI(s) from a remote. Populates "struct
> - * transport"'s "bundle_uri" and "got_remote_bundle_uri".
> + * Try fetch bundle URI(s) from a remote and returns 1 if one or more
Nit: s/Try/Try to/
> + * bundle URI(s) are received from the server.
> + * Populates "struct transport"'s "bundles" and "got_remote_bundle_uri".
> */
> -int transport_get_remote_bundle_uri(struct transport *transport);
> +int transport_has_remote_bundle_uri(struct transport *transport);
>
Shouldn't this now be renamed to `transport_has_bundle_uri`? Earlier, we
were 'getting' bundle URIs from the remote. Now, we are abstracting that
away and simply asking, 'do we have bundle URIs'.
> /*
> * Fetch the hash algorithm used by a remote.
> --
> 2.45.2
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 3/3] fetch: use bundle URIs when having creationToken heuristic
2024-07-24 14:49 ` [PATCH v2 3/3] fetch: use bundle URIs when having creationToken heuristic Toon Claes
@ 2024-07-26 9:06 ` Karthik Nayak
2024-07-26 12:50 ` Patrick Steinhardt
1 sibling, 0 replies; 17+ messages in thread
From: Karthik Nayak @ 2024-07-26 9:06 UTC (permalink / raw)
To: Toon Claes, git; +Cc: gitster
[-- Attachment #1: Type: text/plain, Size: 1931 bytes --]
Toon Claes <toon@iotcl.com> writes:
> At the moment, bundle URIs are only used by git-clone(1). For a clone
> the use of bundle URI is trivial, because the repository is empty so
> downloading bundles will never result in downloading objects that are in
> the repository already.
>
> For git-fetch(1), this more complicated to use bundle URI. We want to
Nit: s/this/it is/
> avoid downloading bundles that only contains objects that are in the
> local repository already.
>
> One way to achieve this is possible when the "creationToken" heuristic
> is used for bundle URIs. We attempt to download and unbundle the minimum
This first sentence reads a bit weird. Perhaps, "One way to achieve
this is to restrict the usage of bundle URIs to the 'creationToken'
heuristic for git-fetch(1)."?
> number of bundles by creationToken in decreasing order. If we fail to
> unbundle (after a successful download) then move to the next
> non-downloaded bundle and attempt downloading. Once we succeed in
> applying a bundle, move to the previous unapplied bundle and attempt to
> unbundle it again. At the end the highest applied creationToken is
> written to `fetch.bundleCreationToken` in the git-config. The next time
> bundles are advertised by the server, bundles with a lower creationToken
> value are ignored. This was already implemented by
> 7903efb717 (bundle-uri: download in creationToken order, 2023-01-31) in
> fetch_bundles_by_token().
>
> Using the creationToken heuristic is optional, but without it the client
> has no idea which bundles are new, how to sort them, and which only have
> objects the client already has.
>
> With this knowledge, make git-fetch(1) use bundle URIs from the server,
> but only when the creationToken heuristic is used.
>
I would say that apart from all this it is also worth noting that bundle
URIs are still opt-in from the client side.
Rest of the patch looks good to me.
Thanks
[snip]
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 3/3] fetch: use bundle URIs when having creationToken heuristic
2024-07-24 14:49 ` [PATCH v2 3/3] fetch: use bundle URIs when having creationToken heuristic Toon Claes
2024-07-26 9:06 ` Karthik Nayak
@ 2024-07-26 12:50 ` Patrick Steinhardt
2024-08-02 13:46 ` Toon claes
1 sibling, 1 reply; 17+ messages in thread
From: Patrick Steinhardt @ 2024-07-26 12:50 UTC (permalink / raw)
To: Toon Claes; +Cc: git, gitster
[-- Attachment #1: Type: text/plain, Size: 2769 bytes --]
On Wed, Jul 24, 2024 at 04:49:57PM +0200, Toon Claes wrote:
> One way to achieve this is possible when the "creationToken" heuristic
> is used for bundle URIs. We attempt to download and unbundle the minimum
> number of bundles by creationToken in decreasing order. If we fail to
> unbundle (after a successful download) then move to the next
> non-downloaded bundle and attempt downloading. Once we succeed in
> applying a bundle, move to the previous unapplied bundle and attempt to
> unbundle it again. At the end the highest applied creationToken is
> written to `fetch.bundleCreationToken` in the git-config. The next time
> bundles are advertised by the server, bundles with a lower creationToken
> value are ignored. This was already implemented by
> 7903efb717 (bundle-uri: download in creationToken order, 2023-01-31) in
> fetch_bundles_by_token().
I think Junio essentially asked this already, but I'm still missing the
bigger picture here. When the "creationToken" heuristic is applied, the
effect of your change is that we'll always favor bundle URIs now over
performing proper fetches, right?
Now suppose that the server creates new bundled whenever somebody pushes
a new change to the default branch. We do not really have information
how this bundle is structured. It _could_ be an incremental bundle, and
in that case it might be sensible to fetch that bundle. But it could
also be that the server generates a full bundle including all objects
transitively reachable from that default branch. Now if we started to
rely on the "creationToken" heuristic, we would basically end up
re-downloading the complete repository, which is a strict regression.
Now that scenario is of course hypothetical. But the problem is that the
strategy for how bundle URIs are generated are determined by the hosting
provider. So ultimately, I expect that the reality will lie somewhere in
between and be different depending on which hosting solution you use.
All of this to me means that the "creationToken" heuristic is not really
a good signal, unless I'm missing something about the way it works. Is
there any additional signal provided by the server except for the time
when the bundle was created? If so, is that information sufficient to
determine whether it makes sense for a client to fetch a bundle instead
of performing a "proper" fetch? If not, what is the additional info that
we would need to make this schema work properly?
So unless I'm missing something, I feel like we need to think bigger and
design a heuristic that gives us the information needed. Without such a
heuristic, default-enabling may or may not do the right thing, and we
have no way to really argue whether it will do as we now depend on
server operators to do the right thing.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 2/3] transport: introduce transport_has_remote_bundle_uri()
2024-07-26 8:58 ` Karthik Nayak
@ 2024-07-26 15:25 ` Junio C Hamano
0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2024-07-26 15:25 UTC (permalink / raw)
To: Karthik Nayak; +Cc: Toon Claes, git
Karthik Nayak <karthik.188@gmail.com> writes:
>> diff --git a/transport.c b/transport.c
>> index 12cc5b4d96..1a7d86fa40 100644
>> --- a/transport.c
>> +++ b/transport.c
>> @@ -1536,7 +1536,7 @@ int transport_fetch_refs(struct transport *transport, struct ref *refs)
>> return rc;
>> }
>>
>> -int transport_get_remote_bundle_uri(struct transport *transport)
>> +static int transport_get_remote_bundle_uri(struct transport *transport)
>>
>
> Why make it static?
The reason is rather well described in the proposed log message, I
think.
>> + * bundle URI(s) are received from the server.
>> + * Populates "struct transport"'s "bundles" and "got_remote_bundle_uri".
>> */
>> -int transport_get_remote_bundle_uri(struct transport *transport);
>> +int transport_has_remote_bundle_uri(struct transport *transport);
>>
>
> Shouldn't this now be renamed to `transport_has_bundle_uri`? Earlier, we
> were 'getting' bundle URIs from the remote. Now, we are abstracting that
> away and simply asking, 'do we have bundle URIs'.
A good question.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/3] clone: remove double bundle list clear code
2024-07-24 14:49 ` [PATCH v2 1/3] clone: remove double bundle list clear code Toon Claes
2024-07-26 8:51 ` Karthik Nayak
@ 2024-07-26 21:52 ` Justin Tobler
2024-08-02 15:45 ` Toon claes
1 sibling, 1 reply; 17+ messages in thread
From: Justin Tobler @ 2024-07-26 21:52 UTC (permalink / raw)
To: Toon Claes; +Cc: git, gitster
On 24/07/24 04:49PM, Toon Claes wrote:
> The bundle list transport->bundles is filled by
> transport_get_remote_bundle_uri(). Only when the list is not used, it is
> cleared right away by calling clear_bundle_list().
>
> This looks like we leak memory allocated for the list when
> transport->bundles *is* used. But in fact, transport->bundles is cleaned
> up in transport_disconnect() near the end of cmd_clone().
The `transport->bundles` is setup and initialized by `transport_get()`
and populated by `transport_get_remote_bundle_uri()`. In its current
form, we clean up `transport->bundles` if it is empty immediately after
checking the remote for bundles. This is redundant though because the
cleanup already occurs as part of `transport_disconnect()`.
Since `transport_disconnect()` is already responsible for freeing other
parts of `transport`, it makes sense to let it be the one to handle it.
>
> Remove the double clean up of transport->bundles, and depend solely on
> transport_disconnect() to take care of it.
>
> Also add a test case that hits this code, but due to other leaks we
> cannot mark it as leak-free.
>
> Signed-off-by: Toon Claes <toon@iotcl.com>
> ---
> builtin/clone.c | 3 ---
> t/t5558-clone-bundle-uri.sh | 28 +++++++++++++++++++++++++++-
> 2 files changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index af6017d41a..aa507395a0 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -1419,9 +1419,6 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
> else if (fetch_bundle_list(the_repository,
> transport->bundles))
> warning(_("failed to fetch advertised bundles"));
> - } else {
> - clear_bundle_list(transport->bundles);
> - FREE_AND_NULL(transport->bundles);
> }
> }
>
> diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
> index cd05321e17..2d6e690fbe 100755
> --- a/t/t5558-clone-bundle-uri.sh
> +++ b/t/t5558-clone-bundle-uri.sh
> @@ -1,6 +1,6 @@
> #!/bin/sh
>
> -test_description='test fetching bundles with --bundle-uri'
> +test_description='test clone with use of bundle-uri'
I assume this description was changed because the test pertains to
clones and not fetches. Might be worth mentioning in the commit message.
>
> . ./test-lib.sh
> . "$TEST_DIRECTORY"/lib-bundle.sh
> @@ -438,6 +438,32 @@ test_expect_success 'negotiation: bundle list with all wanted commits' '
> test_grep ! "clone> want " trace-packet.txt
> '
>
> +test_expect_success 'bundles advertised by the server' '
> + test_when_finished rm -f trace*.txt &&
> + git clone clone-from clone-advertiser &&
> + git -C clone-advertiser config uploadpack.advertiseBundleURIs true &&
> + git -C clone-advertiser config bundle.version 1 &&
> + git -C clone-advertiser config bundle.mode all &&
> + git -C clone-advertiser config bundle.bundle-1.uri "file://$(pwd)/clone-from/bundle-1.bundle" &&
> + git -C clone-advertiser config bundle.bundle-2.uri "file://$(pwd)/clone-from/bundle-2.bundle" &&
> + git -C clone-advertiser config bundle.bundle-3.uri "file://$(pwd)/clone-from/bundle-3.bundle" &&
> + git -C clone-advertiser config bundle.bundle-4.uri "file://$(pwd)/clone-from/bundle-4.bundle" &&
> +
> + GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> + git -c transfer.bundleURI=true clone clone-advertiser clone-advertised &&
> + git -C clone-advertised for-each-ref --format="%(refname)" >refs &&
> + grep "refs/bundles/" refs >actual &&
> + cat >expect <<-\EOF &&
> + refs/bundles/base
> + refs/bundles/left
> + refs/bundles/merge
> + refs/bundles/right
> + EOF
> + test_cmp expect actual &&
> + # We already have all needed commits so no "want" needed.
> + test_grep ! "clone> want " trace-packet.txt
> +'
> +
Looks like this test is validating that a clone is retrieving bundles
from a remote repository advertising said bundles. We to do so by
checking for the `refs/bundles/*` references which should only exist if
the advertised bundles were fetched. This makes sense to me.
From the commit message though, I thought the added test might have
something to do with the changes to when `transport->bundles` was freed.
I wonder if it would be a good idea to expland on this further in the
commit message. Or since the added test is somewhat unrelated to this
patch, maybe it should be a separate patch?
-Justin
> #########################################################################
> # HTTP tests begin here
>
> --
> 2.45.2
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 3/3] fetch: use bundle URIs when having creationToken heuristic
2024-07-26 12:50 ` Patrick Steinhardt
@ 2024-08-02 13:46 ` Toon claes
2024-08-22 7:12 ` Patrick Steinhardt
0 siblings, 1 reply; 17+ messages in thread
From: Toon claes @ 2024-08-02 13:46 UTC (permalink / raw)
To: Patrick Steinhardt, Toon Claes; +Cc: git, gitster
Patrick Steinhardt <ps@pks.im> writes:
> I think Junio essentially asked this already, but I'm still missing the
> bigger picture here. When the "creationToken" heuristic is applied, the
> effect of your change is that we'll always favor bundle URIs now over
> performing proper fetches, right?
Yes, sort of. Bundle URIs are a step before "proper" fetches. A "proper"
fetch might still happen after downloading bundles.
(I would rather call them "negotiated fetches" or something, but let's
just stick with "proper" fetch for now)
> Now suppose that the server creates new bundled whenever somebody pushes
> a new change to the default branch. We do not really have information
> how this bundle is structured. It _could_ be an incremental bundle, and
> in that case it might be sensible to fetch that bundle. But it could
> also be that the server generates a full bundle including all objects
> transitively reachable from that default branch. Now if we started to
> rely on the "creationToken" heuristic, we would basically end up
> re-downloading the complete repository, which is a strict regression.
>
> Now that scenario is of course hypothetical. But the problem is that the
> strategy for how bundle URIs are generated are determined by the hosting
> provider. So ultimately, I expect that the reality will lie somewhere in
> between and be different depending on which hosting solution you use.
That is true. The mechanism of bundle URIs is mostly outside the control
of Git itself. It's up to the Git hosting provider how they use it. This
gives them a lot of flexibility, like where to store bundles, how
incremental bundles are used, and how often bundles are regenerated. But
this also comes with a huge responsibility, due to the scenario you
describe above.
At GitLab we want to do an initial roll-out of bundle URIs with just one
bundle advertised to the client. This bundle is generated every week,
and only contains the whole history of default branch (up to the newest
commit). But, we'll advertise this bundle with creationToken "1",
always. This will cause the client to fetch any bundle for the
repository only once. Even when the bundle is updated by the server, the
client will not fetch it because the creationToken did not change.
> All of this to me means that the "creationToken" heuristic is not really
> a good signal, unless I'm missing something about the way it works. Is
> there any additional signal provided by the server except for the time
> when the bundle was created?
The Git hosting provider can use the "creationToken" however they want,
at the moment it's up to them to decide on a good strategy.
For example, assume you decide to create bundle URIs only for the
default branch, then you can choose to use the committer date of the
topmost commit of that branch as the creationToken.
Imagine you have the following example commit history (committer date of
some commits are indicated with a caret below those commits):
A --- B --- C --- D --- E --- F --- G
^ ^ ^
2024-08-02 2024-08-04 2024-08-09
Today (on 2024-08-02, at this point commits D to G don't exist yet) the
Git host decides to create a bundle:
- git bundle create first.bundle main
The server will advertise this bundle with the creationToken being the
Unix timestamp of the topmost commit in the bundle:
- first.bundle = 2024-08-02 or in Unix 1722549600
When the client clones/fetches, it will download this bundle and store
creationToken 1722549600 (= 2024-08-02) in the repo config as
`fetch.bundleCreationToken`.
A few days later (on 2024-08-04, now D & E are added) the client fetches
again, and there are no new bundles, so a "proper" fetch happens.
A week after the first bundle (on 2024-08-09, D to G were added since)
the Git host decides to create a new (incremental) bundle:
- git bundle create second.bundle C..main
The server then advertises the following bundles:
- first.bundle = 2024-08-02 or in Unix 1722549600
- second.bundle = 2024-08-09 1723154400
Now when the client fetches again, it sees a new bundle "second.bundle",
and will download it. This bundle contains the history from C..G. And
you might have noticed, the client already has C..E, while it only needs
E..G. This is a slight inefficiency, it's a pitfall of bundle URI I
don't think we can avoid. By design bundle URIs are used to pre-generate
files for clients to use, so they are never built-to-order. It's almost
impossible to avoid clients to download bundles that have objects they
already have.
Well, maybe it is possible. But then the question arises, if the current
main branch is at G, do you want the client to download a bundle which
has a little bit too much objects, or do you want to have the client do
a "proper" fetch and skip any bundle?
But all the above assumes there's only one branch in the bundle. With
more branches another strategy might be required.
> If so, is that information sufficient to determine whether it makes
> sense for a client to fetch a bundle instead of performing a "proper"
> fetch? If not, what is the additional info that we would need to make
> this schema work properly?
I think my example above indicates the "creationToken" _can_ be
sufficient information for the client to determine if downloading a
bundle makes sense, but it depends on a well-though-out strategy of the
Git host. So one the hand it gives the host the flexibility of using a
strategy, on the other hand it puts a lot of responsibility on them.
> So unless I'm missing something, I feel like we need to think bigger and
> design a heuristic that gives us the information needed. Without such a
> heuristic, default-enabling may or may not do the right thing, and we
> have no way to really argue whether it will do as we now depend on
> server operators to do the right thing.
I think we're absolutely not ready to default-enabling the bundle URI
feature in Git. Here at GitLab we're trying pilot bundle URI to figure
out how it should/can work. Roll-out to CI runners is our first target
and because we have the flexibility to toggle use of bundle URI on the
client side, we can test things safely, without affecting any other
user.
Talking about heuristics, any heuristic will be a _hack_ if you ask me.
If you think about it, the bundles them self should be the single source
of truth. In fact, a bundle is a packfile with a header prepended that
contains refs and their OID in git-show-ref(1) format, and includes
which prerequisite OIDs it has.
So another approach I think of is to have the client partially download
bundles (so it pauses the download once they have received the complete
header) and make them parse the header to determine if they continue the
download. But this approach feels quite complicated to me.
Luckily, with the bundle.heuristic config we can always introduce new
values if we discover there is a flaw in the "creationToken" heuristic.
--
Toon
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 1/3] clone: remove double bundle list clear code
2024-07-26 21:52 ` Justin Tobler
@ 2024-08-02 15:45 ` Toon claes
0 siblings, 0 replies; 17+ messages in thread
From: Toon claes @ 2024-08-02 15:45 UTC (permalink / raw)
To: Justin Tobler, Toon Claes; +Cc: git, gitster
Justin Tobler <jltobler@gmail.com> writes:
> From the commit message though, I thought the added test might have
> something to do with the changes to when `transport->bundles` was freed.
> I wonder if it would be a good idea to expland on this further in the
> commit message. Or since the added test is somewhat unrelated to this
> patch, maybe it should be a separate patch?
I added this test case because I wanted to debug the code path where the
remote advertises bundle URIs, all other test cases use --bundle-uri.
This helped me to debug the code and understand where the
`transport->bundles` are freed. Although this test case does not fully
touch the code path of the code that's removed, but it was close enough
to hack a few extra lines and understand what happens. Since I wrote
that test anyway, I felt I might as well submit it.
--
Toon
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v2 3/3] fetch: use bundle URIs when having creationToken heuristic
2024-08-02 13:46 ` Toon claes
@ 2024-08-22 7:12 ` Patrick Steinhardt
0 siblings, 0 replies; 17+ messages in thread
From: Patrick Steinhardt @ 2024-08-22 7:12 UTC (permalink / raw)
To: Toon claes; +Cc: git, gitster
On Fri, Aug 02, 2024 at 03:46:39PM +0200, Toon claes wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > I think Junio essentially asked this already, but I'm still missing the
> > bigger picture here. When the "creationToken" heuristic is applied, the
> > effect of your change is that we'll always favor bundle URIs now over
> > performing proper fetches, right?
>
> Yes, sort of. Bundle URIs are a step before "proper" fetches. A "proper"
> fetch might still happen after downloading bundles.
> (I would rather call them "negotiated fetches" or something, but let's
> just stick with "proper" fetch for now)
>
> > Now suppose that the server creates new bundled whenever somebody pushes
> > a new change to the default branch. We do not really have information
> > how this bundle is structured. It _could_ be an incremental bundle, and
> > in that case it might be sensible to fetch that bundle. But it could
> > also be that the server generates a full bundle including all objects
> > transitively reachable from that default branch. Now if we started to
> > rely on the "creationToken" heuristic, we would basically end up
> > re-downloading the complete repository, which is a strict regression.
> >
> > Now that scenario is of course hypothetical. But the problem is that the
> > strategy for how bundle URIs are generated are determined by the hosting
> > provider. So ultimately, I expect that the reality will lie somewhere in
> > between and be different depending on which hosting solution you use.
>
> That is true. The mechanism of bundle URIs is mostly outside the control
> of Git itself. It's up to the Git hosting provider how they use it. This
> gives them a lot of flexibility, like where to store bundles, how
> incremental bundles are used, and how often bundles are regenerated. But
> this also comes with a huge responsibility, due to the scenario you
> describe above.
>
> At GitLab we want to do an initial roll-out of bundle URIs with just one
> bundle advertised to the client. This bundle is generated every week,
> and only contains the whole history of default branch (up to the newest
> commit). But, we'll advertise this bundle with creationToken "1",
> always. This will cause the client to fetch any bundle for the
> repository only once. Even when the bundle is updated by the server, the
> client will not fetch it because the creationToken did not change.
Okay. Are details like this documented anywhere in the Git project?
Because when we start to build on the preexisting heuristics I think
that we should give server operators a guide for how things come
together and what the client's expectations can and should be.
> > All of this to me means that the "creationToken" heuristic is not really
> > a good signal, unless I'm missing something about the way it works. Is
> > there any additional signal provided by the server except for the time
> > when the bundle was created?
>
> The Git hosting provider can use the "creationToken" however they want,
> at the moment it's up to them to decide on a good strategy.
>
> For example, assume you decide to create bundle URIs only for the
> default branch, then you can choose to use the committer date of the
> topmost commit of that branch as the creationToken.
>
> Imagine you have the following example commit history (committer date of
> some commits are indicated with a caret below those commits):
>
>
> A --- B --- C --- D --- E --- F --- G
> ^ ^ ^
> 2024-08-02 2024-08-04 2024-08-09
>
> Today (on 2024-08-02, at this point commits D to G don't exist yet) the
> Git host decides to create a bundle:
>
> - git bundle create first.bundle main
>
> The server will advertise this bundle with the creationToken being the
> Unix timestamp of the topmost commit in the bundle:
>
> - first.bundle = 2024-08-02 or in Unix 1722549600
>
> When the client clones/fetches, it will download this bundle and store
> creationToken 1722549600 (= 2024-08-02) in the repo config as
> `fetch.bundleCreationToken`.
>
> A few days later (on 2024-08-04, now D & E are added) the client fetches
> again, and there are no new bundles, so a "proper" fetch happens.
>
> A week after the first bundle (on 2024-08-09, D to G were added since)
> the Git host decides to create a new (incremental) bundle:
>
> - git bundle create second.bundle C..main
>
> The server then advertises the following bundles:
>
> - first.bundle = 2024-08-02 or in Unix 1722549600
> - second.bundle = 2024-08-09 1723154400
>
> Now when the client fetches again, it sees a new bundle "second.bundle",
> and will download it. This bundle contains the history from C..G. And
> you might have noticed, the client already has C..E, while it only needs
> E..G. This is a slight inefficiency, it's a pitfall of bundle URI I
> don't think we can avoid. By design bundle URIs are used to pre-generate
> files for clients to use, so they are never built-to-order. It's almost
> impossible to avoid clients to download bundles that have objects they
> already have.
>
> Well, maybe it is possible. But then the question arises, if the current
> main branch is at G, do you want the client to download a bundle which
> has a little bit too much objects, or do you want to have the client do
> a "proper" fetch and skip any bundle?
>
> But all the above assumes there's only one branch in the bundle. With
> more branches another strategy might be required.
Okay. What I'm still missing though is the ability for a client do
discern whether bundles are incremental or full.
In the schema you describe, my assumption is that the server operator
will eventually regenerate the first bundle to become a full bundle.
Which means that if there was any activity in the repository, then the
Unix timestamp of that regenerated bundle will likely be bigger than
what the client has stored on disk. Consequently, without information
that this is in fact a full bundle, the client would end up fetching it.
This is a potentially big waste of time and resources.
> > If so, is that information sufficient to determine whether it makes
> > sense for a client to fetch a bundle instead of performing a "proper"
> > fetch? If not, what is the additional info that we would need to make
> > this schema work properly?
>
> I think my example above indicates the "creationToken" _can_ be
> sufficient information for the client to determine if downloading a
> bundle makes sense, but it depends on a well-though-out strategy of the
> Git host. So one the hand it gives the host the flexibility of using a
> strategy, on the other hand it puts a lot of responsibility on them.
It does put a big burden on the hoster to get it right, that much is
true. But I think the bigger problem is that the hoster _cannot_ get it
right, because they have no ability to distribute enough information to
the client.
For a schema like this to work I think we need to advertise at least two
things for each bundle: a unique bundle token, and the token of the base
bundle that a bundle may refer to. This would allow us to advertise
dependencies between bundles to a client rather easily: a full bundle
does not have a base, whereas incremental bundles would point to the
previous bundle.
A theoretical flow would thus go like this:
1. The server announces the full bundle A.bundle. The client downloads
it and stores the unique token Tok(A) in its config.
2. Subsequent fetches notice that Tok(A) announced by the server and
stored in the config remain the same.
3. The server creates incremental bundles B.bundle and C.bundle.
B.bundle builds on top of A, C builds on top of B. The server
announces all three of those bundles. The client determines that it
only has A, so to get to C it needs to fetch both B and C. It then
stores Tok(C) in its config.
One scenario not yet covered is when the server regenerates the full
bundle to get A' = A + B + C + D. Ideally, the server would temporarily
announce a new bundle D = C + D while also announcing that A' and the
result from D are equivalent. Like this, the client knows that it does
not have to refetch A', but already has its contents and can thus update
its config to point to Tok(A').
But even a schema that is more involved like this one is rather
suboptimal. Eventually, the equivalence of Tok(A') and D will not be
advertised by the server anymore. So the client will not be able to
learn about which incremental bundles to fetch anymore, and would be
forced to re-download everything.
> > So unless I'm missing something, I feel like we need to think bigger and
> > design a heuristic that gives us the information needed. Without such a
> > heuristic, default-enabling may or may not do the right thing, and we
> > have no way to really argue whether it will do as we now depend on
> > server operators to do the right thing.
>
> I think we're absolutely not ready to default-enabling the bundle URI
> feature in Git. Here at GitLab we're trying pilot bundle URI to figure
> out how it should/can work. Roll-out to CI runners is our first target
> and because we have the flexibility to toggle use of bundle URI on the
> client side, we can test things safely, without affecting any other
> user.
>
> Talking about heuristics, any heuristic will be a _hack_ if you ask me.
> If you think about it, the bundles them self should be the single source
> of truth. In fact, a bundle is a packfile with a header prepended that
> contains refs and their OID in git-show-ref(1) format, and includes
> which prerequisite OIDs it has.
Yes, that is true indeed.
> So another approach I think of is to have the client partially download
> bundles (so it pauses the download once they have received the complete
> header) and make them parse the header to determine if they continue the
> download. But this approach feels quite complicated to me.
It certainly is more complicated. But it's also the only mechanism that
gives you all the information you need, I guess.
> Luckily, with the bundle.heuristic config we can always introduce new
> values if we discover there is a flaw in the "creationToken" heuristic.
Well, I'd claim we have already shown that it's flawed. I think before
we enable incremental bundle fetches by default, even if it's only with
the "creationToken" set, we need to show and document a way for server
operators to make use of this feature. In other words, we need to
document how they have to generate bundles in a way that does not cause
clients to re-download everything whenever a new bundle is announced.
Patrick
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] builtin/clone: teach git-clone(1) the --ref= argument
2024-07-22 8:07 [PATCH] fetch: use bundle URIs when having creationToken heuristic Toon Claes
2024-07-22 18:40 ` Junio C Hamano
2024-07-24 14:49 ` [PATCH v2 0/3] " Toon Claes
@ 2024-09-27 9:04 ` Toon Claes
2024-09-27 9:04 ` [PATCH v2] " Toon Claes
2 siblings, 1 reply; 17+ messages in thread
From: Toon Claes @ 2024-09-27 9:04 UTC (permalink / raw)
To: git; +Cc: Toon Claes
Hi,
My apologies, instantly after sending my v1 I noticed some lines in the test
weren't indented with tabs. So I'm sending v2 straight away. Sorry for the
noise.
--
Toon
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2] builtin/clone: teach git-clone(1) the --ref= argument
2024-09-27 9:04 ` [PATCH] builtin/clone: teach git-clone(1) the --ref= argument Toon Claes
@ 2024-09-27 9:04 ` Toon Claes
0 siblings, 0 replies; 17+ messages in thread
From: Toon Claes @ 2024-09-27 9:04 UTC (permalink / raw)
To: git; +Cc: Toon Claes
Add option `--ref` to git-clone(1). This enables the user to clone and
checkout the given named reference. It's pretty similar to --branch and
while --branch takes a branch name or tag name, it doesn't take a fully
qualified reference. This allows the user to clone a reference that
doesn't start with refs/heads or refs/tags. This can be useful when the
server uses custom references.
This new argument can be used in conjuction with --single-branch to only
clone the given ref.
Signed-off-by: Toon Claes <toon@iotcl.com>
---
Documentation/git-clone.txt | 9 ++++-
builtin/clone.c | 67 ++++++++++++++++++++++++++++++++-----
t/t5612-clone-refspec.sh | 35 +++++++++++++++++++
3 files changed, 102 insertions(+), 9 deletions(-)
diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 8e925db7e9..ae82f1c1c3 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -215,6 +215,13 @@ objects from the source repository into a pack in the cloned repository.
`--branch` can also take tags and detaches the HEAD at that commit
in the resulting repository.
+`--ref` _<name>_::
+ This detaches HEAD and makes it point to the commit where the _<name>_
+ reference is pointing to. In a non-bare repository, this is the ref that
+ will be checked out.
+ Can be used in conjunction with `--single-branch` and `--no-tags` to
+ clone only the given ref. Cannot be combined with `--branch`.
+
`-u` _<upload-pack>_::
`--upload-pack` _<upload-pack>_::
When given, and the repository to clone from is accessed
@@ -259,7 +266,7 @@ corresponding `--mirror` and `--no-tags` options instead.
`--`[`no-`]`single-branch`::
Clone only the history leading to the tip of a single branch,
- either specified by the `--branch` option or the primary
+ either specified by the `--branch` or `--ref` option or the primary
branch remote's `HEAD` points at.
Further fetches into the resulting repository will only update the
remote-tracking branch for the branch this option was used for the
diff --git a/builtin/clone.c b/builtin/clone.c
index e77339c847..1081478905 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -69,6 +69,7 @@ static char *option_template, *option_depth, *option_since;
static char *option_origin = NULL;
static char *remote_name = NULL;
static char *option_branch = NULL;
+static char *option_ref = NULL;
static struct string_list option_not = STRING_LIST_INIT_NODUP;
static const char *real_git_dir;
static const char *ref_format;
@@ -141,6 +142,8 @@ static struct option builtin_clone_options[] = {
N_("use <name> instead of 'origin' to track upstream")),
OPT_STRING('b', "branch", &option_branch, N_("branch"),
N_("checkout <branch> instead of the remote's HEAD")),
+ OPT_STRING(0, "ref", &option_ref, N_("ref"),
+ N_("checkout <ref> (detached) instead of the remote's HEAD")),
OPT_STRING('u', "upload-pack", &option_upload_pack, N_("path"),
N_("path to git-upload-pack on the remote")),
OPT_STRING(0, "depth", &option_depth, N_("depth"),
@@ -531,32 +534,64 @@ static struct ref *wanted_peer_refs(const struct ref *refs,
if (option_single_branch) {
struct ref *remote_head = NULL;
- if (!option_branch)
+ if (!option_branch && !option_ref)
remote_head = guess_remote_head(head, refs, 0);
else {
free_one_ref(head);
local_refs = head = NULL;
tail = &local_refs;
- remote_head = copy_ref(find_remote_branch(refs, option_branch));
+ if (option_branch)
+ remote_head = copy_ref(find_remote_branch(refs, option_branch));
+ else
+ remote_head = copy_ref(find_ref_by_name(refs, option_ref));
}
if (!remote_head && option_branch)
warning(_("Could not find remote branch %s to clone."),
option_branch);
+ else if (!remote_head && option_ref)
+ warning(_("Could not find remote ref %s to clone."),
+ option_ref);
else {
int i;
for (i = 0; i < refspec->nr; i++)
get_fetch_map(remote_head, &refspec->items[i],
&tail, 0);
- /* if --branch=tag, pull the requested tag explicitly */
- get_fetch_map(remote_head, &tag_refspec, &tail, 0);
+ if (option_ref) {
+ struct strbuf spec = STRBUF_INIT;
+ struct refspec_item ref_refspec;
+
+ strbuf_addf(&spec, "%s:%s", option_ref, option_ref);
+ refspec_item_init(&ref_refspec, spec.buf, 0);
+
+ get_fetch_map(remote_head, &ref_refspec, &tail, 0);
+
+ refspec_item_clear(&ref_refspec);
+ strbuf_release(&spec);
+ } else {
+ /* if --branch=tag, pull the requested tag explicitly */
+ get_fetch_map(remote_head, &tag_refspec, &tail, 0);
+ }
}
free_refs(remote_head);
} else {
int i;
for (i = 0; i < refspec->nr; i++)
get_fetch_map(refs, &refspec->items[i], &tail, 0);
+
+ if (option_ref) {
+ struct strbuf spec = STRBUF_INIT;
+ struct refspec_item ref_refspec;
+
+ strbuf_addf(&spec, "%s:%s", option_ref, option_ref);
+ refspec_item_init(&ref_refspec, spec.buf, 0);
+
+ get_fetch_map(refs, &ref_refspec, &tail, 0);
+
+ refspec_item_clear(&ref_refspec);
+ strbuf_release(&spec);
+ }
}
if (!option_mirror && !option_single_branch && !option_no_tags)
@@ -684,10 +719,15 @@ static void update_head(const struct ref *our, const struct ref *remote,
} else if (our) {
struct commit *c = lookup_commit_reference(the_repository,
&our->old_oid);
- /* --branch specifies a non-branch (i.e. tags), detach HEAD */
- refs_update_ref(get_main_ref_store(the_repository), msg,
- "HEAD", &c->object.oid, NULL, REF_NO_DEREF,
- UPDATE_REFS_DIE_ON_ERR);
+ if (c)
+ /* --branch specifies a non-branch (i.e. tags), detach HEAD */
+ refs_update_ref(get_main_ref_store(the_repository), msg,
+ "HEAD", &c->object.oid, NULL, REF_NO_DEREF,
+ UPDATE_REFS_DIE_ON_ERR);
+ else
+ refs_update_ref(get_main_ref_store(the_repository), msg,
+ "HEAD", &our->old_oid, NULL, REF_NO_DEREF,
+ UPDATE_REFS_DIE_ON_ERR);
} else if (remote) {
/*
* We know remote HEAD points to a non-branch, or
@@ -898,6 +938,9 @@ static void write_refspec_config(const char *src_ref_prefix,
else
strbuf_addf(&value, "+%s:%s%s", our_head_points_at->name,
branch_top->buf, option_branch);
+ } else if (option_ref) {
+ strbuf_addf(&value, "+%s:%s", our_head_points_at->name,
+ our_head_points_at->name);
} else if (remote_head_points_at) {
const char *head = remote_head_points_at->name;
if (!skip_prefix(head, "refs/heads/", &head))
@@ -1383,6 +1426,9 @@ int cmd_clone(int argc,
if (option_branch)
expand_ref_prefix(&transport_ls_refs_options.ref_prefixes,
option_branch);
+ if (option_ref)
+ strvec_push(&transport_ls_refs_options.ref_prefixes,
+ option_ref);
if (!option_no_tags)
strvec_push(&transport_ls_refs_options.ref_prefixes,
"refs/tags/");
@@ -1468,6 +1514,11 @@ int cmd_clone(int argc,
if (!our_head_points_at)
die(_("Remote branch %s not found in upstream %s"),
option_branch, remote_name);
+ } else if (option_ref) {
+ our_head_points_at = find_ref_by_name(mapped_refs, option_ref);
+ if (!our_head_points_at)
+ die(_("Remote ref %s not found in upstream %s"),
+ option_ref, remote_name);
} else if (remote_head_points_at) {
our_head_points_at = remote_head_points_at;
} else if (remote_head) {
diff --git a/t/t5612-clone-refspec.sh b/t/t5612-clone-refspec.sh
index 72762de977..bbdf66b7cb 100755
--- a/t/t5612-clone-refspec.sh
+++ b/t/t5612-clone-refspec.sh
@@ -17,6 +17,7 @@ test_expect_success 'setup' '
git tag two &&
echo three >file &&
git commit -a -m three &&
+ git update-ref refs/some/three HEAD &&
git checkout -b side &&
echo four >file &&
git commit -a -m four &&
@@ -236,4 +237,38 @@ test_expect_success '--single-branch with detached' '
test_must_be_empty actual
'
+test_expect_success 'with --ref' '
+ git clone --ref=refs/some/three . dir_ref &&
+ git -C dir_ref for-each-ref refs > refs &&
+ sed -e "/HEAD$/d" \
+ -e "s|/remotes/origin/|/heads/|" refs >actual &&
+ git for-each-ref refs >expect &&
+ test_cmp expect actual
+'
+
+test_expect_success 'with --ref and --no-tags' '
+ git clone --ref=refs/some/three --no-tags . dir_ref_notags &&
+ git -C dir_ref_notags for-each-ref refs > refs &&
+ sed -e "/HEAD$/d" \
+ -e "s|/remotes/origin/|/heads/|" refs >actual &&
+ git for-each-ref refs/heads >expect &&
+ git for-each-ref refs/some >>expect &&
+ test_cmp expect actual
+'
+
+test_expect_success '--single-branch with --ref' '
+ git clone --single-branch --ref=refs/some/three . dir_single_ref &&
+ git -C dir_single_ref for-each-ref refs > actual &&
+ git for-each-ref refs/some >expect &&
+ git for-each-ref refs/tags >>expect &&
+ test_cmp expect actual
+'
+
+test_expect_success '--single-branch with --ref and --no-tags' '
+ git clone --single-branch --ref=refs/some/three --no-tags . dir_single_ref_notags &&
+ git -C dir_single_ref_notags for-each-ref refs > actual &&
+ git for-each-ref refs/some >expect &&
+ test_cmp expect actual
+'
+
test_done
Range-diff against v1:
1: 8ee6a42431 ! 1: 07363b1de5 builtin/clone: teach git-clone(1) the --ref= argument
@@ Commit message
checkout the given named reference. It's pretty similar to --branch and
while --branch takes a branch name or tag name, it doesn't take a fully
qualified reference. This allows the user to clone a reference that
- doesn't start with refs/heads/ or refs/tags. This can be useful when the
+ doesn't start with refs/heads or refs/tags. This can be useful when the
server uses custom references.
- Allow the user to clone a certain ref, similar to --branch.
+ This new argument can be used in conjuction with --single-branch to only
+ clone the given ref.
Signed-off-by: Toon Claes <toon@iotcl.com>
@@ Documentation/git-clone.txt: objects from the source repository into a pack in t
+ reference is pointing to. In a non-bare repository, this is the ref that
+ will be checked out.
+ Can be used in conjunction with `--single-branch` and `--no-tags` to
-+ clone only given ref. Cannot be combined with `--branch`.
++ clone only the given ref. Cannot be combined with `--branch`.
+
`-u` _<upload-pack>_::
`--upload-pack` _<upload-pack>_::
@@ builtin/clone.c: static void update_head(const struct ref *our, const struct ref
+ else
+ refs_update_ref(get_main_ref_store(the_repository), msg,
+ "HEAD", &our->old_oid, NULL, REF_NO_DEREF,
-+ UPDATE_REFS_MSG_ON_ERR);
++ UPDATE_REFS_DIE_ON_ERR);
} else if (remote) {
/*
* We know remote HEAD points to a non-branch, or
@@ t/t5612-clone-refspec.sh: test_expect_success '--single-branch with detached' '
+test_expect_success 'with --ref' '
+ git clone --ref=refs/some/three . dir_ref &&
-+ git -C dir_ref for-each-ref refs > refs &&
-+ sed -e "/HEAD$/d" \
++ git -C dir_ref for-each-ref refs > refs &&
++ sed -e "/HEAD$/d" \
+ -e "s|/remotes/origin/|/heads/|" refs >actual &&
+ git for-each-ref refs >expect &&
+ test_cmp expect actual
@@ t/t5612-clone-refspec.sh: test_expect_success '--single-branch with detached' '
+
+test_expect_success 'with --ref and --no-tags' '
+ git clone --ref=refs/some/three --no-tags . dir_ref_notags &&
-+ git -C dir_ref_notags for-each-ref refs > refs &&
-+ sed -e "/HEAD$/d" \
++ git -C dir_ref_notags for-each-ref refs > refs &&
++ sed -e "/HEAD$/d" \
+ -e "s|/remotes/origin/|/heads/|" refs >actual &&
+ git for-each-ref refs/heads >expect &&
+ git for-each-ref refs/some >>expect &&
@@ t/t5612-clone-refspec.sh: test_expect_success '--single-branch with detached' '
+
+test_expect_success '--single-branch with --ref' '
+ git clone --single-branch --ref=refs/some/three . dir_single_ref &&
-+ git -C dir_single_ref for-each-ref refs > actual &&
++ git -C dir_single_ref for-each-ref refs > actual &&
+ git for-each-ref refs/some >expect &&
+ git for-each-ref refs/tags >>expect &&
+ test_cmp expect actual
@@ t/t5612-clone-refspec.sh: test_expect_success '--single-branch with detached' '
+
+test_expect_success '--single-branch with --ref and --no-tags' '
+ git clone --single-branch --ref=refs/some/three --no-tags . dir_single_ref_notags &&
-+ git -C dir_single_ref_notags for-each-ref refs > actual &&
++ git -C dir_single_ref_notags for-each-ref refs > actual &&
+ git for-each-ref refs/some >expect &&
+ test_cmp expect actual
+'
--
2.46.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
end of thread, other threads:[~2024-09-27 9:05 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-22 8:07 [PATCH] fetch: use bundle URIs when having creationToken heuristic Toon Claes
2024-07-22 18:40 ` Junio C Hamano
2024-07-24 14:49 ` [PATCH v2 0/3] " Toon Claes
2024-07-24 14:49 ` [PATCH v2 1/3] clone: remove double bundle list clear code Toon Claes
2024-07-26 8:51 ` Karthik Nayak
2024-07-26 21:52 ` Justin Tobler
2024-08-02 15:45 ` Toon claes
2024-07-24 14:49 ` [PATCH v2 2/3] transport: introduce transport_has_remote_bundle_uri() Toon Claes
2024-07-26 8:58 ` Karthik Nayak
2024-07-26 15:25 ` Junio C Hamano
2024-07-24 14:49 ` [PATCH v2 3/3] fetch: use bundle URIs when having creationToken heuristic Toon Claes
2024-07-26 9:06 ` Karthik Nayak
2024-07-26 12:50 ` Patrick Steinhardt
2024-08-02 13:46 ` Toon claes
2024-08-22 7:12 ` Patrick Steinhardt
2024-09-27 9:04 ` [PATCH] builtin/clone: teach git-clone(1) the --ref= argument Toon Claes
2024-09-27 9:04 ` [PATCH v2] " Toon Claes
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).