git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Nikolay Edigaryev via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Derrick Stolee <stolee@gmail.com>,
	Nikolay Edigaryev <edigaryev@gmail.com>,
	Nikolay Edigaryev <edigaryev@gmail.com>
Subject: [PATCH] clone: support cloning of filtered bundles
Date: Sun, 14 Jan 2024 11:16:49 +0000	[thread overview]
Message-ID: <pull.1644.git.git.1705231010118.gitgitgadget@gmail.com> (raw)

From: Nikolay Edigaryev <edigaryev@gmail.com>

f18b512bbb (bundle: create filtered bundles, 2022-03-09) introduced an
incredibly useful ability to create filtered bundles, which advances
the partial clone/promisor support in Git and allows for archiving
large repositories to object storages like S3 in bundles that are:

* easy to manage
  * bundle is just a single file, it's easier to guarantee atomic
    replacements in object storages like S3 and they are faster to
    fetch than a bare repository since there's only a single GET
    request involved
* incredibly tiny
  * no indexes (which may be more than 10 MB for some repositories)
    and other fluff, compared to cloning a bare repository
  * bundle can be filtered to only contain the tips of refs neccessary
    for e.g. code-analysis purposes

However, in 86fdd94d72 (clone: fail gracefully when cloning filtered
bundle, 2022-03-09) the cloning of such bundles was disabled, with a
note that this behavior is not desired, and it the long-term this
should be possible.

The commit above states that it's not possible to have this at the
moment due to lack of remote and a repository-global config that
specifies an object filter, yet it's unclear why a remote-specific
config can't be used instead, which is what this change does.

Signed-off-by: Nikolay Edigaryev <edigaryev@gmail.com>
---
    clone: support cloning of filtered bundles
    
    f18b512bbb (bundle: create filtered bundles, 2022-03-09) introduced an
    incredibly useful ability to create filtered bundles, which advances the
    partial clone/promisor support in Git and allows for archiving large
    repositories to object storages like S3 in bundles that are:
    
     * easy to manage
       * bundle is just a single file, it's easier to guarantee atomic
         replacements in object storages like S3 and they are faster to
         fetch than a bare repository since there's only a single GET
         request involved
     * incredibly tiny
       * no indexes (which may be more than 10 MB for some repositories) and
         other fluff, compared to cloning a bare repository
       * bundle can be filtered to only contain the tips of refs neccessary
         for e.g. code-analysis purposes
    
    However, in 86fdd94d72 (clone: fail gracefully when cloning filtered
    bundle, 2022-03-09) the cloning of such bundles was disabled, with a
    note that this behavior is not desired, and it the long-term this should
    be possible.
    
    The commit above states that it's not possible to have this at the
    moment due to lack of remote and a repository-global config that
    specifies an object filter, yet it's unclear why a remote-specific
    config can't be used instead, which is what this change does.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1644%2Fedigaryev%2Fclone-filtered-bundles-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1644/edigaryev/clone-filtered-bundles-v1
Pull-Request: https://github.com/git/git/pull/1644

 builtin/clone.c        | 13 +++++++++++--
 t/t6020-bundle-misc.sh | 13 +++----------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index c6357af9498..4b3fedf78ed 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1227,9 +1227,18 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 
 		if (fd > 0)
 			close(fd);
+
+		if (has_filter) {
+			strbuf_addf(&key, "remote.%s.promisor", remote_name);
+			git_config_set(key.buf, "true");
+			strbuf_reset(&key);
+
+			strbuf_addf(&key, "remote.%s.partialclonefilter", remote_name);
+			git_config_set(key.buf, expand_list_objects_filter_spec(&header.filter));
+			strbuf_reset(&key);
+		}
+
 		bundle_header_release(&header);
-		if (has_filter)
-			die(_("cannot clone from filtered bundle"));
 	}
 
 	transport_set_option(transport, TRANS_OPT_KEEP, "yes");
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 3e6bcbf30cd..f449df00642 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -555,16 +555,9 @@ do
 	'
 done
 
-# NEEDSWORK: 'git clone --bare' should be able to clone from a filtered
-# bundle, but that requires a change to promisor/filter config options.
-# For now, we fail gracefully with a helpful error. This behavior can be
-# changed in the future to succeed as much as possible.
-test_expect_success 'cloning from filtered bundle has useful error' '
-	git bundle create partial.bdl \
-		--all \
-		--filter=blob:none &&
-	test_must_fail git clone --bare partial.bdl partial 2>err &&
-	grep "cannot clone from filtered bundle" err
+test_expect_success 'cloning from filtered bundle works' '
+	git bundle create partial.bdl --all --filter=blob:none &&
+	git clone --bare partial.bdl partial 2>err
 '
 
 test_expect_success 'verify catches unreachable, broken prerequisites' '

base-commit: 564d0252ca632e0264ed670534a51d18a689ef5d
-- 
gitgitgadget

             reply	other threads:[~2024-01-14 11:16 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-14 11:16 Nikolay Edigaryev via GitGitGadget [this message]
2024-01-14 18:00 ` [PATCH] clone: support cloning of filtered bundles Phillip Wood
2024-01-14 19:39   ` Nikolay Edigaryev
2024-01-14 21:26     ` Nikolay Edigaryev
2024-01-15 10:35       ` phillip.wood123
2024-01-15 10:18     ` phillip.wood123
2024-01-15  1:13 ` Junio C Hamano
2024-01-15  2:09   ` Junio C Hamano
2024-01-16 20:06     ` Nikolay Edigaryev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.1644.git.git.1705231010118.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=edigaryev@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).