* [PATCH v3 08/11] bundle-uri: drop bundle.flag from design doc
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC
To: git
Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
Derrick Stolee
In-Reply-To: <pull.1454.v3.git.1675171759.gitgitgadget@gmail.com>
From: Derrick Stolee <derrickstolee@github.com>
The Implementation Plan section lists a 'bundle.flag' option that is not
documented anywhere else. What is documented elsewhere in the document
and implemented by previous changes is the 'bundle.heuristic' config
key. For now, a heuristic is required to indicate that a bundle list is
organized for use during 'git fetch', and it is also sufficient for all
existing designs.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
Documentation/technical/bundle-uri.txt | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt
index b78d01d9adf..91d3a13e327 100644
--- a/Documentation/technical/bundle-uri.txt
+++ b/Documentation/technical/bundle-uri.txt
@@ -479,14 +479,14 @@ outline for submitting these features:
(This choice is an opt-in via a config option and a command-line
option.)
-4. Allow the client to understand the `bundle.flag=forFetch` configuration
+4. Allow the client to understand the `bundle.heuristic` configuration key
and the `bundle.<id>.creationToken` heuristic. When `git clone`
- discovers a bundle URI with `bundle.flag=forFetch`, it configures the
- client repository to check that bundle URI during later `git fetch <remote>`
+ discovers a bundle URI with `bundle.heuristic`, it configures the client
+ repository to check that bundle URI during later `git fetch <remote>`
commands.
5. Allow clients to discover bundle URIs during `git fetch` and configure
- a bundle URI for later fetches if `bundle.flag=forFetch`.
+ a bundle URI for later fetches if `bundle.heuristic` is set.
6. Implement the "inspect headers" heuristic to reduce data downloads when
the `bundle.<id>.creationToken` heuristic is not available.
--
gitgitgadget
* [PATCH v3 06/11] bundle-uri: download in creationToken order
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC
To: git
Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
Derrick Stolee
In-Reply-To: <pull.1454.v3.git.1675171759.gitgitgadget@gmail.com>
From: Derrick Stolee <derrickstolee@github.com>
The creationToken heuristic provides an ordering on the bundles
advertised by a bundle list. Teach the Git client to download bundles
differently when this heuristic is advertised.
The bundles in the list are sorted by their advertised creationToken
values, then downloaded in decreasing order. This avoids the previous
strategy of downloading bundles in an arbitrary order and attempting
to apply them (likely failing in the case of required commits) until
discovering the order through attempted unbundling.
During a fresh 'git clone', it may make sense to download the bundles in
increasing order, since that would prevent the need to attempt
unbundling a bundle with required commits that do not exist in our empty
object store. However, the cost of testing an unbundle is quite low, so
the chosen order instead optimizes for a future bundle download during a
'git fetch' operation with a non-empty object store.
Since the Git client continues fetching from the Git remote after
downloading and unbundling bundles, the client's object store can be
ahead of the bundle provider's object store. The next time it attempts
to download from the bundle list, it makes most sense to download only
the most-recent bundles until all tips successfully unbundle. The
strategy implemented here provides that short-circuit where the client
downloads a minimal set of bundles.
However, we are not satisfied by the naive approach of downloading
bundles until one successfully unbundles, expecting the earlier bundles
to successfully unbundle now. The example repository in t5558
demonstrates this well:
 ---------------- bundle-4
       4
      / \
 ----|---|------- bundle-3
     |   |
     |   3
     |   |
 ----|---|------- bundle-2
     |   |
     2   |
     |   |
 ----|---|------- bundle-1
      \ /
       1
       |
 (previous commits)
In this repository, if we already have the objects for bundle-1 and then
try to fetch from this list, the naive approach will fail. bundle-4
requires both bundle-3 and bundle-2, though bundle-3 will successfully
unbundle without bundle-2. Thus, the algorithm needs to keep this in
mind.
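To make the walk concrete, here is a small stand-alone simulation of the
strategy described above, applied to the t5558 layout. It is only a sketch:
'struct sim_bundle', the bitmask encoding of commits, and the function names
are illustrative assumptions and do not appear in bundle-uri.c. Downloads
always "succeed" in the simulation, and an unbundle succeeds exactly when
all required commits are present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bundle; not Git's struct remote_bundle_info. */
struct sim_bundle {
	unsigned token;    /* creationToken */
	unsigned provides; /* bitmask of commits the bundle adds */
	unsigned needs;    /* bitmask of commits that must already exist */
	int downloaded;
	int unbundled;
};

static int by_token_decreasing(const void *va, const void *vb)
{
	const struct sim_bundle *a = va;
	const struct sim_bundle *b = vb;

	if (a->token > b->token)
		return -1;
	if (a->token < b->token)
		return 1;
	return 0;
}

/*
 * Walk the list newest-first. '*have' is the bitmask of commits in the
 * object store. Tokens are recorded in 'download_order' as each bundle
 * is downloaded. Returns the number of downloads, or -1 if the walk ran
 * off the end of the list without applying everything it tried.
 */
int sim_fetch(struct sim_bundle *list, size_t nr, unsigned *have,
	      unsigned *download_order)
{
	long cur = 0;
	int dir = 1;
	int downloads = 0;

	assert(list && have && download_order);
	qsort(list, nr, sizeof(*list), by_token_decreasing);

	while (cur >= 0 && (size_t)cur < nr) {
		struct sim_bundle *b = &list[cur];

		if (!b->downloaded) {
			b->downloaded = 1;
			download_order[downloads++] = b->token;
		}
		if (!b->unbundled) {
			if ((b->needs & *have) == b->needs) {
				/* Applied: go back and retry newer bundles. */
				*have |= b->provides;
				b->unbundled = 1;
				dir = -1;
			} else {
				/* Missing prerequisites: look deeper. */
				dir = 1;
			}
		}
		/* Already applied: keep moving in the previous direction. */
		cur += dir;
	}
	return cur < 0 ? downloads : -1;
}

/* The t5558 example, with bundle-1's objects already present. */
int sim_t5558_example(unsigned *download_order)
{
	/* Bits 0x1, 0x2, 0x4, 0x8 stand for commits 1, 2, 3, 4. */
	struct sim_bundle list[] = {
		{ 1, 0x1, 0x0, 0, 0 },
		{ 2, 0x2, 0x1, 0, 0 },
		{ 3, 0x4, 0x1, 0, 0 },
		{ 4, 0x8, 0x2 | 0x4, 0, 0 },
	};
	unsigned have = 0x1;

	return sim_fetch(list, 4, &have, download_order);
}
```

In this simulation bundle-4, bundle-3 and bundle-2 are downloaded, in that
order, and bundle-1 is never requested. bundle-4 is attempted three times
before it applies, which is the case the naive "stop at the first success"
approach would miss.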
A later implementation detail will store the maximum creationToken seen
during such a bundle download, and the client will avoid downloading a
bundle unless its creationToken is strictly greater than that stored
value. For now, if the client downloads from a bundle list that has not
changed since its previous download, it will download the most-recent
bundle and then stop, since that bundle's required commits are already
in the object store.
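The "strictly greater" rule mentioned above is not implemented in this
change. As a rough sketch of what such a filter could look like (the
function name and the plain array of tokens are assumptions made for
illustration only):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: count the bundles whose creationToken is
 * strictly greater than the maximum token stored from a previous
 * download. Only those would be candidates for downloading.
 */
size_t count_new_bundles(const uint64_t *tokens, size_t nr,
			 uint64_t max_seen)
{
	size_t i, count = 0;

	for (i = 0; i < nr; i++)
		if (tokens[i] > max_seen)
			count++;
	return count;
}
```

With tokens 1 through 4 and a stored maximum of 2, only the bundles with
tokens 3 and 4 remain candidates. With a stored maximum of 4, nothing is
downloaded at all, which would avoid even the single download described
above.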
Add tests that exercise this behavior. We will expand upon these tests
when incremental downloads during 'git fetch' make use of creationToken
values.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
bundle-uri.c | 156 +++++++++++++++++++++++++++++++++++-
t/t5558-clone-bundle-uri.sh | 40 +++++++--
t/t5601-clone.sh | 46 +++++++++++
3 files changed, 233 insertions(+), 9 deletions(-)
diff --git a/bundle-uri.c b/bundle-uri.c
index d4277b2e3a7..af48938d243 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -447,6 +447,139 @@ static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data
return 0;
}
+struct bundles_for_sorting {
+ struct remote_bundle_info **items;
+ size_t alloc;
+ size_t nr;
+};
+
+static int append_bundle(struct remote_bundle_info *bundle, void *data)
+{
+ struct bundles_for_sorting *list = data;
+ list->items[list->nr++] = bundle;
+ return 0;
+}
+
+/**
+ * For use in QSORT() to get a list sorted by creationToken
+ * in decreasing order.
+ */
+static int compare_creation_token_decreasing(const void *va, const void *vb)
+{
+ const struct remote_bundle_info * const *a = va;
+ const struct remote_bundle_info * const *b = vb;
+
+ if ((*a)->creationToken > (*b)->creationToken)
+ return -1;
+ if ((*a)->creationToken < (*b)->creationToken)
+ return 1;
+ return 0;
+}
+
+static int fetch_bundles_by_token(struct repository *r,
+ struct bundle_list *list)
+{
+ int cur;
+ int move_direction = 0;
+ struct bundle_list_context ctx = {
+ .r = r,
+ .list = list,
+ .mode = list->mode,
+ };
+ struct bundles_for_sorting bundles = {
+ .alloc = hashmap_get_size(&list->bundles),
+ };
+
+ ALLOC_ARRAY(bundles.items, bundles.alloc);
+
+ for_all_bundles_in_list(list, append_bundle, &bundles);
+
+ QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
+
+ /*
+ * Attempt to download and unbundle the minimum number of bundles by
+ * creationToken in decreasing order. If we fail to unbundle (after
+ * a successful download) then move to the next non-downloaded bundle
+ * and attempt downloading. Once we succeed in applying a bundle,
+ * move to the previous unapplied bundle and attempt to unbundle it
+ * again.
+ *
+ * In the case of a fresh clone, we will likely download all of the
+ * bundles before successfully unbundling the oldest one, then the
+ * rest of the bundles unbundle successfully in increasing order
+ * of creationToken.
+ *
+ * If there are existing objects, then this process may terminate
+ * early when all required commits from "new" bundles exist in the
+ * repo's object store.
+ */
+ cur = 0;
+ while (cur >= 0 && cur < bundles.nr) {
+ struct remote_bundle_info *bundle = bundles.items[cur];
+ if (!bundle->file) {
+ /*
+ * Not downloaded yet. Try downloading.
+ *
+ * Note that bundle->file is non-NULL if a download
+ * was attempted, even if it failed to download.
+ */
+ if (fetch_bundle_uri_internal(ctx.r, bundle, ctx.depth + 1, ctx.list)) {
+ /* Mark as unbundled so we do not retry. */
+ bundle->unbundled = 1;
+
+ /* Try looking deeper in the list. */
+ move_direction = 1;
+ goto move;
+ }
+
+ /* We expect bundles when using creationTokens. */
+ if (!is_bundle(bundle->file, 1)) {
+ warning(_("file downloaded from '%s' is not a bundle"),
+ bundle->uri);
+ break;
+ }
+ }
+
+ if (bundle->file && !bundle->unbundled) {
+ /*
+ * This was downloaded, but not successfully
+ * unbundled. Try unbundling again.
+ */
+ if (unbundle_from_file(ctx.r, bundle->file)) {
+ /* Try looking deeper in the list. */
+ move_direction = 1;
+ } else {
+ /*
+ * Succeeded in unbundle. Retry bundles
+ * that previously failed to unbundle.
+ */
+ move_direction = -1;
+ bundle->unbundled = 1;
+ }
+ }
+
+ /*
+ * Else case: downloaded and unbundled successfully.
+ * Skip this by moving in the same direction as the
+ * previous step.
+ */
+
+move:
+ /* Move in the specified direction and repeat. */
+ cur += move_direction;
+ }
+
+ free(bundles.items);
+
+ /*
+ * We succeed if the loop terminates because 'cur' drops below
+ * zero. The other case is that we terminate because 'cur'
+ * reaches the end of the list, so we have a failure no matter
+ * which bundles we apply from the list.
+ */
+ return cur >= 0;
+}
+
static int download_bundle_list(struct repository *r,
struct bundle_list *local_list,
struct bundle_list *global_list,
@@ -484,7 +617,15 @@ static int fetch_bundle_list_in_config_format(struct repository *r,
goto cleanup;
}
- if ((result = download_bundle_list(r, &list_from_bundle,
+ /*
+ * If this list uses the creationToken heuristic, then the URIs
+ * it advertises are expected to be bundles, not nested lists.
+ * We can drop 'global_list' and 'depth'.
+ */
+ if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
+ result = fetch_bundles_by_token(r, &list_from_bundle);
+ global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN;
+ } else if ((result = download_bundle_list(r, &list_from_bundle,
global_list, depth)))
goto cleanup;
@@ -626,6 +767,14 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list)
int result;
struct bundle_list global_list;
+ /*
+ * If the creationToken heuristic is used, then the URIs
+ * advertised by 'list' are not nested lists and instead
+ * direct bundles. We do not need to use global_list.
+ */
+ if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+ return fetch_bundles_by_token(r, list);
+
init_bundle_list(&global_list);
/* If a bundle is added to this global list, then it is required. */
@@ -634,7 +783,10 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list)
if ((result = download_bundle_list(r, list, &global_list, 0)))
goto cleanup;
- result = unbundle_all_bundles(r, &global_list);
+ if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+ result = fetch_bundles_by_token(r, list);
+ else
+ result = unbundle_all_bundles(r, &global_list);
cleanup:
for_all_bundles_in_list(&global_list, unlink_bundle, NULL);
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 474432c8ace..6f9417a0afb 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -401,17 +401,43 @@ test_expect_success 'clone bundle list (http, creationToken)' '
git -C clone-list-http-2 cat-file --batch-check <oids &&
cat >expect <<-EOF &&
- $HTTPD_URL/bundle-1.bundle
- $HTTPD_URL/bundle-2.bundle
- $HTTPD_URL/bundle-3.bundle
+ $HTTPD_URL/bundle-list
$HTTPD_URL/bundle-4.bundle
+ $HTTPD_URL/bundle-3.bundle
+ $HTTPD_URL/bundle-2.bundle
+ $HTTPD_URL/bundle-1.bundle
+ EOF
+
+ test_remote_https_urls <trace-clone.txt >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'clone incomplete bundle list (http, creationToken)' '
+ test_when_finished rm -f trace*.txt &&
+
+ cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+ cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = bundle-1.bundle
+ creationToken = 1
+ EOF
+
+ GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
+ git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+ --single-branch --branch=base --no-tags \
+ "$HTTPD_URL/smart/fetch.git" clone-token-http &&
+
+ cat >expect <<-EOF &&
$HTTPD_URL/bundle-list
+ $HTTPD_URL/bundle-1.bundle
EOF
- # Since the creationToken heuristic is not yet understood by the
- # client, the order cannot be verified at this moment. Sort the
- # list for consistent results.
- test_remote_https_urls <trace-clone.txt | sort >actual &&
+ test_remote_https_urls <trace-clone.txt >actual &&
test_cmp expect actual
'
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 1928ea1dd7c..b7d5551262c 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -831,6 +831,52 @@ test_expect_success 'auto-discover multiple bundles from HTTP clone' '
grep -f pattern trace.txt
'
+test_expect_success 'auto-discover multiple bundles from HTTP clone: creationToken heuristic' '
+ test_when_finished rm -rf "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+ test_when_finished rm -rf clone-heuristic trace*.txt &&
+
+ test_commit -C src newest &&
+ git -C src bundle create "$HTTPD_DOCUMENT_ROOT_PATH/newest.bundle" HEAD~1..HEAD &&
+ git clone --bare --no-local src "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+
+ cat >>"$HTTPD_DOCUMENT_ROOT_PATH/repo4.git/config" <<-EOF &&
+ [uploadPack]
+ advertiseBundleURIs = true
+
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "everything"]
+ uri = $HTTPD_URL/everything.bundle
+ creationtoken = 1
+
+ [bundle "new"]
+ uri = $HTTPD_URL/new.bundle
+ creationtoken = 2
+
+ [bundle "newest"]
+ uri = $HTTPD_URL/newest.bundle
+ creationtoken = 3
+ EOF
+
+ GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+ git -c protocol.version=2 \
+ -c transfer.bundleURI=true clone \
+ "$HTTPD_URL/smart/repo4.git" clone-heuristic &&
+
+ cat >expect <<-EOF &&
+ $HTTPD_URL/newest.bundle
+ $HTTPD_URL/new.bundle
+ $HTTPD_URL/everything.bundle
+ EOF
+
+ # We should fetch all bundles in the expected order.
+ test_remote_https_urls <trace-clone.txt >actual &&
+ test_cmp expect actual
+'
+
# DO NOT add non-httpd-specific tests here, because the last part of this
# test script is only executed when httpd is available and enabled.
--
gitgitgadget
* [PATCH v3 07/11] clone: set fetch.bundleURI if appropriate
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC
To: git
Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
Derrick Stolee
In-Reply-To: <pull.1454.v3.git.1675171759.gitgitgadget@gmail.com>
From: Derrick Stolee <derrickstolee@github.com>
Bundle providers may organize their bundle lists in a way that is
intended to improve incremental fetches, not just initial clones.
However, they do need to state that they have organized with that in
mind, or else the client will not expect to save time by downloading
bundles after the initial clone. This is done by specifying a
bundle.heuristic value.
There are two types of bundle lists: those at a static URI and those
that are advertised from a Git remote over protocol v2.
The new fetch.bundleURI config value applies for static bundle URIs that
are not advertised over protocol v2. If the user specifies a static URI
via 'git clone --bundle-uri', then Git can set this config as a reminder
for future 'git fetch' operations to check the bundle list before
connecting to the remote(s).
For lists provided over protocol v2, we will want to take a different
approach and create a property of the remote itself by creating a
remote.<id>.* type config key. That is not implemented in this change.
Later changes will update 'git fetch' to consume this option.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
Documentation/config/fetch.txt | 8 +++++++
builtin/clone.c | 6 +++++-
bundle-uri.c | 5 ++++-
bundle-uri.h | 8 ++++++-
t/t5558-clone-bundle-uri.sh | 39 ++++++++++++++++++++++++++++++++++
5 files changed, 63 insertions(+), 3 deletions(-)
diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
index cd65d236b43..244f44d460f 100644
--- a/Documentation/config/fetch.txt
+++ b/Documentation/config/fetch.txt
@@ -96,3 +96,11 @@ fetch.writeCommitGraph::
merge and the write may take longer. Having an updated commit-graph
file helps performance of many Git commands, including `git merge-base`,
`git push -f`, and `git log --graph`. Defaults to false.
+
+fetch.bundleURI::
+ This value stores a URI for downloading Git object data from a bundle
+ URI before performing an incremental fetch from the origin Git server.
+ This is similar to how the `--bundle-uri` option behaves in
+ linkgit:git-clone[1]. `git clone --bundle-uri` will set the
+ `fetch.bundleURI` value if the supplied bundle URI contains a bundle
+ list that is organized for incremental fetches.
diff --git a/builtin/clone.c b/builtin/clone.c
index 5453ba5277f..5370617664d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1248,12 +1248,16 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
* data from the --bundle-uri option.
*/
if (bundle_uri) {
+ int has_heuristic = 0;
+
/* At this point, we need the_repository to match the cloned repo. */
if (repo_init(the_repository, git_dir, work_tree))
warning(_("failed to initialize the repo, skipping bundle URI"));
- else if (fetch_bundle_uri(the_repository, bundle_uri))
+ else if (fetch_bundle_uri(the_repository, bundle_uri, &has_heuristic))
warning(_("failed to fetch objects from bundle URI '%s'"),
bundle_uri);
+ else if (has_heuristic)
+ git_config_set_gently("fetch.bundleuri", bundle_uri);
}
strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");
diff --git a/bundle-uri.c b/bundle-uri.c
index af48938d243..7a1b6d94bf5 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -736,7 +736,8 @@ static int unlink_bundle(struct remote_bundle_info *info, void *data)
return 0;
}
-int fetch_bundle_uri(struct repository *r, const char *uri)
+int fetch_bundle_uri(struct repository *r, const char *uri,
+ int *has_heuristic)
{
int result;
struct bundle_list list;
@@ -756,6 +757,8 @@ int fetch_bundle_uri(struct repository *r, const char *uri)
result = unbundle_all_bundles(r, &list);
cleanup:
+ if (has_heuristic)
+ *has_heuristic = (list.heuristic != BUNDLE_HEURISTIC_NONE);
for_all_bundles_in_list(&list, unlink_bundle, NULL);
clear_bundle_list(&list);
clear_remote_bundle_info(&bundle, NULL);
diff --git a/bundle-uri.h b/bundle-uri.h
index ef32840bfa6..6dbc780f661 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -124,8 +124,14 @@ int bundle_uri_parse_config_format(const char *uri,
* based on that information.
*
* Returns non-zero if no bundle information is found at the given 'uri'.
+ *
+ * If the pointer 'has_heuristic' is non-NULL, then the value it points to
+ * will be set to be non-zero if and only if the fetched list has a
+ * heuristic value. Such a value indicates that the list was designed for
+ * incremental fetches.
*/
-int fetch_bundle_uri(struct repository *r, const char *uri);
+int fetch_bundle_uri(struct repository *r, const char *uri,
+ int *has_heuristic);
/**
* Given a bundle list that was already advertised (likely by the
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 6f9417a0afb..b2d15e141ca 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -432,6 +432,8 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
--single-branch --branch=base --no-tags \
"$HTTPD_URL/smart/fetch.git" clone-token-http &&
+ test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+
cat >expect <<-EOF &&
$HTTPD_URL/bundle-list
$HTTPD_URL/bundle-1.bundle
@@ -441,6 +443,43 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
test_cmp expect actual
'
+test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
+ test_when_finished rm -rf fetch-http-4 trace*.txt &&
+
+ cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = bundle-1.bundle
+ creationToken = 1
+ EOF
+
+ GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+ git clone --single-branch --branch=base \
+ --bundle-uri="$HTTPD_URL/bundle-list" \
+ "$HTTPD_URL/smart/fetch.git" fetch-http-4 &&
+
+ test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+
+ cat >expect <<-EOF &&
+ $HTTPD_URL/bundle-list
+ $HTTPD_URL/bundle-1.bundle
+ EOF
+
+ test_remote_https_urls <trace-clone.txt >actual &&
+ test_cmp expect actual &&
+
+ # only received base ref from bundle-1
+ git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+ cat >expect <<-\EOF &&
+ refs/bundles/base
+ EOF
+ test_cmp expect refs
+'
+
# Do not add tests here unless they use the HTTP server, as they will
# not run unless the HTTP dependencies exist.
--
gitgitgadget
* [PATCH v3 05/11] bundle-uri: parse bundle.<id>.creationToken values
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC
To: git
Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
Derrick Stolee
In-Reply-To: <pull.1454.v3.git.1675171759.gitgitgadget@gmail.com>
From: Derrick Stolee <derrickstolee@github.com>
The previous change taught Git to parse the bundle.heuristic value,
especially when its value is "creationToken". Now, teach Git to parse
the bundle.<id>.creationToken values on each bundle in a bundle list.
Before implementing any logic based on creationToken values for the
creationToken heuristic, parse and print these values for testing
purposes.
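For readers who want to experiment with the value format outside of Git,
the following is a stricter stand-alone parser for an unsigned 64-bit
decimal token. It is a sketch under assumptions, not the code in this
patch: 'parse_creation_token' is a hypothetical name, and it uses
strtoull() with explicit checks rather than the sscanf() call in the diff
below.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal creationToken into '*out'.
 * Returns 0 on success and -1 on any malformed or out-of-range input.
 */
int parse_creation_token(const char *value, uint64_t *out)
{
	char *end;
	unsigned long long v;

	/* strtoull() would accept leading whitespace or a sign; reject both. */
	if (!value || !isdigit((unsigned char)value[0]))
		return -1;

	errno = 0;
	v = strtoull(value, &end, 10);
	if (errno == ERANGE || *end != '\0')
		return -1;

	/* Guard against platforms where unsigned long long exceeds 64 bits. */
	if ((uint64_t)v != v)
		return -1;

	*out = (uint64_t)v;
	return 0;
}
```

This accepts the values used in the tests, such as 123456 and
12345678901234567890, and rejects 'bogus'. It also rejects input with
trailing characters such as '12x', where a bare sscanf() conversion would
stop at the first non-digit and still report one converted item.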
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
bundle-uri.c | 10 ++++++++++
bundle-uri.h | 6 ++++++
t/t5750-bundle-uri-parse.sh | 18 ++++++++++++++++++
3 files changed, 34 insertions(+)
diff --git a/bundle-uri.c b/bundle-uri.c
index 36ec542718d..d4277b2e3a7 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -83,6 +83,9 @@ static int summarize_bundle(struct remote_bundle_info *info, void *data)
FILE *fp = data;
fprintf(fp, "[bundle \"%s\"]\n", info->id);
fprintf(fp, "\turi = %s\n", info->uri);
+
+ if (info->creationToken)
+ fprintf(fp, "\tcreationToken = %"PRIu64"\n", info->creationToken);
return 0;
}
@@ -203,6 +206,13 @@ static int bundle_list_update(const char *key, const char *value,
return 0;
}
+ if (!strcmp(subkey, "creationtoken")) {
+ if (sscanf(value, "%"PRIu64, &bundle->creationToken) != 1)
+ warning(_("could not parse bundle list key %s with value '%s'"),
+ "creationToken", value);
+ return 0;
+ }
+
/*
* At this point, we ignore any information that we don't
* understand, assuming it to be hints for a heuristic the client
diff --git a/bundle-uri.h b/bundle-uri.h
index 2e44a50a90b..ef32840bfa6 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -42,6 +42,12 @@ struct remote_bundle_info {
* this boolean is true.
*/
unsigned unbundled:1;
+
+ /**
+ * If the bundle is part of a list with the creationToken
+ * heuristic, then we use this member for sorting the bundles.
+ */
+ uint64_t creationToken;
};
#define REMOTE_BUNDLE_INFO_INIT { 0 }
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 6fc92a9c0d4..81bdf58b944 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -258,10 +258,13 @@ test_expect_success 'parse config format: creationToken heuristic' '
heuristic = creationToken
[bundle "one"]
uri = http://example.com/bundle.bdl
+ creationToken = 123456
[bundle "two"]
uri = https://example.com/bundle.bdl
+ creationToken = 12345678901234567890
[bundle "three"]
uri = file:///usr/share/git/bundle.bdl
+ creationToken = 1
EOF
test-tool bundle-uri parse-config expect >actual 2>err &&
@@ -269,4 +272,19 @@ test_expect_success 'parse config format: creationToken heuristic' '
test_cmp_config_output expect actual
'
+test_expect_success 'parse config format edge cases: creationToken heuristic' '
+ cat >expect <<-\EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+ [bundle "one"]
+ uri = http://example.com/bundle.bdl
+ creationToken = bogus
+ EOF
+
+ test-tool bundle-uri parse-config expect >actual 2>err &&
+ grep "could not parse bundle list key creationToken with value '\''bogus'\''" err
+'
+
test_done
--
gitgitgadget
* [PATCH v3 04/11] bundle-uri: parse bundle.heuristic=creationToken
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC
To: git
Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
Derrick Stolee
In-Reply-To: <pull.1454.v3.git.1675171759.gitgitgadget@gmail.com>
From: Derrick Stolee <derrickstolee@github.com>
The bundle.heuristic value communicates that the bundle list is
organized to make use of the bundle.<id>.creationToken values that may
be provided in the bundle list. Those values will create a total order
on the bundles, allowing the Git client to download them in a specific
order and even remember previously-downloaded bundles by storing the
maximum creation token value.
Before implementing any logic that parses or uses the
bundle.<id>.creationToken values, teach Git to parse the
bundle.heuristic value from a bundle list. We can use 'test-tool
bundle-uri' to print the heuristic value and verify that the parsing
works correctly.
As an extra precaution, create the internal 'heuristics' array to be a
list of (enum, string) pairs so we can iterate through the array entries
carefully, regardless of the enum values.
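The (enum, string) pairing can be tried in isolation with a sketch like the
one below. The enum and function names here are shortened, hypothetical
versions chosen for illustration. The point is that both lookups search the
table for a matching entry instead of using the enum value as an array
index, so the table order does not have to track the enum values.

```c
#include <stddef.h>
#include <string.h>

enum heuristic {
	HEURISTIC_NONE = 0,
	HEURISTIC_CREATIONTOKEN
};

/* Table of (enum, string) pairs, searched rather than indexed. */
static const struct {
	enum heuristic heuristic;
	const char *name;
} heuristic_table[] = {
	{ HEURISTIC_NONE, "" },
	{ HEURISTIC_CREATIONTOKEN, "creationToken" },
};

#define HEURISTIC_TABLE_NR \
	(sizeof(heuristic_table) / sizeof(heuristic_table[0]))

/* Map a config value to its enum; unknown values map to NONE. */
enum heuristic heuristic_from_name(const char *value)
{
	size_t i;

	for (i = 0; i < HEURISTIC_TABLE_NR; i++)
		if (heuristic_table[i].heuristic != HEURISTIC_NONE &&
		    !strcmp(value, heuristic_table[i].name))
			return heuristic_table[i].heuristic;
	return HEURISTIC_NONE;
}

/* Map an enum back to its printable name, or NULL if not listed. */
const char *heuristic_name(enum heuristic h)
{
	size_t i;

	for (i = 0; i < HEURISTIC_TABLE_NR; i++)
		if (heuristic_table[i].heuristic == h)
			return heuristic_table[i].name;
	return NULL;
}
```

The comparison is case-sensitive in this sketch, so 'creationToken' matches
and an unrecognized string falls back to the "no heuristic" value, which
mirrors the "ignore unknown heuristics" behavior described in this change.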
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
Documentation/config/bundle.txt | 7 +++++++
bundle-uri.c | 34 +++++++++++++++++++++++++++++++++
bundle-uri.h | 14 ++++++++++++++
t/t5750-bundle-uri-parse.sh | 19 ++++++++++++++++++
4 files changed, 74 insertions(+)
diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
index daa21eb674a..3faae386853 100644
--- a/Documentation/config/bundle.txt
+++ b/Documentation/config/bundle.txt
@@ -15,6 +15,13 @@ bundle.mode::
complete understanding of the bundled information (`all`) or if any one
of the listed bundle URIs is sufficient (`any`).
+bundle.heuristic::
+ If this string-valued key exists, then the bundle list is designed to
+ work well with incremental `git fetch` commands. The heuristic signals
+ that there are additional keys available for each bundle that help
+ determine which subset of bundles the client should download. The
+ only value currently understood is `creationToken`.
+
bundle.<id>.*::
The `bundle.<id>.*` keys are used to describe a single item in the
bundle list, grouped under `<id>` for identification purposes.
diff --git a/bundle-uri.c b/bundle-uri.c
index 36268dda172..36ec542718d 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -9,6 +9,14 @@
#include "config.h"
#include "remote.h"
+static struct {
+ enum bundle_list_heuristic heuristic;
+ const char *name;
+} heuristics[BUNDLE_HEURISTIC__COUNT] = {
+ { BUNDLE_HEURISTIC_NONE, ""},
+ { BUNDLE_HEURISTIC_CREATIONTOKEN, "creationToken" },
+};
+
static int compare_bundles(const void *hashmap_cmp_fn_data,
const struct hashmap_entry *he1,
const struct hashmap_entry *he2,
@@ -100,6 +108,17 @@ void print_bundle_list(FILE *fp, struct bundle_list *list)
fprintf(fp, "\tversion = %d\n", list->version);
fprintf(fp, "\tmode = %s\n", mode);
+ if (list->heuristic) {
+ int i;
+ for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
+ if (heuristics[i].heuristic == list->heuristic) {
+ fprintf(fp, "\theuristic = %s\n",
+ heuristics[i].name);
+ break;
+ }
+ }
+ }
+
for_all_bundles_in_list(list, summarize_bundle, fp);
}
@@ -142,6 +161,21 @@ static int bundle_list_update(const char *key, const char *value,
return 0;
}
+ if (!strcmp(subkey, "heuristic")) {
+ int i;
+ for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
+ if (heuristics[i].heuristic &&
+ heuristics[i].name &&
+ !strcmp(value, heuristics[i].name)) {
+ list->heuristic = heuristics[i].heuristic;
+ return 0;
+ }
+ }
+
+ /* Ignore unknown heuristics. */
+ return 0;
+ }
+
/* Ignore other unknown global keys. */
return 0;
}
diff --git a/bundle-uri.h b/bundle-uri.h
index d5e89f1671c..2e44a50a90b 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -52,6 +52,14 @@ enum bundle_list_mode {
BUNDLE_MODE_ANY
};
+enum bundle_list_heuristic {
+ BUNDLE_HEURISTIC_NONE = 0,
+ BUNDLE_HEURISTIC_CREATIONTOKEN,
+
+ /* Must be last. */
+ BUNDLE_HEURISTIC__COUNT
+};
+
/**
* A bundle_list contains an unordered set of remote_bundle_info structs,
* as well as information about the bundle listing, such as version and
@@ -75,6 +83,12 @@ struct bundle_list {
* advertised by the bundle list at that location.
*/
char *baseURI;
+
+ /**
+ * A list can have a heuristic, which helps reduce the number of
+ * downloaded bundles.
+ */
+ enum bundle_list_heuristic heuristic;
};
void init_bundle_list(struct bundle_list *list);
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 7b4f930e532..6fc92a9c0d4 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -250,4 +250,23 @@ test_expect_success 'parse config format edge cases: empty key or value' '
test_cmp_config_output expect actual
'
+test_expect_success 'parse config format: creationToken heuristic' '
+ cat >expect <<-\EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+ [bundle "one"]
+ uri = http://example.com/bundle.bdl
+ [bundle "two"]
+ uri = https://example.com/bundle.bdl
+ [bundle "three"]
+ uri = file:///usr/share/git/bundle.bdl
+ EOF
+
+ test-tool bundle-uri parse-config expect >actual 2>err &&
+ test_must_be_empty err &&
+ test_cmp_config_output expect actual
+'
+
test_done
--
gitgitgadget
* [PATCH v3 03/11] t5558: add tests for creationToken heuristic
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC
To: git
Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
Derrick Stolee
In-Reply-To: <pull.1454.v3.git.1675171759.gitgitgadget@gmail.com>
From: Derrick Stolee <derrickstolee@github.com>
As documented in the bundle URI design doc in 2da14fad8fe (docs:
document bundle URI standard, 2022-08-09), the 'creationToken' member of
a bundle URI allows a bundle provider to specify a total order on the
bundles.
Future changes will allow the Git client to understand these members and
modify its behavior around downloading the bundles in that order. In the
meantime, create tests that add creation tokens to the bundle list. For
now, the Git client correctly ignores these unknown keys.
Create a new test helper function, test_remote_https_urls, which filters
GIT_TRACE2_EVENT output to extract a list of URLs passed to
git-remote-https child processes. This can be used to verify the order
of these requests as we implement the creationToken heuristic. For now,
we need to sort the actual output since the current client does not have
a well-defined order that it applies to the bundles.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
t/t5558-clone-bundle-uri.sh | 69 +++++++++++++++++++++++++++++++++++--
t/test-lib-functions.sh | 8 +++++
2 files changed, 75 insertions(+), 2 deletions(-)
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 9155f31fa2c..474432c8ace 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -285,6 +285,8 @@ test_expect_success 'clone HTTP bundle' '
'
test_expect_success 'clone bundle list (HTTP, no heuristic)' '
+ test_when_finished rm -f trace*.txt &&
+
cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
[bundle]
@@ -304,12 +306,26 @@ test_expect_success 'clone bundle list (HTTP, no heuristic)' '
uri = $HTTPD_URL/bundle-4.bundle
EOF
- git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+ GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+ git clone --bundle-uri="$HTTPD_URL/bundle-list" \
clone-from clone-list-http 2>err &&
! grep "Repository lacks these prerequisite commits" err &&
git -C clone-from for-each-ref --format="%(objectname)" >oids &&
- git -C clone-list-http cat-file --batch-check <oids
+ git -C clone-list-http cat-file --batch-check <oids &&
+
+ cat >expect <<-EOF &&
+ $HTTPD_URL/bundle-1.bundle
+ $HTTPD_URL/bundle-2.bundle
+ $HTTPD_URL/bundle-3.bundle
+ $HTTPD_URL/bundle-4.bundle
+ $HTTPD_URL/bundle-list
+ EOF
+
+ # Sort the list, since the order is not well-defined
+ # without a heuristic.
+ test_remote_https_urls <trace-clone.txt | sort >actual &&
+ test_cmp expect actual
'
test_expect_success 'clone bundle list (HTTP, any mode)' '
@@ -350,6 +366,55 @@ test_expect_success 'clone bundle list (HTTP, any mode)' '
test_cmp expect actual
'
+test_expect_success 'clone bundle list (http, creationToken)' '
+ test_when_finished rm -f trace*.txt &&
+
+ cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+ cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+ [bundle]
+ version = 1
+ mode = all
+ heuristic = creationToken
+
+ [bundle "bundle-1"]
+ uri = bundle-1.bundle
+ creationToken = 1
+
+ [bundle "bundle-2"]
+ uri = bundle-2.bundle
+ creationToken = 2
+
+ [bundle "bundle-3"]
+ uri = bundle-3.bundle
+ creationToken = 3
+
+ [bundle "bundle-4"]
+ uri = bundle-4.bundle
+ creationToken = 4
+ EOF
+
+ GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" git \
+ clone --bundle-uri="$HTTPD_URL/bundle-list" \
+ "$HTTPD_URL/smart/fetch.git" clone-list-http-2 &&
+
+ git -C clone-from for-each-ref --format="%(objectname)" >oids &&
+ git -C clone-list-http-2 cat-file --batch-check <oids &&
+
+ cat >expect <<-EOF &&
+ $HTTPD_URL/bundle-1.bundle
+ $HTTPD_URL/bundle-2.bundle
+ $HTTPD_URL/bundle-3.bundle
+ $HTTPD_URL/bundle-4.bundle
+ $HTTPD_URL/bundle-list
+ EOF
+
+ # Since the creationToken heuristic is not yet understood by the
+ # client, the order cannot be verified at this moment. Sort the
+ # list for consistent results.
+ test_remote_https_urls <trace-clone.txt | sort >actual &&
+ test_cmp expect actual
+'
+
# Do not add tests here unless they use the HTTP server, as they will
# not run unless the HTTP dependencies exist.
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index f036c4d3003..ace542f4226 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1833,6 +1833,14 @@ test_region () {
return 0
}
+# Given a GIT_TRACE2_EVENT log over stdin, writes to stdout a list of URLs
+# sent to git-remote-https child processes.
+test_remote_https_urls() {
+ grep -e '"event":"child_start".*"argv":\["git-remote-https",".*"\]' |
+ sed -e 's/{"event":"child_start".*"argv":\["git-remote-https","//g' \
+ -e 's/"\]}//g'
+}
+
# Print the destination of symlink(s) provided as arguments. Basically
# the same as the readlink command, but it's not available everywhere.
test_readlink () {
--
gitgitgadget
* [PATCH v3 02/11] bundle: verify using check_connected()
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
To: git
Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
Derrick Stolee
In-Reply-To: <pull.1454.v3.git.1675171759.gitgitgadget@gmail.com>
From: Derrick Stolee <derrickstolee@github.com>
When Git verifies a bundle to see if it is safe for unbundling, it first
looks to see if the prerequisite commits are in the object store. This
is an easy way to "fail fast" but it is not a sufficient check for
updating refs that guarantee closure under reachability. There could
still be issues if those commits are not reachable from the repository's
references. The repository only has guarantees that its object store is
closed under reachability for the objects that are reachable from
references.
Thus, the code in verify_bundle() has previously had the additional
check that all prerequisite commits are reachable from repository
references. This is done via a revision walk from all references,
stopping only if all prerequisite commits are discovered or all commits
are walked. This walk is custom to verify_bundle().
This check is more strict than what Git applies to fetched pack-files.
In the fetch case, Git guarantees that the new references are closed
under reachability by walking from the new references until walking
commits that are reachable from repository refs. This is done through
the well-used check_connected() method.
To better align with the restrictions required by 'git fetch',
reimplement this check in verify_bundle() to use check_connected(). This
also simplifies the code significantly.
The previous change added a test that verified the behavior of 'git
bundle verify' and 'git bundle unbundle' in this case, and the error
messages looked like this:
error: Could not read <missing-commit>
fatal: Failed to traverse parents of commit <extant-commit>
However, by changing the revision walk slightly within check_connected()
and using its quiet mode, we can omit those messages. Instead, we get
only this message, tailored to describing the current state of the
repository:
error: some prerequisite commits exist in the object store,
but are not connected to the repository's history
(Line break added here for the commit message formatting, only.)
While this message does not include any object IDs, there is no
guarantee that those object IDs would help the user diagnose what is
going on, as they could be separated from the prerequisite commits by
some distance. At minimum, this message describes the situation in a
more informative way than the previous error messages.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
bundle.c | 75 ++++++++++++++++--------------------------
t/t6020-bundle-misc.sh | 8 ++---
2 files changed, 33 insertions(+), 50 deletions(-)
diff --git a/bundle.c b/bundle.c
index 4ef7256aa11..76c3a904898 100644
--- a/bundle.c
+++ b/bundle.c
@@ -12,6 +12,7 @@
#include "refs.h"
#include "strvec.h"
#include "list-objects-filter-options.h"
+#include "connected.h"
static const char v2_bundle_signature[] = "# v2 git bundle\n";
static const char v3_bundle_signature[] = "# v3 git bundle\n";
@@ -187,6 +188,21 @@ static int list_refs(struct string_list *r, int argc, const char **argv)
/* Remember to update object flag allocation in object.h */
#define PREREQ_MARK (1u<<16)
+struct string_list_iterator {
+ struct string_list *list;
+ size_t cur;
+};
+
+static const struct object_id *iterate_ref_map(void *cb_data)
+{
+ struct string_list_iterator *iter = cb_data;
+
+ if (iter->cur >= iter->list->nr)
+ return NULL;
+
+ return iter->list->items[iter->cur++].util;
+}
+
int verify_bundle(struct repository *r,
struct bundle_header *header,
enum verify_bundle_flags flags)
@@ -196,26 +212,25 @@ int verify_bundle(struct repository *r,
* to be verbose about the errors
*/
struct string_list *p = &header->prerequisites;
- struct rev_info revs = REV_INFO_INIT;
- const char *argv[] = {NULL, "--all", NULL};
- struct commit *commit;
- int i, ret = 0, req_nr;
+ int i, ret = 0;
const char *message = _("Repository lacks these prerequisite commits:");
+ struct string_list_iterator iter = {
+ .list = p,
+ };
+ struct check_connected_options opts = {
+ .quiet = 1,
+ };
if (!r || !r->objects || !r->objects->odb)
return error(_("need a repository to verify a bundle"));
- repo_init_revisions(r, &revs, NULL);
for (i = 0; i < p->nr; i++) {
struct string_list_item *e = p->items + i;
const char *name = e->string;
struct object_id *oid = e->util;
struct object *o = parse_object(r, oid);
- if (o) {
- o->flags |= PREREQ_MARK;
- add_pending_object(&revs, o, name);
+ if (o)
continue;
- }
ret++;
if (flags & VERIFY_BUNDLE_QUIET)
continue;
@@ -223,37 +238,14 @@ int verify_bundle(struct repository *r,
error("%s", message);
error("%s %s", oid_to_hex(oid), name);
}
- if (revs.pending.nr != p->nr)
+ if (ret)
goto cleanup;
- req_nr = revs.pending.nr;
- setup_revisions(2, argv, &revs, NULL);
-
- list_objects_filter_copy(&revs.filter, &header->filter);
-
- if (prepare_revision_walk(&revs))
- die(_("revision walk setup failed"));
- i = req_nr;
- while (i && (commit = get_revision(&revs)))
- if (commit->object.flags & PREREQ_MARK)
- i--;
-
- for (i = 0; i < p->nr; i++) {
- struct string_list_item *e = p->items + i;
- const char *name = e->string;
- const struct object_id *oid = e->util;
- struct object *o = parse_object(r, oid);
- assert(o); /* otherwise we'd have returned early */
- if (o->flags & SHOWN)
- continue;
- ret++;
- if (flags & VERIFY_BUNDLE_QUIET)
- continue;
- if (ret == 1)
- error("%s", message);
- error("%s %s", oid_to_hex(oid), name);
- }
+ if ((ret = check_connected(iterate_ref_map, &iter, &opts)))
+ error(_("some prerequisite commits exist in the object store, "
+ "but are not connected to the repository's history"));
+ /* TODO: preserve this verbose language. */
if (flags & VERIFY_BUNDLE_VERBOSE) {
struct string_list *r;
@@ -282,15 +274,6 @@ int verify_bundle(struct repository *r,
list_objects_filter_spec(&header->filter));
}
cleanup:
- /* Clean up objects used, as they will be reused. */
- for (i = 0; i < p->nr; i++) {
- struct string_list_item *e = p->items + i;
- struct object_id *oid = e->util;
- commit = lookup_commit_reference_gently(r, oid, 1);
- if (commit)
- clear_commit_marks(commit, ALL_REV_FLAGS | PREREQ_MARK);
- }
- release_revisions(&revs);
return ret;
}
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 38dbbf89155..7d40994991e 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -595,14 +595,14 @@ test_expect_success 'verify catches unreachable, broken prerequisites' '
# Verify should fail
test_must_fail git bundle verify \
../clone-from/tip.bundle 2>err &&
- grep "Could not read $BAD_OID" err &&
- grep "Failed to traverse parents of commit $TIP_OID" err &&
+ grep "some prerequisite commits .* are not connected" err &&
+ test_line_count = 1 err &&
# Unbundling should fail
test_must_fail git bundle unbundle \
../clone-from/tip.bundle 2>err &&
- grep "Could not read $BAD_OID" err &&
- grep "Failed to traverse parents of commit $TIP_OID" err
+ grep "some prerequisite commits .* are not connected" err &&
+ test_line_count = 1 err
)
'
--
gitgitgadget
* [PATCH v3 01/11] bundle: test unbundling with incomplete history
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
To: git
Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
Derrick Stolee
In-Reply-To: <pull.1454.v3.git.1675171759.gitgitgadget@gmail.com>
From: Derrick Stolee <derrickstolee@github.com>
When verifying a bundle, Git checks first that all prerequisite commits
exist in the object store, then adds an additional check: those
prerequisite commits must be reachable from references in the
repository.
This check is stronger than what is checked for refs being added during
'git fetch', which simply guarantees that the new refs have a complete
history up to the point where it intersects with the current reachable
history.
However, we also do not have any tests that check the behavior under
this condition. Create a test that demonstrates its behavior.
In order to construct a broken history, perform a shallow clone of a
repository with a linear history, but whose default branch ('base') has
a single commit, so dropping the shallow markers leaves a complete
history from that reference. However, the 'tip' reference adds a
shallow commit whose parent is missing in the cloned repository. Trying
to unbundle a bundle with the 'tip' as a prerequisite will succeed past
the object store check and move into the reachability check.
The two errors that are reported are of this form:
error: Could not read <missing-commit>
fatal: Failed to traverse parents of commit <present-commit>
These messages are not particularly helpful for the person running the
unbundle command, but they do prevent the command from succeeding.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
t/t6020-bundle-misc.sh | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
index 3a1cf30b1d7..38dbbf89155 100755
--- a/t/t6020-bundle-misc.sh
+++ b/t/t6020-bundle-misc.sh
@@ -566,4 +566,44 @@ test_expect_success 'cloning from filtered bundle has useful error' '
grep "cannot clone from filtered bundle" err
'
+test_expect_success 'verify catches unreachable, broken prerequisites' '
+ test_when_finished rm -rf clone-from clone-to &&
+ git init clone-from &&
+ (
+ cd clone-from &&
+ git checkout -b base &&
+ test_commit A &&
+ git checkout -b tip &&
+ git commit --allow-empty -m "will drop by shallow" &&
+ git commit --allow-empty -m "will keep by shallow" &&
+ git commit --allow-empty -m "for bundle, not clone" &&
+ git bundle create tip.bundle tip~1..tip &&
+ git reset --hard HEAD~1 &&
+ git checkout base
+ ) &&
+ BAD_OID=$(git -C clone-from rev-parse tip~1) &&
+ TIP_OID=$(git -C clone-from rev-parse tip) &&
+ git clone --depth=1 --no-single-branch \
+ "file://$(pwd)/clone-from" clone-to &&
+ (
+ cd clone-to &&
+
+ # Set up broken history by removing shallow markers
+ git update-ref -d refs/remotes/origin/tip &&
+ rm .git/shallow &&
+
+ # Verify should fail
+ test_must_fail git bundle verify \
+ ../clone-from/tip.bundle 2>err &&
+ grep "Could not read $BAD_OID" err &&
+ grep "Failed to traverse parents of commit $TIP_OID" err &&
+
+ # Unbundling should fail
+ test_must_fail git bundle unbundle \
+ ../clone-from/tip.bundle 2>err &&
+ grep "Could not read $BAD_OID" err &&
+ grep "Failed to traverse parents of commit $TIP_OID" err
+ )
+'
+
test_done
--
gitgitgadget
* [PATCH v3 00/11] Bundle URIs V: creationToken heuristic for incremental fetches
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
To: git; +Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee
In-Reply-To: <pull.1454.v2.git.1674487310.gitgitgadget@gmail.com>
This fifth part to the bundle URIs feature follows part IV (advertising via
protocol v2) which recently merged to 'master', so this series is based on
'master'.
This part introduces the concept of a heuristic that a bundle list can
advertise. The purpose of the heuristic is to hint to the Git client that
the bundles can be downloaded and unbundled in a certain order. In
particular, that order can assist with using the same bundle URI to download
new bundles from an updated bundle list. This allows bundle URIs to assist
with incremental fetches, not just initial clones.
The only planned heuristic is the "creationToken" heuristic where the bundle
list adds a 64-bit unsigned integer "creationToken" value to each bundle in
the list. Those values provide an ordering on the bundles implying that the
bundles can be unbundled in increasing creationToken order and at each point
the required commits for the ith bundle were provided by bundles with lower
creationTokens.
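As a rough illustration of that ordering (a sketch, not the client's code): if the bundle list is flattened to hypothetical "<creationToken> <uri>" lines, a numeric sort yields an order in which each bundle's required commits come from earlier entries. The flattened layout, the function name, and the sample values are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical flattened bundle list, one "<creationToken> <uri>" per
# line. The real bundle list uses Git config syntax; this layout exists
# only for illustration.
bundle_list='3 bundle-3.bundle
1 bundle-1.bundle
4 bundle-4.bundle
2 bundle-2.bundle'

# Print the URIs in increasing creationToken order, which is the order
# in which the bundles can be unbundled without a failed attempt.
unbundle_order () {
	sort -n -k1,1 | cut -d" " -f2
}

printf '%s\n' "$bundle_list" | unbundle_order
```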
At clone time, the only difference implied by the creationToken order is
that the Git client does not need to guess at the order to apply the
bundles, but instead can use the creationToken order to apply them without
failure and retry. However, this presents an interesting benefit during
fetches: the Git client can check the bundle list and download bundles in
decreasing creationToken order until the required commits for these bundles
are present within the repository's object store. This prevents downloading
more bundle information than required.
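The fetch-time walk can be modeled with a toy script. This is a simplification under stated assumptions, not the algorithm in bundle-uri.c: nothing here calls Git, each bundle is assumed to need exactly one earlier bundle (named in a made-up third column), and the "have" variable stands in for what the object store already contains.

```shell
#!/bin/sh
# Toy model of an incremental fetch; nothing here calls Git.
# Each line: "<creationToken> <bundle> <prerequisite bundle, or ->".
# Assumption for this sketch: a bundle's required commits are provided
# by exactly one earlier bundle, named in the third column.
bundle_list='1 bundle-1 -
2 bundle-2 bundle-1
3 bundle-3 bundle-2
4 bundle-4 bundle-3'

# Bundles whose contents the repository already has (space-delimited).
have=' bundle-1 bundle-2 '

# Walk in decreasing creationToken order, listing each bundle to
# download, and stop once a bundle's prerequisite is already present.
pick_downloads () {
	sort -rn -k1,1 |
	while read -r token name prereq
	do
		echo "$name"
		case "$prereq" in -) break ;; esac
		case "$have" in *" $prereq "*) break ;; esac
	done
}

printf '%s\n' "$bundle_list" | pick_downloads
```

In this model only bundle-4 and bundle-3 are selected; bundle-2 and bundle-1 are never requested because the walk stops as soon as the history connects.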
The creationToken value is also a promise that the Git client will not need
to download a bundle if its creationToken is less than or equal to the
creationToken of a previously-downloaded bundle. This further improves the
performance during a fetch in that the client does not need to download any
bundles at all if it recognizes that the maximum creationToken is the same as
(or smaller than) a previously-downloaded creationToken.
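A minimal sketch of that promise, again over a hypothetical flattened "<creationToken> <uri>" list: only entries strictly greater than the remembered token are candidates for download. The variable "stored" and the function name are illustrative; the series itself keeps the value in the fetch.bundleCreationToken config key.

```shell
#!/bin/sh
# Sketch only: "stored" stands in for the largest creationToken the
# client has already downloaded; all values here are made up.
stored=2

bundle_list='1 bundle-1.bundle
2 bundle-2.bundle
3 bundle-3.bundle
4 bundle-4.bundle'

# Keep only bundles whose creationToken is strictly greater than the
# stored value; empty output means nothing needs downloading.
# Caveat: awk compares numbers as floating point, so this toy filter is
# not exact for tokens near the top of the 64-bit unsigned range.
newer_than_stored () {
	awk -v max="$1" '$1 + 0 > max + 0 { print $2 }'
}

printf '%s\n' "$bundle_list" | newer_than_stored "$stored"
```

With stored=4 the filter prints nothing, which corresponds to the case where the client can skip bundle downloads entirely.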
The creationToken concept is documented in the existing design document at
Documentation/technical/bundle-uri.txt, including suggested ways for bundle
providers to organize their bundle lists to take advantage of the heuristic.
This series formalizes the creationToken heuristic and the Git client logic
for understanding it. Further, for bundle lists provided by the git clone
--bundle-uri option, the Git client will recognize the heuristic as being
helpful for incremental fetches and store config values so that future git
fetch commands check the bundle list before communicating with any Git
remotes.
Note that this option does not integrate fetches with bundle lists
advertised via protocol v2. I spent some time working on this, but found the
implementation to be distinct enough that it merited its own attention in a
separate series. In particular, the configuration for indicating that a
fetch should check the bundle-uri protocol v2 command seemed best to be
located within a Git remote instead of a repository-global key such as is
being used for a static URI. Further, the timing of querying the bundle-uri
command during a git fetch command is significantly different and more
complicated than how it is used in git clone.
What Remains?
=============
Originally, I had planned on making this bundle URI work a 5-part series,
and this is part 5. Shouldn't we be done now?
There are two main things that should be done after this series, in any
order:
* Teach git fetch to check a bundle list advertised by a remote over the
bundle-uri protocol v2 command.
* Add the bundle.<id>.filter option to allow advertising bundles and
partial bundles side-by-side.
There is also room for expanding tests for more error conditions, or for
other tweaks that are not currently part of the design document. I do think
that after this series, it will be easier to work on different parts of the
feature in parallel.
Patch Outline
=============
* (New in v3) Patch 1 tests the behavior of 'git bundle verify' and 'git
bundle unbundle' when in the strange situation where a prerequisite
commit exists in the object store but is not closed under reachability
(necessarily not reachable from refs, too). This helps motivate the new
Patch 2.
* (New in v3) Patch 2 updates the behavior in verify_bundle() to use
check_connected().
* Patch 3 creates a test setup demonstrating a creationToken heuristic. At
this point, the Git client ignores the heuristic and uses its ad-hoc
strategy for ordering the bundles.
* Patches 4 and 5 teach Git to parse the bundle.heuristic and
bundle.<id>.creationToken keys in a bundle list.
* Patch 6 teaches Git to download bundles using the creationToken order.
This order uses a stack approach to start from the maximum creationToken
and continue downloading the next bundle in the list until all bundles
can successfully be unbundled. This is the algorithm required for
incremental fetches, while initial clones could download in the opposite
order. Since clones will download all bundles anyway, having a second
code path just for clones seemed unnecessary.
* Patch 7 teaches git clone --bundle-uri to set fetch.bundleURI when the
advertised bundle list includes a heuristic that Git understands.
* Patch 8 updates the design document to remove reference to a bundle.flag
option that was previously going to indicate the list was designed for
fetches, but the bundle.heuristic option already does that.
* Patch 9 teaches git fetch to check fetch.bundleURI and download bundles
from that static URI before connecting to remotes via the Git protocol.
* Patch 10 introduces a new fetch.bundleCreationToken config value to store
the maximum creationToken of downloaded bundles. This prevents
downloading the latest bundle on every git fetch command, reducing waste.
* Patch 11 adds new tests for interesting incremental fetch shapes. Along
with other test edits in other patches, these revealed several issues
that required improvement within this series. These tests also check
extra cases around failed bundle downloads.
Updates in v3
=============
* Patches 1 and 2 are replacements for v2's patch 1. Instead of skipping
the reachability walk, make it slightly more flexible by using
check_connected(). The first patch adds tests that cover this behavior,
which was previously untested.
* Patch 6 replaces the "stack_operation" label with a "move" label.
* Patch 9 simplifies nested ifs to use &&.
* Patch 11 updates some incorrect test comments.
Updates in v2
=============
* Patches 1 and 10 are new.
* I started making the extra tests in patch 10 due to Victoria's concern
around failed downloads. I extended the bundle list in a way that exposed
other issues that are fixed in this version. Unfortunately, the test
requires the full functionality of the entire series, so the tests are
not isolated to where the code fixes are made. One thing that I noticed
in the process is that some of the tests were using the local-clone trick
to copy full object directories instead of copying only the requested
object set. This was causing confusion in how the bundles were applying
or failing to apply, so the tests are updated to use http whenever
possible.
* In Patch 2, I created a new test_remote_https_urls helper to get the full
download list (in order). In this patch, the bundle download order is not
well-defined, but is modified in later tests when it becomes
well-defined.
* In Patch 3, I updated the connection between config value and enum value
to be an array of pairs instead of faking a hashmap-like interface that
could be dangerous if the enum values were assigned incorrectly.
* In Patch 5, the 'sorted' list and its type were renamed to be more
descriptive. This also included updates to "append_bundle()" and
"compare_creation_token_decreasing()" to be more descriptive. This had
some side effects in Patch 8 due to the renames.
* In Patch 5, I added the interesting bundle shape to the commit message to
remind us of why the creationToken algorithm needs to be the way it is. I
also removed the "stack" language in favor of discussing ranges of the
sorted list. Some renames, such as "pop_or_push" becoming
"move_direction", resulted from this change of language.
* The assignment of heuristic from the local list to global_list was moved
into Patch 5.
* In Patch 5, one of the tests removed bundle-2 because it allows a later
test for git fetch to demonstrate the interesting behavior where bundle-4
requires both bundle-2 and bundle-3.
* In Patch 6, the fetch.bundleURI config is described differently,
including dropping the defunct git fetch --bundle-uri reference and
discussing that git clone --bundle-uri will set it automatically.
* Patch 8 no longer refers to a config value starting with "remote:". It
also expands a test that was previously not expanded in v1.
* Patch 9 updates the documentation for fetch.bundleURI and
fetch.bundleCreationToken to describe how the user should unset the
latter if they edit the former.
* Much of Patch 9's changes are due to context changes from the renames in
Patch 5. However, it also adds the restriction that it will not attempt
to download bundles unless their creationToken is strictly greater than
the stored token. This ends up being critical to the failed download
case, preventing an incremental fetch from downloading all bundles just
because one bundle failed to download (and that case is tested in patch
10).
* Patch 10 adds significant testing, including several tests of failed
bundle downloads in various cases.
Thanks,
* Stolee
Derrick Stolee (11):
bundle: test unbundling with incomplete history
bundle: verify using check_connected()
t5558: add tests for creationToken heuristic
bundle-uri: parse bundle.heuristic=creationToken
bundle-uri: parse bundle.<id>.creationToken values
bundle-uri: download in creationToken order
clone: set fetch.bundleURI if appropriate
bundle-uri: drop bundle.flag from design doc
fetch: fetch from an external bundle URI
bundle-uri: store fetch.bundleCreationToken
bundle-uri: test missing bundles with heuristic
Documentation/config/bundle.txt | 7 +
Documentation/config/fetch.txt | 24 +
Documentation/technical/bundle-uri.txt | 8 +-
builtin/clone.c | 6 +-
builtin/fetch.c | 6 +
bundle-uri.c | 249 ++++++++-
bundle-uri.h | 28 +-
bundle.c | 75 ++-
t/t5558-clone-bundle-uri.sh | 672 ++++++++++++++++++++++++-
t/t5601-clone.sh | 46 ++
t/t5750-bundle-uri-parse.sh | 37 ++
t/t6020-bundle-misc.sh | 40 ++
t/test-lib-functions.sh | 8 +
13 files changed, 1149 insertions(+), 57 deletions(-)
base-commit: 4dbebc36b0893f5094668ddea077d0e235560b16
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1454%2Fderrickstolee%2Fbundle-redo%2FcreationToken-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1454/derrickstolee/bundle-redo/creationToken-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1454
Range-diff vs v2:
1: b3828725bc8 < -: ----------- bundle: optionally skip reachability walk
-: ----------- > 1: f9b0cc872ac bundle: test unbundling with incomplete history
-: ----------- > 2: 20c29d37f9c bundle: verify using check_connected()
2: 427aff4d5e5 = 3: 45cdf9d13a7 t5558: add tests for creationToken heuristic
3: f6f8197c9cc = 4: 49bf10e0fd4 bundle-uri: parse bundle.heuristic=creationToken
4: 12efa228d04 = 5: ff629bc119b bundle-uri: parse bundle.<id>.creationToken values
5: 7cfaa3c518c ! 6: 366db5f6931 bundle-uri: download in creationToken order
@@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
+
+ /* Try looking deeper in the list. */
+ move_direction = 1;
-+ goto stack_operation;
++ goto move;
+ }
+
+ /* We expect bundles when using creationTokens. */
@@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
+ * previous step.
+ */
+
-+stack_operation:
++move:
+ /* Move in the specified direction and repeat. */
+ cur += move_direction;
+ }
6: 17c404c1b83 = 7: b59c4e2d390 clone: set fetch.bundleURI if appropriate
7: d491070efed = 8: 83f49b37c69 bundle-uri: drop bundle.flag from design doc
8: 59e57e04968 ! 9: 314c60f2ae4 fetch: fetch from an external bundle URI
@@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
if (dry_run)
write_fetch_head = 0;
-+ if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri)) {
-+ if (fetch_bundle_uri(the_repository, bundle_uri, NULL))
-+ warning(_("failed to fetch bundles from '%s'"), bundle_uri);
-+ }
++ if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri) &&
++ fetch_bundle_uri(the_repository, bundle_uri, NULL))
++ warning(_("failed to fetch bundles from '%s'"), bundle_uri);
+
if (all) {
if (argc == 1)
9: 6a1504b1c3a ! 10: 4e0465efd19 bundle-uri: store fetch.bundleCreationToken
@@ bundle-uri.c: static int fetch_bundles_by_token(struct repository *r,
}
}
-@@ bundle-uri.c: stack_operation:
+@@ bundle-uri.c: move:
cur += move_direction;
}
10: 676522615ad ! 11: c968b63feba bundle-uri: test missing bundles with heuristic
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heurist
+ test_remote_https_urls <trace-clone-2.txt >actual &&
+ test_cmp expect actual &&
+
-+ # Only base bundle unbundled.
++ # bundle-1 and bundle-3 could unbundle, but bundle-4 could not
+ git -C download-2 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+ cat >expect <<-EOF &&
+ refs/bundles/base
@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heurist
+ test_remote_https_urls <trace-clone-3.txt >actual &&
+ test_cmp expect actual &&
+
-+ # All bundles failed to unbundle
++ # fake.bundle did not unbundle, but the others did.
+ git -C download-3 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+ cat >expect <<-EOF &&
+ refs/bundles/base
--
gitgitgadget
* Re: What's cooking in git.git (Jan 2023, #07; Mon, 30)
From: Derrick Stolee @ 2023-01-31 12:42 UTC (permalink / raw)
To: Junio C Hamano, git
In-Reply-To: <xmqqedrb1uvy.fsf@gitster.g>
On 1/30/2023 6:10 PM, Junio C Hamano wrote:
> * ds/scalar-ignore-cron-error (2023-01-27) 3 commits
> - scalar: only warn when background maintenance fails
> - t921*: test scalar behavior starting maintenance
> - t: allow 'scalar' in test_must_fail
>
> Allow "scalar" to warn but continue when its periodic maintenance
> feature cannot be enabled.
>
> Will merge to 'next'.
> source: <pull.1473.git.1674849963.gitgitgadget@gmail.com>
I was intending to re-roll, and prepared the --no-src option,
but these three patches are fine on their own, so I'm happy
for them to merge to 'next' and I can do the --no-src on its
own.
Thanks,
-Stolee
* Re: Bug: Cloning git repositories behind a proxy using the git:// protocol broken since 2.32
From: Florian Bezdeka @ 2023-01-31 12:08 UTC (permalink / raw)
To: brian m. carlson
Cc: git@vger.kernel.org, gitster@pobox.com, greg.pflaum@pnp-hcl.com,
peff@peff.net
In-Reply-To: <Y9j1RxKhNq2TnL4U@tapette.crustytoothpaste.net>
On Tue, 2023-01-31 at 11:02 +0000, brian m. carlson wrote:
> On 2023-01-31 at 10:52:47, Bezdeka, Florian wrote:
> > Hi all,
>
> Hey,
>
> > I just updated from git 2.30.2 (from Debian 11) to 2.39.0 (from Debian
> > testing) and realized that I can no longer clone repositories using the
> > git:// protocol.
> >
> > There is one specialty in my setup: I'm located behind a proxy, so
> > GIT_PROXY_COMMAND is set. I'm using the oe-git-proxy script [1] here.
> > My environment provides the http_proxy variable and privoxy [2] is
> > running on the server side. That information should be sufficient to
> > reproduce.
> >
> > I tried the following two repositories for testing:
> > - git clone git://git.code.sf.net/p/linuxptp/code linuxptp
> > - git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
> >
> > The result is:
> > Cloning into 'linuxptp'...
> > fetch-pack: unexpected disconnect while reading sideband packet
> > fatal: early EOF
> > fatal: fetch-pack: invalid index-pack output
> >
> > I was able to "git bisect" it to the following commit:
> > ae1a7eefffe6 ("fetch-pack: signal v2 server that we are done making requests")
> >
> > Reverting this commit on top of the master branch fixes my issue.
> > All people involved in this commit should be in CC.
> >
> > Looking at the TCP byte stream shows that the socket is closed after
> > the client received the first "part" of the packfile.
> >
> > ...
> > 0032want ec3f28a0ac13df805278164f2c72e69676d13134
> > 0032want 57caf5d94876e8329be65d2dc29d3c528b149724
> > 0009done
> > 0000000dpackfile
> >
> > Let me know if you need further information. Hopefully this was the
> > correct way of submitting a bug to git...
>
> I think this may have come up before, and I think the rule is that you
> need a proxy where closing standard input doesn't close standard output.
> Since that script is using socat, I believe you need the -t option to
> make this work, or some other approach where standard input and standard
> output can be closed independently.
Thanks for the super fast response, highly appreciated!
I was able to get it running by switching to ncat using the --no-
shutdown option, but I failed to bring back socat support so far.
For me this is still a regression. We have to change our
infrastructure/environment because we have a new requirement
(independent handling of stdin/out) after updating git now. I would
expect some noise from the yocto/OE community in the future where oe-
git-proxy is heavily used.
I guess proxy support was forgotten when the referenced change was
made. Any chance we can avoid closing stdout when running "in proxy
mode" to restore backward compatibility?
Thanks a lot!
Florian
* Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
From: Ævar Arnfjörð Bjarmason @ 2023-01-31 11:31 UTC (permalink / raw)
To: brian m. carlson; +Cc: Eli Schwartz, Git List
In-Reply-To: <Y9jlWYLzZ/yy4NqD@tapette.crustytoothpaste.net>
On Tue, Jan 31 2023, brian m. carlson wrote:
> Part of the reason I think this is valuable is that once SHA-1 and
> SHA-256 interoperability is present, git archive will change the
> contents of the archive format, since it will embed a SHA-256 hash into
> the file instead of a SHA-1 hash, since that's what's in the repository.
> Thus, we can't produce an archive that's deterministic in the face of
> SHA-1/SHA-256 interoperability concerns, and we need to create a new
> format that doesn't contain that data embedded in it.
I don't see why a format change would be required in this context.
If a repository were to switch over to SHA-256 wouldn't a better
solution to this be to disambiguate whether you're requesting a SHA-1 or
SHA-256 derived archive in the URL? E.g. to never serve up an archive
with a SHA-256 embedded in the header at:
https://github.com/git/git/archive/refs/tags/v2.39.1.tar.gz
But require a URL like:
https://github.com/git/git/archive-sha256/refs/tags/v2.39.1.tar.gz
If you did that then existing archives would continue to have the same
byte-for-byte content (assuming that the result of this discussion is
that we support that forever), but they'd always be generated with "-c
extensions.objectFormat=sha1". For always-SHA256 repos such a URL would
fail to generate anything.
But for repos that used to be SHA-1 but are now SHA-256 either URL would
work, but the PAX header would be different, referring to the SHA-1 or
SHA-256 commit, respectively.
Whereas your proposal seems to be that we should omit that SHA-(1|256)
from the "comment" entirely. That would seem to require either a one-off
change of all existing archives, or some cut-off date (or other marker).
If you've got a cut-off, you could also just use it to decide whether to
generate a SHA-1 or SHA-256 archive, and without that you'd be back to
the one-off breakage.
I also find it very useful that we've got the commit OID in the archive,
as it allows for round-tripping from archives back to the relevant
repository commit. Losing that entirely for SHA-1<->SHA-256 interop
would be unfortunate, especially if it turns out we could have easily
kept it.
> Having said that, I don't think this should be based on the timestamp of
> the file, since that means that two otherwise identical archives
> differing in timestamp aren't ever going to be the same, and we do see
> people who import or vendor other projects.
Yes, I agree that doing this by that sort of heuristic would be bad.
> Nor do I think we should
> attempt to provide consistent compression, since I believe the output of
> things like zlib has changed in the past, and we can't continually carry
> an old, potentially insecure version of zlib just because the output
> changed. People should be able to implement compression using gzip,
> zlib, pigz, miniz_oxide, or whatever if they want, since people
> implement Git in many different languages, and we won't want to force
> people using memory-safe languages like Go and Rust to explicitly use
> zlib for archives.
As I noted in the side-thread I think an acceptable solution would be to
push the problem of the consistent compressor downstream. I.e. if a site
like GitHub wants to maintain a potentially old version of GNU gzip that
should be up to them.
But I think it's a valid concern that we should guarantee the stability
of the archive format.
^ permalink raw reply
* Re: Bug: Cloning git repositories behind a proxy using the git:// protocol broken since 2.32
From: brian m. carlson @ 2023-01-31 11:02 UTC (permalink / raw)
To: Bezdeka, Florian
Cc: git@vger.kernel.org, gitster@pobox.com, greg.pflaum@pnp-hcl.com,
peff@peff.net
In-Reply-To: <4831bbeb0ec29ec84f92e0badfc0d628ecc6921d.camel@siemens.com>
On 2023-01-31 at 10:52:47, Bezdeka, Florian wrote:
> Hi all,
Hey,
> I just updated from git 2.30.2 (from Debian 11) to 2.39.0 (from Debian
> testing) and realized that I can no longer clone repositories using the
> git:// protocol.
>
> There is one specialty in my setup: I'm located behind a proxy, so
> GIT_PROXY_COMMAND is set. I'm using the oe-git-proxy script [1] here.
> My environment provides the http_proxy variable and privoxy [2] is
> running on the server side. That information should be sufficient to
> reproduce.
>
> I tried the following two repositories for testing:
> - git clone git://git.code.sf.net/p/linuxptp/code linuxptp
> - git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
>
> The result is:
> Cloning into 'linuxptp'...
> fetch-pack: unexpected disconnect while reading sideband packet
> fatal: early EOF
> fatal: fetch-pack: invalid index-pack output
>
> I was able to "git bisect" it to the following commit:
> ae1a7eefffe6 ("fetch-pack: signal v2 server that we are done making requests")
>
> Reverting this commit on top of the master branch fixes my issue.
> All people involved in this commit should be in CC.
>
> Looking at the TCP byte stream shows that the socket is closed after
> the client received the first "part" of the packfile.
>
> ...
> 0032want ec3f28a0ac13df805278164f2c72e69676d13134
> 0032want 57caf5d94876e8329be65d2dc29d3c528b149724
> 0009done
> 0000000dpackfile
>
> Let me know if you need further information. Hopefully this was the
> correct way of submitting a bug to git...
I think this may have come up before, and I think the rule is that you
need a proxy where closing standard input doesn't close standard output.
Since that script is using socat, I believe you need the -t option to
make this work, or some other approach where standard input and standard
output can be closed independently.
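To illustrate the requirement: the client has to be able to close its
sending direction while the receiving direction stays open. Below is a
minimal Python sketch of that half-close behaviour, assuming a local
socket pair stands in for the proxied connection. The function names are
made up for the example, and this is not how git or socat implement it.

```python
import socket
import threading


def serve_one_request(conn):
    """Read until the peer half-closes, then reply on the still-open direction."""
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:  # EOF: the peer shut down its write side
            break
        chunks.append(data)
    conn.sendall(b"reply to " + b"".join(chunks))
    conn.close()


def request_with_half_close(payload):
    """Send a request, close only the sending side, then read the full reply."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve_one_request, args=(server,))
    worker.start()
    client.sendall(payload)
    # Comparable to closing standard input without closing standard output.
    client.shutdown(socket.SHUT_WR)
    reply = b""
    while True:
        data = client.recv(4096)
        if not data:
            break
        reply += data
    client.close()
    worker.join()
    return reply
```

A proxy that tears down the whole connection as soon as its input reaches
EOF would lose the reply in this scenario, which matches the "early EOF"
symptom in the report.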
--
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA
^ permalink raw reply
* Bug: Cloning git repositories behind a proxy using the git:// protocol broken since 2.32
From: Bezdeka, Florian @ 2023-01-31 10:52 UTC (permalink / raw)
To: git@vger.kernel.org
Cc: gitster@pobox.com, greg.pflaum@pnp-hcl.com, peff@peff.net
Hi all,
I just updated from git 2.30.2 (from Debian 11) to 2.39.0 (from Debian
testing) and realized that I can no longer clone repositories using the
git:// protocol.
There is one specialty in my setup: I'm located behind a proxy, so
GIT_PROXY_COMMAND is set. I'm using the oe-git-proxy script [1] here.
My environment provides the http_proxy variable and privoxy [2] is
running on the server side. That information should be sufficient to
reproduce.
I tried the following two repositories for testing:
- git clone git://git.code.sf.net/p/linuxptp/code linuxptp
- git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
The result is:
Cloning into 'linuxptp'...
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output
I was able to "git bisect" it to the following commit:
ae1a7eefffe6 ("fetch-pack: signal v2 server that we are done making requests")
Reverting this commit on top of the master branch fixes my issue.
All people involved in this commit should be in CC.
Looking at the TCP byte stream shows that the socket is closed after
the client received the first "part" of the packfile.
...
0032want ec3f28a0ac13df805278164f2c72e69676d13134
0032want 57caf5d94876e8329be65d2dc29d3c528b149724
0009done
0000000dpackfile
Let me know if you need further information. Hopefully this was the
correct way of submitting a bug to git...
Best regards,
Florian
[1] https://wiki.yoctoproject.org/wiki/Working_Behind_a_Network_Proxy
[2] https://www.privoxy.org/
^ permalink raw reply
* Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
From: brian m. carlson @ 2023-01-31 9:54 UTC (permalink / raw)
To: Eli Schwartz; +Cc: Git List
In-Reply-To: <a812a664-67ea-c0ba-599f-cb79e2d96694@gmail.com>
On 2023-01-31 at 00:06:44, Eli Schwartz wrote:
> Nevertheless, I've seen the sentiment a few times that git doesn't like
> committing to output stability of git-archive, because it isn't
> officially documented (but it's not entirely clear what the benefits of
> changing are). And yet, git endeavors to do so, in order to prevent
> unnecessary breakage of people who embody Hyrum's Law and need that
> stability.
I'm one of the GitHub employees who chimed in there, and I'm also a Git
contributor in my own time (and I am speaking here only in my personal
capacity, since this is a personal address). I made a change some years
back to the archive format to fix the permissions on pax headers when
extracted as files, and kernel.org was relying on that and broke. Linus
yelled at me because of that.
Since then, I've been very opposed to us guaranteeing output format
consistency without explicitly doing so. I had sent some patches before
that I don't think ever got picked up that documented this explicitly.
I very much don't want people to come to rely on our behaviour unless we
explicitly guarantee it.
> What does everyone think about offering versioned git-archive outputs?
> This could be user-selectable as an option to `git archive`, but the
> main goal would be to select a good versioned output format depending on
> what is being archived. So:
>
> - first things first, un-default the internal compressor again
> - implement a v2 archive format, where the internal compressor is the
> default -- no other changes
> - teach git to select an archive format based on the date of the object
> being archived
> - when given a commit/tag ID to archive, check which support frame the
> committer date falls inside
> - for tree IDs, always use the latest format (it always uses the
> current date anyway)
> - schedule a date, for the sake of argument, 6 months after the next
> scheduled release date of git version X.Y in which this change goes
> live; bake this into the git sources as a transition date, all commits
> or tags generated after this date fall into the next format support
> frame
I am actually very much in favour of providing a standard, deterministic
version of pax (the extended tar format) that we use and documenting it
as a standard so that other archive tools can use that. That is, we
document some canonical tar format that is bit-for-bit identical that we
(and hopefully GNU tar and libarchive) will agree should be used to
serialize files for software interchange. I don't think this should be
dependent on the date at all, but I do believe it should be versioned
and tested, and the version number embedded as a pax header. I think
this would be valuable for simply having reproducible archives in
general, including for things like Docker containers, Debian packages,
Rust crates, and more, and I'm happy to work with others on such a
format, as I've said in the past on the list. People can opt-in to
whatever format they want when creating an archive and continue to use
that forever if they like.
Part of the reason I think this is valuable is that once SHA-1 and
SHA-256 interoperability is present, git archive will change the
contents of the archive format, since it will embed a SHA-256 hash into
the file instead of a SHA-1 hash, since that's what's in the repository.
Thus, we can't produce an archive that's deterministic in the face of
SHA-1/SHA-256 interoperability concerns, and we need to create a new
format that doesn't contain that data embedded in it.
Having said that, I don't think this should be based on the timestamp of
the file, since that means that two otherwise identical archives
differing in timestamp aren't ever going to be the same, and we do see
people who import or vendor other projects. Nor do I think we should
attempt to provide consistent compression, since I believe the output of
things like zlib has changed in the past, and we can't continually carry
an old, potentially insecure version of zlib just because the output
changed. People should be able to implement compression using gzip,
zlib, pigz, miniz_oxide, or whatever if they want, since people
implement Git in many different languages, and we won't want to force
people using memory-safe languages like Go and Rust to explicitly use
zlib for archives.
That may mean that it's important for people to actually decompress the
archive before checking hashes if they want deterministic behaviour, and
I'm okay with that. You already have to do that if you're verifying the
signature on Git tarballs, since only the uncompressed tar archive is
signed, so I don't think this is out of the question.
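As a rough illustration of checking hashes on the decompressed data, the
Python sketch below compresses the same payload twice with different gzip
settings (standing in for two different compressors) and compares hashes
before and after decompression. The payload and the helper name are made
up for the example.

```python
import gzip
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


# Stand-in for an uncompressed tar stream produced by "git archive".
tar_stream = b"example tar payload\n" * 100

# Two "compressors" that differ in compression level and embedded timestamp.
archive_a = gzip.compress(tar_stream, compresslevel=1, mtime=1)
archive_b = gzip.compress(tar_stream, compresslevel=9, mtime=2)

# Hashes of the compressed files versus hashes of the decompressed content.
compressed_match = sha256_hex(archive_a) == sha256_hex(archive_b)
decompressed_match = (
    sha256_hex(gzip.decompress(archive_a))
    == sha256_hex(gzip.decompress(archive_b))
)
```

Here the compressed files differ while the decompressed streams hash
identically, so a manifest that records the hash of the uncompressed tar
would be unaffected by a change of compressor.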
--
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA
^ permalink raw reply
* Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
From: Eli Schwartz @ 2023-01-31 9:11 UTC (permalink / raw)
To: Ævar Arnfjörð Bjarmason
Cc: Git List, brian m. carlson, René Scharfe,
Johannes Schindelin, Jeff King
In-Reply-To: <230131.86357rrtsg.gmgdl@evledraar.gmail.com>
Quick response for now...
On 1/31/23 2:49 AM, Ævar Arnfjörð Bjarmason wrote:
> So first, aside from whatever the git project does about the default,
> have you tried running the newer git version with a
> tar.tgz.command='gzip -cn' and seeing if it's compatible with the old
> version?
>
> It's unclear from the blog post's "we are reverting this change for now"
> whether that meant a revert of the git version (probably), or a revert
> back to using gzip(1).
I do not know which one Github internally did, but I can confirm that
the gzipped tarballs which github started shipping, when gunzipped,
produced an uncompressed tarball that was byte-identical to uncompressed
editions of the historic ones.
i.e. you could do this:
```
wget ${important_archive_release}
gzip -dc < ${important_archive_localfile} | gzip -cn \
  > ${important_archive_localfile}.new
```
And:
- they have different checksums
- the .new file has reverted to the same checksum as historic versions
from last year that are frozen into manifests
That was part of my original investigation, before I located the public
conversations.
--
Eli Schwartz
^ permalink raw reply
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails
From: Ævar Arnfjörð Bjarmason @ 2023-01-31 8:34 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Mathias Krause, git, Carlo Marcelo Arenas Belón
In-Reply-To: <xmqq8rhj504i.fsf@gitster.g>
On Mon, Jan 30 2023, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> If I compile libpcre2 with JIT support I'm expecting Git to use that,
>> and not fall back in those cases where the JIT engine would give up.
>
> The thing is, the reason why their Git has JIT enabled pcre2 for
> many users is not because they choose to compile their own Git for
> themselves because they wanted to play with JIT. To them, their
> distro and/or their employer gave a precompiled Git, in the hope
> that with JIT would be faster than without JIT when JIT is usable.
>
> In that context, "Speed is a feature in itself" is correct but
> "failing fast, forcing the user to try different things" is not a
> "Speed" feature at all. It may be interesting only for those who
> are curious to see what pattern was rejected by JIT. It is
> especially true as (1) we are willing to fall back to interpreter in
> the SELinux scenario, and (2) for normal users who want to use Git,
> and not necessarily interested in playing with JIT, there is no
> other recourse than prefixing "I do not want this JITted" to their
> pattern ANYWAY. Why fail fast and force the user to take the only
> recourse manually, when the machinery already knows what the user's
> only viable alternative is (i.e. falling back to the interpreter)?
Because we have an issue with (1), but not (2). How would (2) happen? So
far I've only seen intentionally pathological patterns designed to
trigger the JIT's limits. I don't think it's worth DWYM-ing that path,
when we're having to assume a lot about the "M" part of that.
>> Pathological regexes are pretty much only interesting to anyone in the
>> context of DoS attacks where they're being used to cause intentional
>> slowdowns.
>
> Exactly.
>
>> Here we're discussing an orthogonal case where the "JIT fails", but
>> rather than some pathological pattern it's because SELinux has made it
>> not work at runtime, and we're trying to tease the two cases apart.
>
> s/and we're/but you're/. And I do not think you want to.
That s/// is fair, but brings me back to my question above of why we're
trying to solve (2) here.
>> I don't think this is plausible at all per the above, and that we
>> shouldn't harm realistic use-cases to satisfy hypothetical ones.
>
> To me, what you are advocating is exactly the hypothetical ones that
> harm end-users who did not choose to enable JIT themselves. When JIT
> fails for whatever reason (including the SELinux scenario) for them,
> they do not need to be told by Git failing, when the interpreter can
> give them the correct answer. Wanting to see the result of the
> operation they asked Git to do, while allowing Git to use clever
> optimizations WHEN ABLE, is what I see as realistic use-cases.
I'm saying that the "JIT fails for whatever reason" is
hypothetical. It'll fail because of:
- The (1) case, where we're categorically unable to run the JIT. Then
we should proceed as if the JIT isn't available (as we do when it's
e.g. not compiled into PCRE).
- The pattern is pathological enough that it's about to take eons to
execute it (2).
The lack of bug reports about "hey, my existing 'git grep' pattern
failed" when the JIT was shipped with v2.14.0 shows that this doesn't
happen in practice.
- The case where the API is returning some new error code that's
unknown to us, let's call that (3).
^ permalink raw reply
* Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
From: Ævar Arnfjörð Bjarmason @ 2023-01-31 7:49 UTC (permalink / raw)
To: Eli Schwartz
Cc: Git List, brian m. carlson, René Scharfe,
Johannes Schindelin, Jeff King
In-Reply-To: <a812a664-67ea-c0ba-599f-cb79e2d96694@gmail.com>
On Mon, Jan 30 2023, Eli Schwartz wrote:
> For those that haven't seen, github changed its checksums for all
> "source code" artifacts attached to any git repository with tags. This
> change is now reverted due to widespread breakage -- and the lack of
> advance warning. The technical details of the change appear simple: they
> upgraded git.
>
> Probably the main discussion, complete with Github employees from this
> mailing list responding:
>
> https://github.com/bazel-contrib/SIG-rules-authors/issues/11#issuecomment-1409438954
>
> Consequences of that discussion, attempting to mitigate issues by
> warning people that it already happened:
>
> https://github.blog/changelog/2023-01-30-git-archive-checksums-may-change/
>
> And where I first saw it: https://github.com/mesonbuild/wrapdb/pull/884
Maybe I'm the only one that missed this on a first reading, but I
couldn't find what specific change in Git was being discussed.
But it's linked from the now-strikethrough portion of that github.blog
URL: 4f4be00d302 (archive-tar: use internal gzip by default,
2022-06-15), first released with v2.38.0.
That's the change to use gzip as a library instead of gzip(1), I've
added the author to the CC list, as well as others in the
initial ML discussion.
The ML discussion about that series starts at:
https://lore.kernel.org/git/pull.145.git.gitgitgadget@gmail.com/
For that change specifically I had this comment at the time:
https://lore.kernel.org/git/220615.86wndhwt9a.gmgdl@evledraar.gmail.com/
The response from René
(https://lore.kernel.org/git/3ed80afd-34b3-afd8-5ffb-0187a4475ee1@web.de/)
fills in the "why" missing from the commit message itself:
"It's to avoid a run dependency [on gzip(1)] [...] and you can
set tar.tgz.command='gzip -cn' to get the old behavior. Saving
energy is a better default, though."
We can discuss how worthwhile that trade-off is, especially in the face
of this behavior change GitHub encountered, but I don't think it was the
intent with this change to change the output (but maybe René was aware
of that, but didn't note it).
Which brings me to...
> Historically speaking, git-archive has been stable minus... a bug fix or
> two in rare cases, specifically relating to an inability to transcribe
> the contents of the git repo at all, I think? And the other factor is
> the compression algorithm used, which is generally GNU gzip, and
> historically whatever the system `gzip` command is.
>
> And gzip is a stable format. It's a worn-out, battle-weary format, even
> -- it's not the best at compressing, and it's not the best at
> decompressing, and "all the cool kids" are working on cooler formats,
> such as zstd which does indeed regularly change its byte output between
> versions. But the advantage of gzip is that it's good *enough*, and it's
> probably *everywhere*, and it's *reliable*.
>
> GNU gzip is reproducible. busybox gzip was fixed to agree with GNU gzip
> (this is relevant to the handful of people running software forges on,
> say, Alpine Linux):
>
> https://reproducible-builds.org/reports/2019-08/#upstream-news
>
> ...
>
> Nevertheless, I've seen the sentiment a few times that git doesn't like
> committing to output stability of git-archive, because it isn't
> officially documented (but it's not entirely clear what the benefits of
> changing are). And yet, git endeavors to do so, in order to prevent
> unnecessary breakage of people who embody Hyrum's Law and need that
> stability.
...Yes, this has been discussed many times on-list.
My recollection of those discussions in general is that we were mostly
talking about the "tar" format itself, moreso than "gzip", although in
this case it's a change in the gzip component that changed the output.
It's not clear to me (and I'm asking instead of digging myself, as I
assume someone at GitHub has dug already) whether our change to the
"internal gzip" is necessarily going to result in a different hash, or
did we just forget to provide some option to the library to get the same
result as gzip(1).
A major thing you're eliding here is that even if "tar" or "gzip" is
"a worn-out, battle-weary format" that does *not* translate to it being
a trivial matter to maintain byte-for-byte compatibility in the archives
(or compression stream) you produce, even though the resulting output
once un-archived or un-compressed is guaranteed to be the same.
We ship our own "tar" for the purposes of this discussion (the archive.c
code etc.), but offload the "gzip" part to either an external library
(which is new in v2.38.0, and the subject of this discussion), or to
GNU's gzip command.
I have no idea if the "gzip" part of this would be as easy as saying
"we'll default to gzip(1)", you note "GNU gzip is reproducible. busybox
gzip was fixed to agree with GNU gzip", but does the same apply to other
"gzip(1)"? I know of at least the BSD gzip.
Even then, has even GNU gzip promised that it will forever maintain
byte-for-byte compatibility in its output?
> Even with the new change to the compressor, git-archive is still
> reproducible, it's the internal gzip compressor that isn't. (This may be
> fixable, possibly by embedding an implementation from busybox or from
> GNU gzip? I'm not going to discuss that right now, though I think it's
> an interesting avenue of exploration.)
So first, aside from whatever the git project does about the default,
have you tried running the newer git version with a
tar.tgz.command='gzip -cn' and seeing if it's compatible with the old
version?
It's unclear from the blog post's "we are reverting this change for now"
whether that meant a revert of the git version (probably), or a revert
back to using gzip(1).
> I've thought about this now and then over the last couple of years,
> because I think I have a reasonable compromise that might make everyone
> (or at least most people) happy, and now seems like a good idea to
> mention it.
>
> What does everyone think about offering versioned git-archive outputs?
> This could be user-selectable as an option to `git archive`, but the
> main goal would be to select a good versioned output format depending on
> what is being archived. So:
>
> - first things first, un-default the internal compressor again
> - implement a v2 archive format, where the internal compressor is the
> default -- no other changes
> - teach git to select an archive format based on the date of the object
> being archived
> - when given a commit/tag ID to archive, check which support frame the
> committer date falls inside
> - for tree IDs, always use the latest format (it always uses the
> current date anyway)
> - schedule a date, for the sake of argument, 6 months after the next
> scheduled release date of git version X.Y in which this change goes
> live; bake this into the git sources as a transition date, all commits
> or tags generated after this date fall into the next format support
> frame
>
> The end result is that for all historic commits or tags, `git archive`
> will always produce the same output. This can be documented in the
> git-archive manpage: "the produced archive is guaranteed to be
> reproducible, unless you override the `tar.<format>.command` or your
> system compressor is not reproducible".
>
> For *new* commits or tags, everyone gets the benefit of fascinating,
> cool new archive formats with useful improvements at the tar container
> level, which is apparently a very desirable feature. The git project no
> longer has to worry, at all, about whether users will come to complain
> about how their build pipelines suddenly fail with checksum issues. The
> git project can simply, fearlessly, go implement innovative new changes
> without giving any thought to backwards compatibility.
>
> It is, simply, that those new changes only apply to projects which are
> still under active development, and which push new commits or tag new
> releases after the transition date.
>
> Old states of existing projects (regardless of whether they are still
> actively updating) can go have their old and apparently inefficient
> archives and don't get cool new stuff. That's fine. They're also
> increasingly rarely used, because they are, after all, old -- and most
> likely only used for historic archival purposes. If the worst comes to
> worst, well, they managed to produce a somehow useful archive with an
> older version of git -- nothing will *break* if they don't get the cool
> new stuff.
>
> And for the vast majority of new downloads for new stuff, the in-process
> compressor saves one fork+exec and is a bit more efficient, I guess?
>
> A note on the transition date: I suggested 6 months after the scheduled
> release date, because this gives everyone running a software forge time
> to update git itself, and have everything ready, in time to handle the
> first wave of commits and tags that naturally occur after the transition
> date. And you don't want it to be immediate, because then people will
> take days or weeks to deploy and the most recent archives will change
>
> For the purposes of this thought experiment, we assume that people don't
> routinely set the system time to a year in the future. This will only be
> done in situations such as, say, testing a git upgrade deployment for a
> software forge.
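The date-based selection proposed in the quoted text could be sketched
roughly as follows. This is only an illustration in Python: the transition
date, the version numbers and the function name are hypothetical, and how
to treat a commit made exactly at the transition instant is an assumption.

```python
from datetime import datetime, timezone

# Hypothetical transition date that would be baked into the git sources.
TRANSITION_DATE = datetime(2024, 1, 1, tzinfo=timezone.utc)


def select_archive_format(committer_date=None):
    """Pick an archive format version for the object being archived.

    committer_date is None for tree IDs, which carry no committer date
    and would always get the latest format.
    """
    if committer_date is None:
        return 2
    # Assumption: a commit made exactly at the transition counts as "after".
    return 2 if committer_date >= TRANSITION_DATE else 1
```

With this shape, an old tag keeps resolving to format 1 no matter which
git version generates the archive, which is the stability property the
proposal is after.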
This sounds like a workable transition plan, but it assumes that we had
a really good reason to change to the "internal gzip" by default, and
that we must move forward with that change in some way.
I don't think that's the case per the linked-to on-list discussion, the
aim was just to provide output if gzip(1) wasn't available, so all we'd
need is the pseudocode of:
- Prepare our tar stream
- Try to stream it to gzip(1)
- If that fails with "command does not exist" fall back to the
internal one (possibly with a warning about possibly-different
output)
Then systems without a gzip(1) could produce output (which René was
aiming for), but those with a system gzip(1) (e.g. GitHub's production
installation) could just continue to use it.
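The fallback steps above can be sketched in Python as below. This is a
minimal illustration under stated assumptions, not git's implementation:
the function name and the warning text are made up, and Python's gzip
module stands in for the internal compressor.

```python
import gzip
import subprocess
import sys


def gzip_tar_stream(tar_stream: bytes, command=("gzip", "-cn")) -> bytes:
    """Compress with an external gzip, falling back to an in-process one."""
    try:
        result = subprocess.run(
            list(command),
            input=tar_stream,
            stdout=subprocess.PIPE,
            check=True,
        )
        return result.stdout
    except FileNotFoundError:
        # "command does not exist": warn, then use the built-in compressor.
        print(
            "warning: %s not found, using internal gzip; "
            "output may differ byte-for-byte" % command[0],
            file=sys.stderr,
        )
        return gzip.compress(tar_stream, mtime=0)
```

Only the "command does not exist" case triggers the fallback. Any other
failure of the external command still raises, so a broken gzip(1) is not
silently papered over.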
That's still a band-aid on the larger questions I raised above,
i.e. whether we'd want to forever guarantee the output of "git archive"
itself, and of the "tar.tgz.command".
My off-the-cuff response to that is that we should probably:
- Guarantee the "git archive" output itself (without compression),
leaving the out that it *may* change in the future with notice (or
we'd just version it)
- Switch back to using gzip(1) by default, whatever gzip(1) that
happens to be.
But:
- Not promise that the total end result will be byte-for-byte the same,
as that would imply a promise about the external gzip(1).
- Just prominently note in our docs that if you want the
archive->compression to be byte-for-byte with the past it's up to you
to ensure that your compressor gives you that guarantee.
^ permalink raw reply
* Git loses untracked files during “stash” if there are conficts
From: Alessandro Arici @ 2023-01-31 8:08 UTC (permalink / raw)
To: git
I am using git stash -u, but, as the title says, when there are conflicts
the untracked files are gone.
This is a link, for example:
https://www.databasesandlife.com/git-stash-loses-untracked-files/
And this is a reproduction of the bug
# Create a Git repo with a single file committed
git init
echo contents > original-file.txt
git add original-file.txt
git commit -m "Creating the file"
# Create a new file, modify an old one, stash
echo foo > new-file.txt
echo contents2 > original-file.txt
git stash push -u
# Modify the old file in a different way, commit
echo contents3 > original-file.txt
git commit -am "Altering the file"
# Apply the stash, see conflict, but what about the new file?
git stash pop
cat new-file.txt
Git version, on Linux: 2.34.1
Thank you
^ permalink raw reply
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails
From: Mathias Krause @ 2023-01-31 7:48 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Ævar Arnfjörð Bjarmason,
Carlo Marcelo Arenas Belón
In-Reply-To: <xmqqk0131zxi.fsf@gitster.g>
On 30.01.23 22:21, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Having said all that, I do not mind queuing v2 if the "use *NO_JIT
>> to disable" is added to the message to help users who are forced to
>> redo the query.
>
> In the meantime, here is what I plan to apply on top of v2 while
> queuing it. The message given to die() should lack the terminating
> LF, and the overlong line can and should be split at operator
> boundary.
>
> Thanks.
>
> grep.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git c/grep.c w/grep.c
> index 59afc3f07f..42f184bd09 100644
> --- c/grep.c
> +++ w/grep.c
> @@ -357,7 +357,11 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
> p->pcre2_jit_on = 0;
> return;
> } else if (jitret) {
> - die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret);
> + die("Couldn't JIT the PCRE2 pattern '%s', got '%d'%s",
> + p->pattern, jitret,
> + pcre2_jit_functional()
> + ? "\nPerhaps prefix (*NO_JIT) to your pattern?"
> + : "");
> }
>
> /*
Looks sensible, but maybe something like below would be even better?
diff --git a/grep.c b/grep.c
index 59afc3f07fc9..e0144ba77e7a 100644
--- a/grep.c
+++ b/grep.c
@@ -357,7 +357,13 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
p->pcre2_jit_on = 0;
return;
} else if (jitret) {
- die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret);
+ int do_clip = p->patternlen > 64;
+ int clip_len = do_clip ? 64 : p->patternlen;
+ die("Couldn't JIT the PCRE2 pattern '%.*s'%s, got '%d'%s",
+ clip_len, p->pattern, do_clip ? "..." : "", jitret,
+ pcre2_jit_functional()
+ ? "\nPerhaps prefix (*NO_JIT) to your pattern?"
+ : "");
}
/*
It ensures that git prints the hint even for very long patterns,
like the one I was testing this with ("$(perl -e 'print "(.)" x 4000')").
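To show the effect of the clipping without building git, here is a small
Python sketch that mirrors the message formatting of the diff above. The
function name is made up, the error code in the usage lines is an
arbitrary value, and Python slices characters where the C code clips
bytes.

```python
def format_jit_error(pattern, jitret, jit_functional, limit=64):
    """Build the error text, clipping the pattern to at most `limit` characters."""
    clipped = len(pattern) > limit
    shown = pattern[:limit]
    hint = "\nPerhaps prefix (*NO_JIT) to your pattern?" if jit_functional else ""
    return "Couldn't JIT the PCRE2 pattern '%s'%s, got '%d'%s" % (
        shown,
        "..." if clipped else "",
        jitret,
        hint,
    )


# A 12000-character pattern is cut to 64 characters plus "...",
# so the hint at the end of the message stays visible.
long_message = format_jit_error("(.)" * 4000, -48, True)
short_message = format_jit_error("abc", -48, False)
```

Short patterns are printed unchanged, so the clipping only affects the
pathological case.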
Thanks,
Mathias
^ permalink raw reply related
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails
From: Mathias Krause @ 2023-01-31 7:30 UTC (permalink / raw)
To: Junio C Hamano
Cc: git, Ævar Arnfjörð Bjarmason,
Carlo Marcelo Arenas Belón
In-Reply-To: <xmqqlelj3hvk.fsf@gitster.g>
On 30.01.23 21:08, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> If we were to keep that "die", it is absolutely required, I would
>> think. Users who got their Git with JIT-enabled pcre2 may be
>> viewing JIT merely as "a clever optimization the implementation is
>> allowed to use when able", without knowing and more importantly
>> without wanting to know how to disable it from within their
>> patterns.
>>
>> But can't we drop that die() if we took the v1 route?
>
> Having said all that, I do not mind queuing v2 if the "use *NO_JIT
> to disable" is added to the message to help users who are forced to
> redo the query.
>
> And in practice, it shouldn't make that much difference, because the
> only scenario (other than the SELinux-like situation where JIT is
> compiled in but does not work at all) that the difference may matter
> would happen when a non-trivial portion of the patterns users use
> are not workable with JIT, but if that were the case, we would have
> written JIT off as not mature enough and not yet usable long time
> ago. So, in practice, patterns refused by JIT would be a very tiny
> minority to matter in real life, and "failing fast to inconvenience
> users" would not be too bad.
Exactly!
> So while I still think v1's simplicity is the right thing to have
> here, I think it is waste of our braincell to compare v1 vs v2. As
> v2 gives smaller incremental behaviour change perceived by end
> users, if somebody really wanted to, I'd expect that a low-hanging
> fruit #leftoverbit on top of such a patch, after the dust settles,
> would be to
>
> (1) rename pcre2_jit_functional() to fall_back_to_interpreter() or
> something,
>
> (2) add a configuration variable to tell fall_back_to_interpreter()
> that any form of JIT error is allowed to fall back to
> interpreter().
>
> and such a patch will essentially give back the simplicity of v1 to
> folks who opt into the configuration.
Fair enough. But aside from the W|X memory allocation denial case, the
likelihood of running into limitations of PCRE2's JIT that require the
interpreter fallback is so small (otherwise we would have seen it in the
past already) that, I think, the demand for such a knob is basically
nonexistent.
Thanks,
Mathias
* Stability of git-archive, breaking (?) the Github universe, and a possible solution
From: Eli Schwartz @ 2023-01-31 0:06 UTC (permalink / raw)
To: Git List; +Cc: brian m. carlson
For those that haven't seen, github changed its checksums for all
"source code" artifacts attached to any git repository with tags. This
change is now reverted due to widespread breakage -- and the lack of
advance warning. The technical details of the change appear simple: they
upgraded git.
Probably the main discussion, complete with Github employees from this
mailing list responding:
https://github.com/bazel-contrib/SIG-rules-authors/issues/11#issuecomment-1409438954
Consequences of that discussion, attempting to mitigate issues by
warning people that it already happened:
https://github.blog/changelog/2023-01-30-git-archive-checksums-may-change/
And where I first saw it: https://github.com/mesonbuild/wrapdb/pull/884
Historically speaking, git-archive has been stable minus... a bug fix or
two in rare cases, specifically relating to an inability to transcribe
the contents of the git repo at all, I think? And the other factor is
the compression algorithm used, which is generally GNU gzip, and
historically whatever the system `gzip` command is.
And gzip is a stable format. It's a worn-out, battle-weary format, even
-- it's not the best at compressing, and it's not the best at
decompressing, and "all the cool kids" are working on cooler formats,
such as zstd which does indeed regularly change its byte output between
versions. But the advantage of gzip is that it's good *enough*, and it's
probably *everywhere*, and it's *reliable*.
GNU gzip is reproducible. busybox gzip was fixed to agree with GNU gzip
(this is relevant to the handful of people running software forges on,
say, Alpine Linux):
https://reproducible-builds.org/reports/2019-08/#upstream-news
...
Nevertheless, I've seen the sentiment a few times that git doesn't like
committing to output stability of git-archive, because it isn't
officially documented (but it's not entirely clear what the benefits of
changing are). And yet, git endeavors to do so, in order to prevent
unnecessary breakage of people who embody Hyrum's Law and need that
stability.
Even with the new change to the compressor, git-archive is still
reproducible, it's the internal gzip compressor that isn't. (This may be
fixable, possibly by embedding an implementation from busybox or from
GNU gzip? I'm not going to discuss that right now, though I think it's
an interesting avenue of exploration.)
I've thought about this now and then over the last couple of years,
because I think I have a reasonable compromise that might make everyone
(or at least most people) happy, and now seems like a good time to
mention it.
What does everyone think about offering versioned git-archive outputs?
This could be user-selectable as an option to `git archive`, but the
main goal would be to select a good versioned output format depending on
what is being archived. So:
- first things first, un-default the internal compressor again
- implement a v2 archive format, where the internal compressor is the
default -- no other changes
- teach git to select an archive format based on the date of the object
being archived
- when given a commit/tag ID to archive, check which support frame the
committer date falls inside
- for tree IDs, always use the latest format (it always uses the
current date anyway)
- schedule a date, for the sake of argument, 6 months after the next
scheduled release date of git version X.Y in which this change goes
live; bake this into the git sources as a transition date, all commits
or tags generated after this date fall into the next format support
frame
The end result is that for all historic commits or tags, `git archive`
will always produce the same output. This can be documented in the
git-archive manpage: "the produced archive is guaranteed to be
reproducible, unless you override the `tar.<format>.command` or your
system compressor is not reproducible".
For *new* commits or tags, everyone gets the benefit of fascinating,
cool new archive formats with useful improvements at the tar container
level, which is apparently a very desirable feature. The git project no
longer has to worry, at all, about whether users will come to complain
about how their build pipelines suddenly fail with checksum issues. The
git project can simply, fearlessly, go implement innovative new changes
without giving any thought to backwards compatibility.
It is, simply, that those new changes only apply to projects which are
still under active development, and which push new commits or tag new
releases after the transition date.
Old states of existing projects (regardless of whether they are still
actively updating) can go have their old and apparently inefficient
archives and don't get cool new stuff. That's fine. They're also
increasingly rarely used, because they are, after all, old -- and most
likely only used for historic archival purposes. If the worst comes to
worst, well, they managed to produce a somehow useful archive with an
older version of git -- nothing will *break* if they don't get the cool
new stuff.
And for the vast majority of new downloads for new stuff, the in-process
compressor saves one fork+exec and is a bit more efficient, I guess?
A note on the transition date: I suggested 6 months after the scheduled
release date, because this gives everyone running a software forge time
to update git itself, and have everything ready, in time to handle the
first wave of commits and tags that naturally occur after the transition
date. And you don't want it to be immediate, because then people will
take days or weeks to deploy and the most recent archives will change.
For the purposes of this thought experiment, we assume that people don't
routinely set the system time to a year in the future. This will only be
done in situations such as, say, testing a git upgrade deployment for a
software forge.
...
"And then no one ever complained about archive checksums changing again."
🤞🙏🥺
--
Eli Schwartz
* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails
From: Junio C Hamano @ 2023-01-30 23:27 UTC (permalink / raw)
To: Ramsay Jones
Cc: Mathias Krause, git, Ævar Arnfjörð Bjarmason,
Carlo Marcelo Arenas Belón
In-Reply-To: <55c75313-79d4-1c5b-951b-5d1e75553441@ramsayjones.plus.com>
Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
>> + die("Couldn't JIT the PCRE2 pattern '%s', got '%d'%s",
>> + p->pattern, jitret,
>> + pcre2_jit_functional()
>> + ? "\nPerhaps prefix (*NO_GIT) to your pattern?"
>
> s/NO_GIT/NO_JIT/ ? :)
Indeed. Thanks.
* Re: What's cooking in git.git (Jan 2023, #07; Mon, 30)
From: Junio C Hamano @ 2023-01-30 23:27 UTC (permalink / raw)
To: git
In-Reply-To: <xmqqedrb1uvy.fsf@gitster.g>
Junio C Hamano <gitster@pobox.com> writes:
> Subject: Re: What's cooking in git.git (Jan 2023, #07; Mon, 30)
Sorry, but this is the 8th issue of the month, not the 7th.
* What's cooking in git.git (Jan 2023, #07; Mon, 30)
From: Junio C Hamano @ 2023-01-30 23:10 UTC (permalink / raw)
To: git
Here are the topics that have been cooking in my tree. Commits
prefixed with '+' are in 'next' (being in 'next' is a sign that a
topic is stable enough to be used and is a candidate to be in a future
release). Commits prefixed with '-' are only in 'seen', and aren't
considered "accepted" at all. A topic without enough support may be
discarded after a long period of no activity.
Copies of the source code to Git live in many repositories, and the
following is a list of the ones I push into or their mirrors. Some
repositories have only a subset of branches.
With maint, master, next, seen, todo:
git://git.kernel.org/pub/scm/git/git.git/
git://repo.or.cz/alt-git.git/
https://kernel.googlesource.com/pub/scm/git/git/
https://github.com/git/git/
https://gitlab.com/git-vcs/git/
With all the integration branches and topics broken out:
https://github.com/gitster/git/
Even though the preformatted documentation in HTML and man format
are not sources, they are published in these repositories for
convenience (replace "htmldocs" with "manpages" for the manual
pages):
git://git.kernel.org/pub/scm/git/git-htmldocs.git/
https://github.com/gitster/git-htmldocs.git/
Release tarballs are available at:
https://www.kernel.org/pub/software/scm/git/
--------------------------------------------------
[Graduated to 'master']
* ab/cache-api-cleanup-users (2023-01-17) 3 commits
(merged to 'next' on 2023-01-18 at c5a4374652)
+ treewide: always have a valid "index_state.repo" member
+ Merge branch 'ds/omit-trailing-hash-in-index' into ab/cache-api-cleanup-users
+ Merge branch 'ab/cache-api-cleanup' into ab/cache-api-cleanup-users
Updates the users of the cache API.
cf. <db312853-81a1-542b-db96-d816c463516c@github.com>
source: <patch-1.1-b4998652822-20230117T135234Z-avarab@gmail.com>
* ar/markup-em-dash (2023-01-23) 1 commit
(merged to 'next' on 2023-01-24 at 0367e3035f)
+ Documentation: render dash correctly
Doc mark-up updates.
source: <20230123090114.429844-1-rybak.a.v@gmail.com>
* cb/grep-pcre-ucp (2023-01-18) 1 commit
(merged to 'next' on 2023-01-19 at 2c7e531839)
+ grep: correctly identify utf-8 characters with \{b,w} in -P
"grep -P" learned to use Unicode Character Property to grok
character classes when processing \b and \w etc.
cf. <xmqqzgaf2zpt.fsf@gitster.g>
source: <20230108155217.2817-1-carenas@gmail.com>
* cw/fetch-remote-group-with-duplication (2023-01-19) 1 commit
(merged to 'next' on 2023-01-20 at 7f00e43209)
+ fetch: fix duplicate remote parallel fetch bug
"git fetch <group>", when "<group>" of remotes lists the same
remote twice, unnecessarily failed when parallel fetching was
enabled, which has been corrected.
source: <20230119220538.1522464-1-calvinwan@google.com>
* jc/doc-branch-update-checked-out-branch (2023-01-18) 1 commit
(merged to 'next' on 2023-01-19 at 970900a232)
+ branch: document `-f` and linked worktree behaviour
Document that "branch -f <branch>" disables only the safety to
avoid recreating an existing branch.
source: <xmqqa62f2dj1.fsf_-_@gitster.g>
* jc/doc-checkout-b (2023-01-19) 1 commit
(merged to 'next' on 2023-01-23 at 95340e1941)
+ checkout: document -b/-B to highlight the differences from "git branch"
Clarify how "checkout -b/-B" and "git branch [-f]" are similar but
different in the documentation.
source: <xmqqtu0m1m9i.fsf@gitster.g>
* jk/hash-object-fsck (2023-01-19) 7 commits
(merged to 'next' on 2023-01-23 at 985e87fc34)
+ fsck: do not assume NUL-termination of buffers
+ hash-object: use fsck for object checks
+ fsck: provide a function to fsck buffer without object struct
+ t: use hash-object --literally when created malformed objects
+ t7030: stop using invalid tag name
+ t1006: stop using 0-padded timestamps
+ t1007: modernize malformed object tests
"git hash-object" now checks that the resulting object is well
formed with the same code as "git fsck".
source: <Y8hX+pIZUKXsyYj5@coredump.intra.peff.net>
source: <Y8ifa7hyqxSbL92U@coredump.intra.peff.net>
* jk/hash-object-literally-fd-leak (2023-01-19) 1 commit
(merged to 'next' on 2023-01-19 at fff9b60a36)
+ hash-object: fix descriptor leak with --literally
Leakfix.
source: <Y8ijpJqtkDTi792i@coredump.intra.peff.net>
* km/send-email-with-v-reroll-count (2022-11-27) 1 commit
(merged to 'next' on 2023-01-19 at 9b3543471c)
+ send-email: relay '-v N' to format-patch
"git send-email -v 3" used to be expanded to "git send-email
--validate 3" when the user meant to pass them down to
"format-patch", which has been corrected.
source: <87edtp5uws.fsf@kyleam.com>
* pb/branch-advice-recurse-submodules (2023-01-18) 1 commit
(merged to 'next' on 2023-01-19 at 13747fc72d)
+ branch: improve advice when --recurse-submodules fails
Improve advice message given when "git branch --recurse-submodules"
fails.
source: <pull.1464.git.1673890908453.gitgitgadget@gmail.com>
* po/pretty-format-columns-doc (2023-01-19) 5 commits
(merged to 'next' on 2023-01-23 at d41cb5f527)
+ doc: pretty-formats note wide char limitations, and add tests
+ doc: pretty-formats describe use of ellipsis in truncation
+ doc: pretty-formats document negative column alignments
+ doc: pretty-formats: delineate `%<|(` parameter values
+ doc: pretty-formats: separate parameters from placeholders
Clarify column-padding operators in the pretty format string.
source: <20230119181827.1319-1-philipoakley@iee.email>
* sa/cat-file-mailmap--batch-check (2023-01-18) 1 commit
(merged to 'next' on 2023-01-18 at 25ecb1dd3a)
+ git-cat-file.txt: fix list continuations rendering literally
Docfix.
source: <20230118082749.1252459-1-martin.agren@gmail.com>
* tb/t0003-invoke-dd-more-portably (2023-01-22) 1 commit
(merged to 'next' on 2023-01-23 at 917aa24a27)
+ t0003: call dd with portable blocksize
Test portability fix.
source: <20230122062839.14542-1-tboegi@web.de>
--------------------------------------------------
[New Topics]
* ds/scalar-ignore-cron-error (2023-01-27) 3 commits
- scalar: only warn when background maintenance fails
- t921*: test scalar behavior starting maintenance
- t: allow 'scalar' in test_must_fail
Allow "scalar" to warn but continue when its periodic maintenance
feature cannot be enabled.
Will merge to 'next'.
source: <pull.1473.git.1674849963.gitgitgadget@gmail.com>
* mh/doc-credential-cache-only-in-core (2023-01-29) 1 commit
(merged to 'next' on 2023-01-30 at 021b5227af)
+ Documentation: clarify that cache forgets credentials if the system restarts
Documentation clarification.
Will merge to 'master'.
source: <pull.1447.v3.git.1674936815117.gitgitgadget@gmail.com>
--------------------------------------------------
[Stalled]
* ja/worktree-orphan (2023-01-13) 4 commits
- worktree add: add hint to direct users towards --orphan
- worktree add: add --orphan flag
- worktree add: refactor opt exclusion tests
- worktree add: include -B in usage docs
'git worktree add' learned how to create a worktree based on an
orphaned branch with `--orphan`.
Expecting a reroll.
cf. <11be1b0e-ee38-119f-1d80-cb818946116b@dunelm.org.uk>
source: <20230109173227.29264-1-jacobabel@nullpo.dev>
* ab/avoid-losing-exit-codes-in-tests (2022-12-20) 6 commits
- tests: don't lose misc "git" exit codes
- tests: don't lose "git" exit codes in "! ( git ... | grep )"
- tests: don't lose exit status with "test <op> $(git ...)"
- tests: don't lose exit status with "(cd ...; test <op> $(git ...))"
- t/lib-patch-mode.sh: fix ignored exit codes
- auto-crlf tests: don't lose exit code in loops and outside tests
Test clean-up.
Expecting a hopefully minor and final reroll.
cf. <1182283a-4a78-3c99-e716-a8c3e58a5823@web.de>
cf. <xmqqsfhb0vum.fsf@gitster.g>
source: <cover-v4-0.6-00000000000-20221219T101240Z-avarab@gmail.com>
* tl/notes--blankline (2022-11-09) 5 commits
- notes.c: introduce "--no-blank-line" option
- notes.c: provide tips when target and append note are both empty
- notes.c: drop unreachable code in 'append_edit()'
- notes.c: cleanup for "designated init" and "char ptr init"
- notes.c: cleanup 'strbuf_grow' call in 'append_edit'
'git notes append' was taught '--[no-]blank-line' to conditionally
add a LF between a new and existing note.
Expecting a reroll.
cf. <CAPig+cRcezSp4Rqt1Y9bD-FT6+7b0g9qHfbGRx65AOnw2FQXKg@mail.gmail.com>
source: <cover.1667980450.git.dyroneteng@gmail.com>
* mc/switch-advice (2022-11-09) 1 commit
- po: use `switch` over `checkout` in error message
Use 'switch' instead of 'checkout' in an error message.
Waiting for review response.
source: <pull.1308.git.git.1668018620148.gitgitgadget@gmail.com>
* js/range-diff-mbox (2022-11-23) 1 commit
- range-diff: support reading mbox files
'git range-diff' gained support for reading either side from an .mbox
file instead of a revision range.
Waiting for review response.
cf. <xmqqr0xupmnf.fsf@gitster.g>
source: <pull.1420.v3.git.1669108102092.gitgitgadget@gmail.com>
* ab/tag-object-type-errors (2022-11-22) 5 commits
- tag: don't emit potentially incorrect "object is a X, not a Y"
- tag: don't misreport type of tagged objects in errors
- object tests: add test for unexpected objects in tags
- object-file.c: free the "t.tag" in check_tag()
- Merge branch 'jk/parse-object-type-mismatch' into ab/tag-object-type-errors
Hardening checks around mismatched object types when one of those
objects is a tag.
Expecting a reroll.
cf. <xmqqzgb5jz5c.fsf@gitster.g>
cf. <xmqqsfgxjugi.fsf@gitster.g>
source: <cover-0.4-00000000000-20221118T113442Z-avarab@gmail.com>
* ab/config-multi-and-nonbool (2022-11-27) 9 commits
- for-each-repo: with bad config, don't conflate <path> and <cmd>
- config API: add "string" version of *_value_multi(), fix segfaults
- config API users: test for *_get_value_multi() segfaults
- for-each-repo: error on bad --config
- config API: have *_multi() return an "int" and take a "dest"
- versioncmp.c: refactor config reading next commit
- config tests: add "NULL" tests for *_get_value_multi()
- config tests: cover blind spots in git_die_config() tests
- for-each-repo tests: test bad --config keys
Assorted config API updates.
Expecting a reroll.
source: <cover-v3-0.9-00000000000-20221125T093158Z-avarab@gmail.com>
* ed/fsmonitor-inotify (2022-12-13) 6 commits
- fsmonitor: update doc for Linux
- fsmonitor: test updates
- fsmonitor: enable fsmonitor for Linux
- fsmonitor: implement filesystem change listener for Linux
- fsmonitor: determine if filesystem is local or remote
- fsmonitor: prepare to share code between Mac OS and Linux
Bundled fsmonitor for Linux using inotify API.
Needs review on the updated round.
source: <pull.1352.v5.git.git.1670882286.gitgitgadget@gmail.com>
* jc/spell-id-in-both-caps-in-message-id (2022-12-17) 1 commit
- e-mail workflow: Message-ID is spelled with ID in both capital letters
Consistently spell "Message-ID" as such, not "Message-Id".
Needs review.
source: <xmqqsfhgnmqg.fsf@gitster.g>
* ad/test-record-count-when-harness-is-in-use (2022-12-25) 1 commit
- test-lib: allow storing counts with test harnesses
Allow summary results from tests to be written to t/test-results
directory even when a test harness like 'prove' is in use.
Needs review.
source: <20221224225200.1027806-1-adam@dinwoodie.org>
* so/diff-merges-more (2022-12-18) 5 commits
- diff-merges: improve --diff-merges documentation
- diff-merges: issue warning on lone '-m' option
- diff-merges: support list of values for --diff-merges
- diff-merges: implement log.diffMerges-m-imply-p config
- diff-merges: implement [no-]hide option and log.diffMergesHide config
Assorted updates to "--diff-merges=X" option.
May want to discard. Breaking compatibility does not seem worth it.
source: <20221217132955.108542-1-sorganov@gmail.com>
--------------------------------------------------
[Cooking]
* ab/hook-api-with-stdin (2023-01-23) 5 commits
- hook: support a --to-stdin=<path> option for testing
- sequencer: use the new hook API for the simpler "post-rewrite" call
- hook API: support passing stdin to hooks, convert am's 'post-rewrite'
- run-command: allow stdin for run_processes_parallel
- run-command.c: remove dead assignment in while-loop
Extend the run-hooks API to allow feeding data from the standard
input when running the hook script(s).
Expecting review responses.
source: <cover-0.5-00000000000-20230123T170550Z-avarab@gmail.com>
* as/ssh-signing-improve-key-missing-error (2023-01-25) 1 commit
(merged to 'next' on 2023-01-25 at 140f2c2c60)
+ ssh signing: better error message when key not in agent
Improve the error message given when private key is not loaded in
the ssh agent in the codepath to sign with an ssh key.
Will merge to 'master'.
source: <pull.1270.v3.git.git.1674650450662.gitgitgadget@gmail.com>
* en/rebase-incompatible-opts (2023-01-25) 10 commits
(merged to 'next' on 2023-01-27 at 35a67cf2c6)
+ rebase: provide better error message for apply options vs. merge config
+ rebase: put rebase_options initialization in single place
+ rebase: fix formatting of rebase --reapply-cherry-picks option in docs
+ rebase: clarify the OPT_CMDMODE incompatibilities
+ rebase: add coverage of other incompatible options
+ rebase: fix incompatiblity checks for --[no-]reapply-cherry-picks
+ rebase: fix docs about incompatibilities with --root
+ rebase: remove --allow-empty-message from incompatible opts
+ rebase: flag --apply and --merge as incompatible
+ rebase: mark --update-refs as requiring the merge backend
"git rebase" often ignored incompatible options instead of
complaining, which has been corrected.
Will merge to 'master'.
Replaces en/rebase-update-refs-needs-merge-backend.
source: <pull.1466.v5.git.1674619434.gitgitgadget@gmail.com>
* gm/request-pull-with-non-pgp-signed-tags (2023-01-25) 1 commit
(merged to 'next' on 2023-01-30 at abc684d8df)
+ request-pull: filter out SSH/X.509 tag signatures
Adjust "git request-pull" to strip embedded signature from signed
tags to notice non-PGP signatures.
Will merge to 'master'.
source: <20230125234725.3918563-1-gwymor@tilde.club>
* cb/grep-fallback-failing-jit (2023-01-30) 2 commits
- SQUASH???
- grep: fall back to interpreter if JIT memory allocation fails
In an environment where dynamically generated code is prohibited to
run (e.g. SELinux), failure to JIT pcre patterns is expected. Fall
back to interpreted execution in such a case.
Expecting a (hopefully final minor) reroll.
cf. <xmqqlelj3hvk.fsf@gitster.g>
source: <20230127154952.485913-1-minipli@grsecurity.net>
* cb/checkout-same-branch-twice (2023-01-20) 1 commit
- checkout/switch: disallow checking out same branch in multiple worktrees
"git checkout -B $branch" failed to protect against checking out
a branch that is checked out elsewhere, unlike "git branch -f" did.
Expecting a (hopefully final) reroll.
cf. <8f24fc3c-c30f-dc70-5a94-5ee4ed3de102@dunelm.org.uk>
source: <20230120113553.24655-1-carenas@gmail.com>
* ab/sequencer-unleak (2023-01-18) 8 commits
- commit.c: free() revs.commit in get_fork_point()
- builtin/rebase.c: free() "options.strategy_opts"
- sequencer.c: always free() the "msgbuf" in do_pick_commit()
- builtin/rebase.c: fix "options.onto_name" leak
- builtin/revert.c: move free-ing of "revs" to replay_opts_release()
- rebase & sequencer API: fix get_replay_opts() leak in "rebase"
- sequencer.c: split up sequencer_remove_state()
- rebase: use "cleanup" pattern in do_interactive_rebase()
Plug leaks in sequencer subsystem and its users.
Expecting a hopefully minor and final reroll.
cf. <xmqqedry17r4.fsf@gitster.g>
source: <cover-v3-0.8-00000000000-20230118T160600Z-avarab@gmail.com>
* jc/attr-doc-fix (2023-01-26) 1 commit
(merged to 'next' on 2023-01-26 at cb327c4b5f)
+ attr: fix instructions on how to check attrs
Comment fix.
Will merge to 'master'.
source: <pull.1441.v3.git.git.1674768107941.gitgitgadget@gmail.com>
* rj/avoid-switching-to-already-used-branch (2023-01-22) 3 commits
- switch: reject if the branch is already checked out elsewhere (test)
- rebase: refuse to switch to a branch already checked out elsewhere (test)
- branch: fix die_if_checked_out() when ignore_current_worktree
A few subcommands have been taught to stop users from working on a
branch that is being used in another worktree linked to the same
repository.
Expecting a (hopefully final) reroll.
cf. <d61a2393-64c8-da49-fe13-00bc4a52d5e3@gmail.com>
source: <f7f45f54-9261-45ea-3399-8ba8dee6832b@gmail.com>
* rj/bisect-already-used-branch (2023-01-22) 1 commit
- bisect: fix "reset" when branch is checked out elsewhere
Allow "git bisect reset [name]" to check out the named branch (or
the original one) even when the branch is already checked out in a
different worktree linked to the same repository.
Leaning negative. Why is it a good thing?
cf. <xmqqo7qqovp1.fsf@gitster.g>
source: <1c36c334-9f10-3859-c92f-3d889e226769@gmail.com>
* en/ls-files-doc-update (2023-01-13) 4 commits
(merged to 'next' on 2023-01-27 at 20b9803add)
+ ls-files: guide folks to --exclude-standard over other --exclude* options
+ ls-files: clarify descriptions of status tags for -t
+ ls-files: clarify descriptions of file selection options
+ ls-files: add missing documentation for --resolve-undo option
Doc update to ls-files.
Will merge to 'master'.
source: <pull.1463.git.1673584914.gitgitgadget@gmail.com>
* ms/send-email-feed-header-to-validate-hook (2023-01-19) 2 commits
- send-email: expose header information to git-send-email's sendemail-validate hook
- send-email: refactor header generation functions
"git send-email" learned to give the e-mail headers to the validate
hook by passing an extra argument from the command line.
Expecting a (hopefully final) reroll.
cf. <c1ba0a28-3c39-b313-2757-dceb02930334@amd.com>
source: <20230120012459.920932-1-michael.strawbridge@amd.com>
* ds/bundle-uri-5 (2023-01-23) 10 commits
- bundle-uri: test missing bundles with heuristic
- bundle-uri: store fetch.bundleCreationToken
- fetch: fetch from an external bundle URI
- bundle-uri: drop bundle.flag from design doc
- clone: set fetch.bundleURI if appropriate
- bundle-uri: download in creationToken order
- bundle-uri: parse bundle.<id>.creationToken values
- bundle-uri: parse bundle.heuristic=creationToken
- t5558: add tests for creationToken heuristic
- bundle: optionally skip reachability walk
The bundle-URI subsystem adds support for creation-token heuristics
to help incremental fetches.
Expecting a reroll.
cf. <771a2993-85bd-0831-0977-24204f84e206@github.com>
cf. <01f97aff-58a1-ef2c-e668-d37ea513c64e@github.com>
cf. <ecc6b167-f5c4-48ce-3973-461d1659ed40@github.com>
source: <pull.1454.v2.git.1674487310.gitgitgadget@gmail.com>
* tc/cat-file-z-use-cquote (2023-01-16) 1 commit
- cat-file: quote-format name in error when using -z
"cat-file" in the batch mode that is fed NUL-terminated pathnames
learned to cquote them in its error output (otherwise, a funny
pathname with LF in it would break the lines in the output stream).
Expecting a reroll.
cf. <2a2a46f0-a9bc-06a6-72e1-28800518777c@dunelm.org.uk>
source: <20230116190749.4141516-1-toon@iotcl.com>
* cw/submodule-status-in-parallel (2023-01-17) 6 commits
- submodule: call parallel code from serial status
- diff-lib: parallelize run_diff_files for submodules
- diff-lib: refactor match_stat_with_submodule
- submodule: move status parsing into function
- submodule: strbuf variable rename
- run-command: add duplicate_output_fn to run_processes_parallel_opts
"git submodule status" learned to run the comparison in submodule
repositories in parallel.
Expecting a reroll.
cf. <CAFySSZBiW7=ZTmXRaLzCoKUi0Jd=fzvW5PJ6=Ka0jKHoP2ddSw@mail.gmail.com>
cf. <kl6lo7qlvg4h.fsf@chooglen-macbookpro.roam.corp.google.com>
source: <20230104215415.1083526-1-calvinwan@google.com>
* ab/various-leak-fixes (2023-01-18) 19 commits
- push: free_refs() the "local_refs" in set_refspecs()
- receive-pack: free() the "ref_name" in "struct command"
- grep API: plug memory leaks by freeing "header_list"
- grep.c: refactor free_grep_patterns()
- object-file.c: release the "tag" in check_tag()
- builtin/merge.c: free "&buf" on "Your local changes..." error
- builtin/merge.c: use fixed strings, not "strbuf", fix leak
- show-branch: free() allocated "head" before return
- commit-graph: fix a parse_options_concat() leak
- http-backend.c: fix cmd_main() memory leak, refactor reg{exec,free}()
- http-backend.c: fix "dir" and "cmd_arg" leaks in cmd_main()
- worktree: fix a trivial leak in prune_worktrees()
- repack: fix leaks on error with "goto cleanup"
- name-rev: don't xstrdup() an already dup'd string
- various: add missing clear_pathspec(), fix leaks
- clone: use free() instead of UNLEAK()
- commit-graph: use free_commit_graph() instead of UNLEAK()
- bundle.c: don't leak the "args" in the "struct child_process"
- tests: mark tests as passing with SANITIZE=leak
Leak fixes.
Needs review.
source: <cover-v5-00.19-00000000000-20230118T120334Z-avarab@gmail.com>
* rj/branch-unborn-in-other-worktrees (2023-01-19) 3 commits
- branch: rename orphan branches in any worktree
- branch: description for orphan branch errors
- avoid unnecessary worktrees traversing
Error messages given when working on an unborn branch that is
checked out in another worktree have been improved.
Expecting a reroll.
cf. <527f7315-be7b-7ec0-04fc-d07da7d4fefa@gmail.com>
source: <34a58449-4f2e-66ef-ea01-119186aebd23@gmail.com>
* mc/credential-helper-auth-headers (2023-01-20) 12 commits
(merged to 'next' on 2023-01-25 at cb95006bb2)
+ credential: add WWW-Authenticate header to cred requests
+ http: read HTTP WWW-Authenticate response headers
+ http: replace unsafe size_t multiplication with st_mult
+ test-http-server: add sending of arbitrary headers
+ test-http-server: add simple authentication
+ test-http-server: pass Git requests to http-backend
+ test-http-server: add HTTP request parsing
+ test-http-server: add HTTP error response function
+ test-http-server: add stub HTTP server test helper
+ daemon: rename some esoteric/laboured terminology
+ daemon: libify child process handling functions
+ daemon: libify socket setup and option functions
Extending credential helper protocol.
Will kick out of 'next'. The test-only server is an eyesore.
cf. <e57c1ca3-c21c-db41-a386-e5887f46055c@github.com>
cf. <Y9JkMLueCwjkLHOr@coredump.intra.peff.net>
source: <pull.1352.v7.git.1674252530.gitgitgadget@gmail.com>
--------------------------------------------------
[Discarded]
* jc/ci-deprecated-declarations-are-not-fatal (2023-01-14) 1 commit
(merged to 'next' on 2023-01-14 at 5efb778ab0)
+ ci: do not die on deprecated-declarations warning
CI build fix for overzealous -Werror.
Reverted out of 'next'
Preferring jk/curl-avoid-deprecated-api that fixes the code properly.
source: <xmqq7cxpkpjp.fsf@gitster.g>
* po/pretty-hard-trunc (2022-11-13) 1 commit
. pretty-formats: add hard truncation, without ellipsis, options
Add a new pretty format which truncates without ellipsis.
Superseded by the 'po/pretty-format-columns-doc' topic.
source: <20221112143616.1429-1-philipoakley@iee.email>
* en/rebase-update-refs-needs-merge-backend (2023-01-22) 9 commits
(merged to 'next' on 2023-01-23 at 1b65346647)
+ rebase: provide better error message for apply options vs. merge config
+ rebase: put rebase_options initialization in single place
+ rebase: fix formatting of rebase --reapply-cherry-picks option in docs
+ rebase: clarify the OPT_CMDMODE incompatibilities
+ rebase: add coverage of other incompatible options
+ rebase: fix docs about incompatibilities with --root
+ rebase: remove --allow-empty-message from incompatible opts
+ rebase: flag --apply and --merge as incompatible
+ rebase: mark --update-refs as requiring the merge backend
The "--update-refs" feature of "git rebase" requires the use of the
merge backend, while the "--whitespace=fix" feature does not work with
the said backend. Notice the combination and error out, instead of
silently ignoring one of the features requested.
Reverted out of 'next' to be replaced with en/rebase-incompatible-opts.
source: <pull.1466.v4.git.1674367961.gitgitgadget@gmail.com>
* rs/tree-parse-mode-overflow-check (2023-01-21) 1 commit
. tree-walk: disallow overflowing modes
Reject tree objects with entries whose mode bits are overly wide.
Retracted.
cf. <b4b48877-5b80-e96f-d09f-2fe275f42950@web.de>
source: <d673fde7-7eb2-6306-86b6-1c1a4c988ee8@web.de>
* cc/filtered-repack (2022-12-25) 3 commits
. gc: add gc.repackFilter config option
. repack: add --filter=<filter-spec> option
. pack-objects: allow --filter without --stdout
"git repack" learns to discard objects that ought to be retrievable
again from the promisor remote.
May want to discard. Its jaggy edges may be a bit too sharp.
cf. <Y7WTv19aqiFCU8au@ncase>
source: <20221221040446.2860985-1-christian.couder@gmail.com>