From: Junio C Hamano <gitster@pobox.com>
To: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, me@ttaylorr.com, vdye@github.com,
avarab@gmail.com, steadmon@google.com, chooglen@google.com,
Derrick Stolee <derrickstolee@github.com>
Subject: Re: [PATCH v3 02/11] bundle: verify using check_connected()
Date: Tue, 31 Jan 2023 09:35:24 -0800 [thread overview]
Message-ID: <xmqqr0vay5cz.fsf@gitster.g> (raw)
In-Reply-To: <20c29d37f9c1ba1367145331d25dd27f966312cd.1675171759.git.gitgitgadget@gmail.com> (Derrick Stolee via GitGitGadget's message of "Tue, 31 Jan 2023 13:29:10 +0000")
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> Thus, the code in verify_bundle() has previously had the additional
> check that all prerequisite commits are reachable from repository
> references. This is done via a revision walk from all references,
> stopping only if all prerequisite commits are discovered or all commits
> are walked. This uses a custom walk to verify_bundle().
>
> This check is more strict than what Git applies to fetched pack-files.
> In the fetch case, Git guarantees that the new references are closed
> under reachability by walking from the new references until walking
> commits that are reachable from repository refs. This is done through
> the well-used check_connected() method.
>
> To better align with the restrictions required by 'git fetch',
> reimplement this check in verify_bundle() to use check_connected(). This
> also simplifies the code significantly.
As I often say, breaking repository faster is not the kind of
performance gain we want to have in Git, and I am in favor of this
iteration compared to the earlier version that mostly punted on
ensuring the correctness (rather, it relied on assumption that "most
of the time this should be OK").
> bundle.c | 75 ++++++++++++++++--------------------------
> t/t6020-bundle-misc.sh | 8 ++---
> 2 files changed, 33 insertions(+), 50 deletions(-)
The diffstat is very pleasing to see.
Let me read the postimage along aloud (preimage omitted).
> diff --git a/bundle.c b/bundle.c
> index 4ef7256aa11..76c3a904898 100644
> --- a/bundle.c
> +++ b/bundle.c
> @@ -187,6 +188,21 @@ static int list_refs(struct string_list *r, int argc, const char **argv)
> /* Remember to update object flag allocation in object.h */
> #define PREREQ_MARK (1u<<16)
>
> +struct string_list_iterator {
> + struct string_list *list;
> + size_t cur;
> +};
> +
> +static const struct object_id *iterate_ref_map(void *cb_data)
> +{
> + struct string_list_iterator *iter = cb_data;
> +
> + if (iter->cur >= iter->list->nr)
> + return NULL;
> +
> + return iter->list->items[iter->cur++].util;
> +}
This is to let check_connected() collect all the prerequisite object
names. OK.
> int verify_bundle(struct repository *r,
> struct bundle_header *header,
> enum verify_bundle_flags flags)
> {
> /*
> * Do fast check, then if any prereqs are missing then go line by line
> * to be verbose about the errors
> */
> struct string_list *p = &header->prerequisites;
> + int i, ret = 0;
> const char *message = _("Repository lacks these prerequisite commits:");
> + struct string_list_iterator iter = {
> + .list = p,
> + };
> + struct check_connected_options opts = {
> + .quiet = 1,
> + };
>
> if (!r || !r->objects || !r->objects->odb)
> return error(_("need a repository to verify a bundle"));
>
> for (i = 0; i < p->nr; i++) {
> struct string_list_item *e = p->items + i;
> const char *name = e->string;
> struct object_id *oid = e->util;
> struct object *o = parse_object(r, oid);
> + if (o)
> continue;
> ret++;
> if (flags & VERIFY_BUNDLE_QUIET)
> continue;
> if (ret == 1)
> error("%s", message);
> error("%s %s", oid_to_hex(oid), name);
> }
> + if (ret)
> goto cleanup;
The "quick fail" logic as before. Looking sensible.
>
> + if ((ret = check_connected(iterate_ref_map, &iter, &opts)))
> + error(_("some prerequisite commits exist in the object store, "
> + "but are not connected to the repository's history"));
And then we let check_connected() to ensure that traversing from
these prerequisite objects down to the DAG formed by existing refs
will not die from missing objects. Makes sense.
> + /* TODO: preserve this verbose language. */
I am lost -- aren't we preserving the BUNDLE_VERBOSE code below?
> if (flags & VERIFY_BUNDLE_VERBOSE) {
> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
> index 38dbbf89155..7d40994991e 100755
> --- a/t/t6020-bundle-misc.sh
> +++ b/t/t6020-bundle-misc.sh
> @@ -595,14 +595,14 @@ test_expect_success 'verify catches unreachable, broken prerequisites' '
> # Verify should fail
> test_must_fail git bundle verify \
> ../clone-from/tip.bundle 2>err &&
> - grep "Could not read $BAD_OID" err &&
> - grep "Failed to traverse parents of commit $TIP_OID" err &&
> + grep "some prerequisite commits .* are not connected" err &&
> + test_line_count = 1 err &&
>
> # Unbundling should fail
> test_must_fail git bundle unbundle \
> ../clone-from/tip.bundle 2>err &&
> - grep "Could not read $BAD_OID" err &&
> - grep "Failed to traverse parents of commit $TIP_OID" err
> + grep "some prerequisite commits .* are not connected" err &&
> + test_line_count = 1 err
> )
> '
Especially with the new test added in the previous step, we know we
are not trading correctness off. Excellent.
I wonder how much the performance hit does this version incur over
the "not safe at all" version and over the "use custom and
stricter-than-needed" version, by the way?
Thanks.
next prev parent reply other threads:[~2023-01-31 17:36 UTC|newest]
Thread overview: 79+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-06 20:36 [PATCH 0/8] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
2023-01-06 20:36 ` [PATCH 1/8] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
2023-01-17 18:17 ` Victoria Dye
2023-01-17 21:00 ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 2/8] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
2023-01-09 2:38 ` Junio C Hamano
2023-01-09 14:20 ` Derrick Stolee
2023-01-17 19:13 ` Victoria Dye
2023-01-06 20:36 ` [PATCH 3/8] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
2023-01-09 3:08 ` Junio C Hamano
2023-01-09 14:41 ` Derrick Stolee
2023-01-17 19:24 ` Victoria Dye
2023-01-06 20:36 ` [PATCH 4/8] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
2023-01-09 3:22 ` Junio C Hamano
2023-01-09 14:58 ` Derrick Stolee
2023-01-19 18:32 ` Victoria Dye
2023-01-20 14:56 ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 5/8] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
2023-01-19 19:42 ` Victoria Dye
2023-01-20 15:42 ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 6/8] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
2023-01-19 19:44 ` Victoria Dye
2023-01-06 20:36 ` [PATCH 7/8] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
2023-01-19 20:34 ` Victoria Dye
2023-01-20 15:47 ` Derrick Stolee
2023-01-06 20:36 ` [PATCH 8/8] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
2023-01-19 22:24 ` Victoria Dye
2023-01-20 15:53 ` Derrick Stolee
2023-01-23 15:21 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Derrick Stolee via GitGitGadget
2023-01-23 15:21 ` [PATCH v2 01/10] bundle: optionally skip reachability walk Derrick Stolee via GitGitGadget
2023-01-23 18:03 ` Junio C Hamano
2023-01-23 18:24 ` Derrick Stolee
2023-01-23 20:13 ` Junio C Hamano
2023-01-23 22:30 ` Junio C Hamano
2023-01-24 12:27 ` Derrick Stolee
2023-01-24 14:14 ` [PATCH v2.5 01/11] bundle: test unbundling with incomplete history Derrick Stolee
2023-01-24 17:16 ` Junio C Hamano
2023-01-24 14:16 ` [PATCH v2.5 02/11] bundle: verify using connected() Derrick Stolee
2023-01-24 17:33 ` Junio C Hamano
2023-01-24 18:46 ` Derrick Stolee
2023-01-24 20:41 ` Junio C Hamano
2023-01-24 15:22 ` [PATCH v2 01/10] bundle: optionally skip reachability walk Junio C Hamano
2023-01-23 21:08 ` Junio C Hamano
2023-01-23 15:21 ` [PATCH v2 02/10] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
2023-01-27 19:15 ` Victoria Dye
2023-01-23 15:21 ` [PATCH v2 03/10] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
2023-01-23 15:21 ` [PATCH v2 04/10] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
2023-01-23 15:21 ` [PATCH v2 05/10] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
2023-01-27 19:17 ` Victoria Dye
2023-01-27 19:32 ` Junio C Hamano
2023-01-30 18:43 ` Derrick Stolee
2023-01-30 19:02 ` Junio C Hamano
2023-01-30 19:12 ` Derrick Stolee
2023-01-23 15:21 ` [PATCH v2 06/10] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
2023-01-23 15:21 ` [PATCH v2 07/10] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
2023-01-23 15:21 ` [PATCH v2 08/10] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
2023-01-27 19:18 ` Victoria Dye
2023-01-23 15:21 ` [PATCH v2 09/10] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
2023-01-23 15:21 ` [PATCH v2 10/10] bundle-uri: test missing bundles with heuristic Derrick Stolee via GitGitGadget
2023-01-27 19:21 ` Victoria Dye
2023-01-30 18:47 ` Derrick Stolee
2023-01-27 19:28 ` [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches Victoria Dye
2023-01-31 13:29 ` [PATCH v3 00/11] " Derrick Stolee via GitGitGadget
2023-01-31 13:29 ` [PATCH v3 01/11] bundle: test unbundling with incomplete history Derrick Stolee via GitGitGadget
2023-01-31 13:29 ` [PATCH v3 02/11] bundle: verify using check_connected() Derrick Stolee via GitGitGadget
2023-01-31 17:35 ` Junio C Hamano [this message]
2023-01-31 19:31 ` Derrick Stolee
2023-01-31 19:36 ` Junio C Hamano
2023-01-31 13:29 ` [PATCH v3 03/11] t5558: add tests for creationToken heuristic Derrick Stolee via GitGitGadget
2023-01-31 13:29 ` [PATCH v3 04/11] bundle-uri: parse bundle.heuristic=creationToken Derrick Stolee via GitGitGadget
2023-01-31 13:29 ` [PATCH v3 05/11] bundle-uri: parse bundle.<id>.creationToken values Derrick Stolee via GitGitGadget
2023-01-31 21:22 ` Junio C Hamano
2023-01-31 13:29 ` [PATCH v3 06/11] bundle-uri: download in creationToken order Derrick Stolee via GitGitGadget
2023-01-31 13:29 ` [PATCH v3 07/11] clone: set fetch.bundleURI if appropriate Derrick Stolee via GitGitGadget
2023-01-31 13:29 ` [PATCH v3 08/11] bundle-uri: drop bundle.flag from design doc Derrick Stolee via GitGitGadget
2023-01-31 13:29 ` [PATCH v3 09/11] fetch: fetch from an external bundle URI Derrick Stolee via GitGitGadget
2023-01-31 13:29 ` [PATCH v3 10/11] bundle-uri: store fetch.bundleCreationToken Derrick Stolee via GitGitGadget
2023-01-31 13:29 ` [PATCH v3 11/11] bundle-uri: test missing bundles with heuristic Derrick Stolee via GitGitGadget
2023-01-31 22:01 ` [PATCH v3 00/11] Bundle URIs V: creationToken heuristic for incremental fetches Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqr0vay5cz.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=avarab@gmail.com \
--cc=chooglen@google.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=me@ttaylorr.com \
--cc=steadmon@google.com \
--cc=vdye@github.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.