From: Derrick Stolee <stolee@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>,
Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com,
johannes.schindelin@gmx.de, peff@peff.net, ps@pks.im,
me@ttaylorr.com, johncai86@gmail.com, newren@gmail.com,
christian.couder@gmail.com, kristofferhaugsbakk@fastmail.com
Subject: Re: [PATCH 08/17] pack-objects: add --path-walk option
Date: Wed, 30 Oct 2024 22:12:38 -0400 [thread overview]
Message-ID: <59c0b17a-491e-48ed-840e-7eec1c2d1de2@gmail.com> (raw)
In-Reply-To: <20241028195404.4119175-1-jonathantanmy@google.com>
On 10/28/24 3:54 PM, Jonathan Tan wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> This new walk is incompatible with some features and is ignored by
>> others:
>>
>> * Object filters are not currently integrated with the path-walk API,
>> such as sparse-checkout or tree depth. A blobless packfile could be
>> integrated easily, but that is deferred for later.
>>
>> * Server-focused features such as delta islands, shallow packs, and
>> using a bitmap index are incompatible with the path-walk API.
>>
>> * The path walk API is only compatible with the --revs option, not
>> taking object lists or pack lists over stdin. These alternative ways
>> to specify the objects currently ignores the --path-walk option
>> without even a warning.
>
> It might be better to declare --path-walk as "internal use only" and
> only supporting what send-pack.c (used by "git push") and "git repack"
> needs. (From this list, it seems that there is a lot of incompatibility,
> some of which can happen without a warning to the user, so it sounds
> better to be up-front and say that we only support what send-pack.c
> needs. This also makes reviewing easier, as we don't have to think about
> the possible interactions with every other rev-list feature - only what
> is used by send-pack.c.)
I do wonder what the value of doing this would be. I consider 'gitpack-objects'
to already be a plumbing command, so marking any option
as "internal use only" seems like overkill. It takes effort to combine
the options carefully for the right effect. The tests in p5313 are not
terribly simple, such as needing --no-reuse-delta to guarantee we are
using the desired delta algorithm.
> Also from a reviewer perspective, it might be better to restrict this
> patch set to what send-pack.c needs and leave "git repack" for a future
> patch set. This means that we would not need features such as blob
> and tree exclusions, and possibly not even bitmap use or delta reuse
> (assuming that the user would typically push recently-created objects
> that have not been repacked).
While I can understand that as being a potential place to split the
patch series, the integration to add 'git repack --path-walk' is actually
very simple. Repacking "everything" needs to happen to be able to push a
repo to an empty remote, after all.
There are some subtleties around indexed objects, reflogs, and the like
that add some complexity, but they also are handled in the path-walk API
layer. Some of that complexity was helpful to know about during repack
tests.
Finally, the 'git repack --path-walk' use case is a great one for
demonstrating the benefits to threading the 'git pack-objects
--path-walk' algorithm.
>> + /* Skip objects that do not exist locally. */
>> + if (exclude_promisor_objects &&
>> + oid_object_info_extended(the_repository, oid, &oi,
>> + OBJECT_INFO_FOR_PREFETCH) < 0)
>> + continue;
>
> This functionality is typically triggered by --missing=allow;
> --exclude_promisor_objects means (among other things) that we allow
> a missing link only if that object is known to be a promisor object
> (because another promisor object refers to it) (see Documentation/
> rev-list-options.txt, and also get_reference() and elsewhere in
> revision.c - notice how is_promisor_object() is paired with it)
>
> Having said that, we should probably just fail outright on missing
> objects, whether or not we have exclude_promisor_objects. If we have
> computed that we need to push an object, that object needs to exist.
> (Same for repack.)
I think that this is not a reasonable assumption that a hard fail
should be expected. Someone could create a blobless clone with a
sparse-checkout and then add a new file outside their sparse-checkout
without ever having the previous version downloaded.
When pushing, they won't be able to use the previous version as a
delta base, but they would certainly be confused about an object
being downloaded during 'git push'.
While the example I bring up is somewhat contrived, I can easily
imagine cases where missing objects are part of the commit boundary
and could be marked as UNINTERESTING but would still be sent in the
batch to be considered as a delta base.
Thanks,
-Stolee
next prev parent reply other threads:[~2024-10-31 2:12 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-08 14:11 [PATCH 00/17] pack-objects: add --path-walk option for better deltas Derrick Stolee via GitGitGadget
2024-10-08 14:11 ` [PATCH 01/17] path-walk: introduce an object walk by path Derrick Stolee via GitGitGadget
2024-10-08 14:11 ` [PATCH 02/17] t6601: add helper for testing path-walk API Derrick Stolee via GitGitGadget
2024-10-08 14:11 ` [PATCH 03/17] path-walk: allow consumer to specify object types Derrick Stolee via GitGitGadget
2024-10-08 14:11 ` [PATCH 04/17] path-walk: allow visiting tags Derrick Stolee via GitGitGadget
2024-10-08 14:11 ` [PATCH 05/17] revision: create mark_trees_uninteresting_dense() Derrick Stolee via GitGitGadget
2024-10-08 14:11 ` [PATCH 06/17] path-walk: add prune_all_uninteresting option Derrick Stolee via GitGitGadget
2024-10-08 14:11 ` [PATCH 07/17] pack-objects: extract should_attempt_deltas() Derrick Stolee via GitGitGadget
2024-10-08 14:11 ` [PATCH 08/17] pack-objects: add --path-walk option Derrick Stolee via GitGitGadget
2024-10-28 19:54 ` Jonathan Tan
2024-10-29 18:07 ` Taylor Blau
2024-10-29 21:36 ` Jonathan Tan
2024-10-29 22:16 ` Taylor Blau
2024-10-31 2:04 ` Derrick Stolee
2024-10-31 2:14 ` Derrick Stolee
2024-10-31 21:02 ` Taylor Blau
2024-10-31 2:12 ` Derrick Stolee [this message]
2024-10-08 14:11 ` [PATCH 09/17] pack-objects: update usage to match docs Derrick Stolee via GitGitGadget
2024-10-08 14:11 ` [PATCH 10/17] p5313: add performance tests for --path-walk Derrick Stolee via GitGitGadget
2024-10-08 14:11 ` [PATCH 11/17] pack-objects: introduce GIT_TEST_PACK_PATH_WALK Derrick Stolee via GitGitGadget
2024-10-08 14:11 ` [PATCH 12/17] repack: add --path-walk option Derrick Stolee via GitGitGadget
2024-10-08 14:11 ` [PATCH 13/17] repack: update usage to match docs Derrick Stolee via GitGitGadget
2024-10-08 14:12 ` [PATCH 14/17] pack-objects: enable --path-walk via config Derrick Stolee via GitGitGadget
2024-10-08 14:12 ` [PATCH 15/17] scalar: enable path-walk during push " Derrick Stolee via GitGitGadget
2024-10-08 14:12 ` [PATCH 16/17] pack-objects: refactor path-walk delta phase Derrick Stolee via GitGitGadget
2024-10-08 14:12 ` [PATCH 17/17] pack-objects: thread the path-based compression Derrick Stolee via GitGitGadget
2024-10-20 13:43 ` [PATCH v2 00/17] pack-objects: add --path-walk option for better deltas Derrick Stolee via GitGitGadget
2024-10-20 13:43 ` [PATCH v2 01/17] path-walk: introduce an object walk by path Derrick Stolee via GitGitGadget
2024-10-20 13:43 ` [PATCH v2 02/17] t6601: add helper for testing path-walk API Derrick Stolee via GitGitGadget
2024-10-20 13:43 ` [PATCH v2 03/17] path-walk: allow consumer to specify object types Derrick Stolee via GitGitGadget
2024-10-20 13:43 ` [PATCH v2 04/17] path-walk: allow visiting tags Derrick Stolee via GitGitGadget
2024-10-20 13:43 ` [PATCH v2 05/17] revision: create mark_trees_uninteresting_dense() Derrick Stolee via GitGitGadget
2024-10-20 13:43 ` [PATCH v2 06/17] path-walk: add prune_all_uninteresting option Derrick Stolee via GitGitGadget
2024-10-20 13:43 ` [PATCH v2 07/17] pack-objects: extract should_attempt_deltas() Derrick Stolee via GitGitGadget
2024-10-20 13:43 ` [PATCH v2 08/17] pack-objects: add --path-walk option Derrick Stolee via GitGitGadget
2024-10-20 13:43 ` [PATCH v2 09/17] pack-objects: update usage to match docs Derrick Stolee via GitGitGadget
2024-10-20 13:43 ` [PATCH v2 10/17] p5313: add performance tests for --path-walk Derrick Stolee via GitGitGadget
2024-10-20 13:43 ` [PATCH v2 11/17] pack-objects: introduce GIT_TEST_PACK_PATH_WALK Derrick Stolee via GitGitGadget
2024-10-20 13:43 ` [PATCH v2 12/17] repack: add --path-walk option Derrick Stolee via GitGitGadget
2024-10-20 13:43 ` [PATCH v2 13/17] repack: update usage to match docs Derrick Stolee via GitGitGadget
2024-10-20 13:43 ` [PATCH v2 14/17] pack-objects: enable --path-walk via config Derrick Stolee via GitGitGadget
2024-10-20 13:43 ` [PATCH v2 15/17] scalar: enable path-walk during push " Derrick Stolee via GitGitGadget
2024-10-20 13:43 ` [PATCH v2 16/17] pack-objects: refactor path-walk delta phase Derrick Stolee via GitGitGadget
2024-10-20 13:43 ` [PATCH v2 17/17] pack-objects: thread the path-based compression Derrick Stolee via GitGitGadget
2024-10-21 21:43 ` [PATCH v2 00/17] pack-objects: add --path-walk option for better deltas Taylor Blau
2024-10-24 13:29 ` Derrick Stolee
2024-10-24 15:52 ` Taylor Blau
2024-10-28 5:46 ` Patrick Steinhardt
2024-10-28 16:47 ` Taylor Blau
2024-10-28 17:13 ` Derrick Stolee
2024-10-28 17:25 ` Taylor Blau
2024-10-28 19:46 ` Derrick Stolee
2024-10-29 18:02 ` Taylor Blau
2024-10-31 2:28 ` Derrick Stolee
2024-10-31 21:07 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=59c0b17a-491e-48ed-840e-7eec1c2d1de2@gmail.com \
--to=stolee@gmail.com \
--cc=christian.couder@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=johannes.schindelin@gmx.de \
--cc=johncai86@gmail.com \
--cc=jonathantanmy@google.com \
--cc=kristofferhaugsbakk@fastmail.com \
--cc=me@ttaylorr.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).