From: Victoria Dye <vdye@github.com>
To: William Sprent via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org
Cc: "Eric Sunshine" <sunshine@sunshineco.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Derrick Stolee" <derrickstolee@github.com>,
"Elijah Newren" <newren@gmail.com>,
"William Sprent" <williams@unity3d.com>
Subject: Re: [PATCH v2] ls-tree: add --sparse-filter-oid argument
Date: Tue, 24 Jan 2023 12:11:34 -0800 [thread overview]
Message-ID: <7ccf7b17-4448-5ef4-63b1-9073a400e486@github.com> (raw)
In-Reply-To: <pull.1459.v2.git.1674474371817.gitgitgadget@gmail.com>
William Sprent via GitGitGadget wrote:
> From: William Sprent <williams@unity3d.com>
>
> There is currently no way to ask git the question "which files would be
> part of a sparse checkout of commit X with sparse checkout patterns Y".
> One use-case would be that tooling may want know whether sparse checkouts
> of two commits contain the same content even if the full trees differ.
> Another interesting use-case would be for tooling to use in conjunction
> with 'git update-index --index-info'.
These types of use cases align nicely with "Behavior A" as described in
'Documentation/technical/sparse-checkout.txt' [1]. I think adding a
'--scope=(sparse|all)' option to 'ls-trees' would be a good way to make
progress on the goals of that design document as well as serve the needs
described above.
[1] https://lore.kernel.org/git/pull.1367.v4.git.1667714666810.gitgitgadget@gmail.com/
>
> 'rev-list --objects --filter=sparse:oid' comes close, but as rev-list is
> concerned with objects rather than directory trees, it leaves files out
> when the same blob occurs in at two different paths.
>
> It is possible to ask git about the sparse status of files currently in
> the index with 'ls-files -t'. However, this does not work well when the
> caller is interested in another commit, intererested in sparsity
> patterns that aren't currently in '.git/info/sparse-checkout', or when
> working in with bare repo.
I'm a bit confused by your described use cases here, though. Right now,
'sparse-checkout' is a local repo-only optimization (for performance and/or
scoping user workspaces), but you seem to be implying a need for
"sparse-checkout patterns" as a general-purpose data format (like a config
file or 'rebase-todo') that can be used to manually configure command
behavior.
If that is what you're getting at, it seems like a pretty substantial
expansion to the scope of what we consider "sparse checkout". That's not to
say it isn't a good idea - I can definitely imagine tooling where this type
of functionality is useful - just that it warrants careful consideration so
we don't over-complicate the typical sparse-checkout user experience.
>
> To fill this gap, add a new argument to ls-tree '--sparse-filter-oid'
> which takes the object id of a blob containing sparse checkout patterns
> that filters the output of 'ls-tree'. When filtering with given sparsity
> patterns, 'ls-tree' only outputs blobs and commit objects that
> match the given patterns.
I don't think a blob OID is the best way to specify an arbitrary pattern set
in this case. OIDs will only be created for blobs that are tracked in Git;
it's possible that your project tracks potential sparse-checkout patterns in
Git, but it seems overly restrictive. You could instead specify the file
with a path on-disk (like the '--file' options in various commands, e.g.
'git commit'); if you did need to get that file from the object store, you
could first get its contents with 'git cat-file'.
>
> While it may be valid in some situations to output a tree object -- e.g.
> when a cone pattern matches all blobs recursively contained in a tree --
> it is less unclear what should be output if a sparse pattern matches
> parts of a tree.
>
> To allow for reusing the pattern matching logic found in
> 'path_in_sparse_checkout_1()' in 'dir.c' with arbitrary patterns,
> extract the pattern matching part of the function into its own new
> function 'recursively_match_path_with_sparse_patterns()'.
>
> Signed-off-by: William Sprent <williams@unity3d.com>
> ---
> ls-tree: add --sparse-filter-oid argument
>
> I'm resubmitting this change as rebased on top of 'master', as it
> conflicted with the topic 'ls-tree.c: clean-up works' 1
> [https://public-inbox.org/git/20230112091135.20050-1-tenglong.tl@alibaba-inc.com],
> which was merged to 'master' recently.
>
> This versions also incorporates changes based on the comments made in 2
> [https://public-inbox.org/git/CAPig+cRgZ0CrkqY7mufuWmhf6BC8yXjXXuOTEQjuz+Y0NA+N7Q@mail.gmail.com/].
>
> I'm also looping in contributors that have touched ls-tree and/or
> sparse-checkouts recently. I hope that's okay.
Definitely okay, thanks for CC-ing!
Overall, this is an interesting extension to 'sparse-checkout', and opens up
some opportunities for expanded tooling. However, at an implementation
level, I think it's addressing two separate needs ("constrain 'ls-files' to
display files matching patterns" and "specify sparse patterns not in
'.git/info/sparse-checkout'") in one option, which makes for a somewhat
confusing and inflexible user experience. What about instead breaking this
into two options:
* --scope=(sparse|all): limits the scope of the files output by ("Behavior
A" vs. "Behavior B" from the document linked earlier).
* --sparse-patterns=<file>: specifies "override" patterns to use instead of
those in '.git/info/sparse-checkout' (whether 'sparse-checkout' is enabled
for the repository or not).
I haven't looked at your implementation in detail yet, but I did want to
offer a recommendation in case you hadn't considered it: if you want to
allow the use of patterns from a user-specified specific file, it would be
nice to do it in a way that fully replaces the "default" sparse-checkout
settings at the lowest level (i.e., override the values of
'core_apply_sparse_checkout', 'core_sparse_checkout_cone', and
'get_sparse_checkout_filename()'). Doing it that way would both make it
easier for other commands to add a '--sparse-patterns' option, and avoid the
separate code path ('path_in_sparse_checkout_1()' vs.
'recursively_match_path_with_sparse_patterns()', for example) when dealing
with '.git/info/sparse-checkout' patterns vs. manually-specified patterns.
>
> William
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1459%2Fwilliams-unity%2Fls-tree-sparse-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1459/williams-unity/ls-tree-sparse-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/1459
next prev parent reply other threads:[~2023-01-24 20:11 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-11 17:01 [PATCH] ls-tree: add --sparse-filter-oid argument William Sprent via GitGitGadget
2023-01-13 14:17 ` Eric Sunshine
2023-01-13 20:01 ` Junio C Hamano
2023-01-16 15:13 ` William Sprent
2023-01-16 12:14 ` William Sprent
2023-01-23 11:46 ` [PATCH v2] " William Sprent via GitGitGadget
2023-01-23 13:00 ` Ævar Arnfjörð Bjarmason
2023-01-24 15:30 ` William Sprent
2023-01-23 13:06 ` Ævar Arnfjörð Bjarmason
2023-01-24 15:30 ` William Sprent
2023-01-24 20:11 ` Victoria Dye [this message]
2023-01-25 13:47 ` William Sprent
2023-01-25 18:32 ` Victoria Dye
2023-01-26 14:55 ` William Sprent
2023-01-25 5:11 ` Elijah Newren
2023-01-25 16:16 ` William Sprent
2023-01-26 3:25 ` Elijah Newren
2023-01-27 11:58 ` William Sprent
2023-01-28 16:45 ` Elijah Newren
2023-01-30 15:28 ` William Sprent
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7ccf7b17-4448-5ef4-63b1-9073a400e486@github.com \
--to=vdye@github.com \
--cc=avarab@gmail.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=newren@gmail.com \
--cc=sunshine@sunshineco.com \
--cc=williams@unity3d.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.