From: Derrick Stolee <stolee@gmail.com>
To: Elijah Newren <newren@gmail.com>,
Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>,
Victoria Dye <vdye@github.com>,
Derrick Stolee <derrickstolee@github.com>,
Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 2/2] ls-files: add --sparse option
Date: Mon, 22 Nov 2021 14:44:20 -0500 [thread overview]
Message-ID: <191b2db8-1be4-d6a4-1cf6-eaefaf373dd2@gmail.com> (raw)
In-Reply-To: <CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com>
On 11/22/2021 1:36 PM, Elijah Newren wrote:
> On Tue, Nov 16, 2021 at 7:38 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Existing callers to 'git ls-files' are expecting file names, not
>> directories. It is best to expand a sparse index to show all of the
>> contained files in this case.
>>
>> However, expert users may want to inspect the contents of the index
>> itself including which directories are sparse. Add a --sparse option to
>> allow users to request this information.
>>
>> During testing, I noticed that options such as --modified did not affect
>> the output when the files in question were outside the sparse-checkout
>> definition. Tests are added to document this preexisting behavior and
>> how it remains unchanged with the sparse index and the --sparse option.
>
> Sure, 'git diff' and 'git status' don't show anything for such files
> either; we're being consistent. However, this feels like it's
> normalizing an erroneous condition, possibly starting to cement one of
> the bigger UI problems we have the sparse-checkout (and sparse-index):
>
> I assert that: Having tracked files be SKIP_WORKTREE despite having a
> file present in the working directory is typically an *error*, and one
> that we need to help users prevent/discover/recover-from.
I do think we need to tread carefully here, since some users use
'git update-index --skip-worktree' to mark a file in their
worktree as "ignored" so they can change it from HEAD without it
ever being staged. This is usually independent of the sparse-
checkout feature, so you might want to focus your efforts to paths
that exist in the worktree but are outside of the sparse-checkout
patterns.
> * I've seen users get into this condition by doing the equivalent of
> either 'git sparse-checkout disable' or 'git sparse-checkout set
> $NEW_PATHS' and then hit Ctrl-C in the middle of its operation.
> * Users also just occasionally copy files from other places (maybe
> even `git show REVISION:FILE >FILE`), and potentially edit.
>
> In either of the above cases:
> * There is no command provided for discovering these files; not diff,
> status, ls-files, or anything
> * There is no command provided for correcting the problem; not clean,
> not reset, or anything. They have to do it by hand
> * There are ample opportunities for the error to propagate and grow,
> due to the fact that these files will not be reported or updated.[1]
>
> [1] For example, the user may first either switch to another branch or
> resets to some other commit. Neither of those operations update these
> erroneous present-despite-SKIP_WORKTREE files. And there will still
> be nothing that reports them. But now the files are out-of-date with
> respect to the new commit. Now if the user disables or changes the
> sparse checkout, they suddenly have several "changed" files in their
> working tree -- which they didn't touch. In the best case, they run
> diff or status or something and notice the files and then correct it,
> and just get perplexed at where the changes came from. But with
> enough users, some percentage of them are just going to commit (some
> of) those changes. When someone else asks why they modified those
> files, they'll (correctly, as it turns out) claim they never did and
> that no program on their system did. They might even think those
> files weren't part of the commit and claim that git modified the
> commit behind their back, which would be wrong, but there won't be any
> reasonable logs to check to prove what happened.
>
>
> I know this issue is out of scope for your series here, but the
> addition of testcases that purposely set up an erroneous condition
> makes it feel like we're trying to normalize that situation and not
> treat it like an error, and are perhap undercutting future attempts to
> try to fix it. I'd rather have it explicitly stated that we're
> setting up an erroneous condition in our tests, in order to make sure
> we handle it as best as we can so far -- in a manner in line with diff
> and grep -- but also note that later solutions to this deeper issue
> may affect one or more of these commands.
I appreciate the attention to this scenario, but I also think
that a patch series whose goal is to change how we react when
in this case already needs to make a case for changing the
behavior. Having tests that exist documenting the previous
behavior form a good basis for "it did this before, but now it
will do this," which hopefully further justifies the change.
Personally, I think of _every_ test as "this is what the code
does right now" and the purpose of a test is to show that we
don't change things _unintentionally_. Every test can be changed
if there is sufficient evidence that it should.
>> @@ -838,6 +844,7 @@ test_expect_success 'sparse-index is not expanded' '
>> init_repos &&
>>
>> ensure_not_expanded status &&
>> + ensure_not_expanded ls-files --sparse &&
>
> Do we also want a
> ensure_not_expanded ls-files &&
> ? We don't expect ls-files to write a new index file in any scenario, right?
When the index is sparse, 'git ls-files' will expand before
listing all of the contents, to avoid listing a sparse
directory entry. One way to avoid ensure_full_index() would
be to do trickery with read_tree_at() whenever we find a
sparse directory entry and use the callback to output what
ls-files would have written. However, that's pretty much the
same amount of work as what ensure_full_index() is doing, so
there is likely little benefit to complicating the code for
this reason.
>> ensure_not_expanded commit --allow-empty -m empty &&
>> echo >>sparse-index/a &&
>> ensure_not_expanded commit -a -m a &&
>> @@ -942,6 +949,46 @@ test_expect_success 'sparse index is not expanded: fetch/pull' '
>> ensure_not_expanded pull full base
>> '
>>
>> +test_expect_success 'ls-files' '
>> + init_repos &&
>> +
>> + # Behavior agrees by default. Sparse index is expanded.
>> + test_all_match git ls-files &&
>> +
>> + # With --sparse, the sparse index data changes behavior.
>> + git -C sparse-index ls-files --sparse >sparse-index-out &&
>> + grep "^folder1/\$" sparse-index-out &&
>> + grep "^folder2/\$" sparse-index-out &&
>> +
>> + # With --sparse and no sparse index, nothing changes.
>> + git -C sparse-checkout ls-files --sparse >sparse-checkout-out &&
>> + grep "^folder1/0/0/0\$" sparse-checkout-out &&
>> + ! grep "/\$" sparse-checkout-out &&
>
> I'd be tempted to split the test up to this point from the rest of the test.
Every instance of "init_repos" adds to the cost of this test script,
so when possible I'd err on the side of grouping them. It also keeps
test able to isolate with "--run=1,<N>".
>> +
>> + write_script edit-content <<-\EOF &&
>> + mkdir folder1 &&
>> + echo content >>folder1/a
>> + EOF
>> + run_on_sparse ../edit-content &&
>
> As above, since folder1/a is a tracked file, I'd rather we explicitly
> called out that we're intentionally setting up an erroneous condition.
>
>> +
>> + # ls-files does not notice modified files whose
>> + # cache entries are marked SKIP_WORKTREE.
>
> ...nor does diff, status, etc., as per my lengthy comment from the
> beginning of the email.
I think the existence of this comment points out that "this is
such a strange situation that it requires explanation." A slight
comment change such as
# In this strange situation where a tracked file
# exists in the worktree but is marked with the
# SKIP_WORKTREE bit in the index, Git ignores the
# worktree contents.
That both points out that this case is unusual, but also that
ls-files isn't the only command that does this.
> Other than my concerns about careful messages with erroneous
> conditions (and the minor question about also having a
> ensure_not_expanded without the --sparse flag), the patch looks good
> to me.
Thanks for your careful read!
-Stolee
next prev parent reply other threads:[~2021-11-22 19:44 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-16 15:38 [PATCH 0/2] Sparse index: fetch, pull, ls-files Derrick Stolee via GitGitGadget
2021-11-16 15:38 ` [PATCH 1/2] fetch/pull: use the sparse index Derrick Stolee via GitGitGadget
2021-11-16 15:38 ` [PATCH 2/2] ls-files: add --sparse option Derrick Stolee via GitGitGadget
2021-11-22 18:36 ` Elijah Newren
2021-11-22 19:44 ` Derrick Stolee [this message]
2021-11-23 2:07 ` Ævar Arnfjörð Bjarmason
2021-12-08 15:14 ` Derrick Stolee
2021-12-08 15:20 ` Derrick Stolee
2021-12-08 17:04 ` Elijah Newren
2021-12-08 18:23 ` Derrick Stolee
2021-12-08 18:36 ` Elijah Newren
2021-12-08 19:06 ` Derrick Stolee
2021-12-09 12:50 ` Ævar Arnfjörð Bjarmason
2021-12-10 13:57 ` Derrick Stolee
2021-12-10 15:13 ` Ævar Arnfjörð Bjarmason
2021-12-13 19:16 ` Junio C Hamano
2021-12-16 14:11 ` Derrick Stolee
2021-11-17 9:29 ` [PATCH 0/2] Sparse index: fetch, pull, ls-files Junio C Hamano
2021-11-17 15:28 ` Derrick Stolee
2021-11-18 22:13 ` Junio C Hamano
2021-11-23 1:57 ` Ævar Arnfjörð Bjarmason
2021-12-08 19:39 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
2021-12-08 19:39 ` [PATCH v2 1/5] fetch/pull: use the sparse index Derrick Stolee via GitGitGadget
2021-12-08 19:39 ` [PATCH v2 2/5] ls-files: add --sparse option Derrick Stolee via GitGitGadget
2021-12-09 5:08 ` Elijah Newren
2021-12-10 13:51 ` Derrick Stolee
2021-12-08 19:39 ` [PATCH v2 3/5] t1092: replace 'read-cache --table' with 'ls-files --sparse' Derrick Stolee via GitGitGadget
2021-12-09 5:19 ` Elijah Newren
2021-12-08 19:39 ` [PATCH v2 4/5] t1091/t3705: remove 'test-tool read-cache --table' Derrick Stolee via GitGitGadget
2021-12-09 5:20 ` Elijah Newren
2021-12-08 19:39 ` [PATCH v2 5/5] test-read-cache: remove --table, --expand options Derrick Stolee via GitGitGadget
2021-12-09 5:23 ` [PATCH v2 0/5] Sparse index: fetch, pull, ls-files Elijah Newren
2021-12-10 15:13 ` [PATCH v3 " Derrick Stolee via GitGitGadget
2021-12-10 15:13 ` [PATCH v3 1/5] fetch/pull: use the sparse index Derrick Stolee via GitGitGadget
2021-12-10 15:13 ` [PATCH v3 2/5] ls-files: add --sparse option Derrick Stolee via GitGitGadget
2021-12-10 15:13 ` [PATCH v3 3/5] t1092: replace 'read-cache --table' with 'ls-files --sparse' Derrick Stolee via GitGitGadget
2021-12-10 15:13 ` [PATCH v3 4/5] t1091/t3705: remove 'test-tool read-cache --table' Derrick Stolee via GitGitGadget
2021-12-10 15:13 ` [PATCH v3 5/5] test-read-cache: remove --table, --expand options Derrick Stolee via GitGitGadget
2021-12-10 16:16 ` [PATCH v3 0/5] Sparse index: fetch, pull, ls-files Ævar Arnfjörð Bjarmason
2021-12-10 18:45 ` Elijah Newren
2021-12-11 2:24 ` Ævar Arnfjörð Bjarmason
2021-12-11 4:45 ` Elijah Newren
2021-12-10 18:53 ` Elijah Newren
2021-12-22 14:20 ` [PATCH v4 " Derrick Stolee via GitGitGadget
2021-12-22 14:20 ` [PATCH v4 1/5] fetch/pull: use the sparse index Derrick Stolee via GitGitGadget
2021-12-22 14:20 ` [PATCH v4 2/5] ls-files: add --sparse option Derrick Stolee via GitGitGadget
2021-12-22 14:20 ` [PATCH v4 3/5] t1092: replace 'read-cache --table' with 'ls-files --sparse' Derrick Stolee via GitGitGadget
2021-12-22 14:20 ` [PATCH v4 4/5] t1091/t3705: remove 'test-tool read-cache --table' Derrick Stolee via GitGitGadget
2021-12-22 14:20 ` [PATCH v4 5/5] test-read-cache: remove --table, --expand options Derrick Stolee via GitGitGadget
2021-12-22 19:17 ` [PATCH v4 0/5] Sparse index: fetch, pull, ls-files Elijah Newren
2021-12-22 23:56 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=191b2db8-1be4-d6a4-1cf6-eaefaf373dd2@gmail.com \
--to=stolee@gmail.com \
--cc=derrickstolee@github.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=newren@gmail.com \
--cc=vdye@github.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).