git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Elijah Newren <newren@gmail.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Victoria Dye <vdye@github.com>,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 2/2] ls-files: add --sparse option
Date: Mon, 22 Nov 2021 14:44:20 -0500	[thread overview]
Message-ID: <191b2db8-1be4-d6a4-1cf6-eaefaf373dd2@gmail.com> (raw)
In-Reply-To: <CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com>


On 11/22/2021 1:36 PM, Elijah Newren wrote:
> On Tue, Nov 16, 2021 at 7:38 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Existing callers to 'git ls-files' are expecting file names, not
>> directories. It is best to expand a sparse index to show all of the
>> contained files in this case.
>>
>> However, expert users may want to inspect the contents of the index
>> itself including which directories are sparse. Add a --sparse option to
>> allow users to request this information.
>>
>> During testing, I noticed that options such as --modified did not affect
>> the output when the files in question were outside the sparse-checkout
>> definition. Tests are added to document this preexisting behavior and
>> how it remains unchanged with the sparse index and the --sparse option.
> 
> Sure, 'git diff' and 'git status' don't show anything for such files
> either; we're being consistent.  However, this feels like it's
> normalizing an erroneous condition, possibly starting to cement one of
> the bigger UI problems we have the sparse-checkout (and sparse-index):
> 
> I assert that: Having tracked files be SKIP_WORKTREE despite having a
> file present in the working directory is typically an *error*, and one
> that we need to help users prevent/discover/recover-from.

I do think we need to tread carefully here, since some users use
'git update-index --skip-worktree' to mark a file in their
worktree as "ignored" so they can change it from HEAD without it
ever being staged. This is usually independent of the sparse-
checkout feature, so you might want to focus your efforts to paths
that exist in the worktree but are outside of the sparse-checkout
patterns.

> * I've seen users get into this condition by doing the equivalent of
> either 'git sparse-checkout disable' or 'git sparse-checkout set
> $NEW_PATHS' and then hit Ctrl-C in the middle of its operation.
> * Users also just occasionally copy files from other places (maybe
> even `git show REVISION:FILE >FILE`), and potentially edit.
> 
> In either of the above cases:
> * There is no command provided for discovering these files; not diff,
> status, ls-files, or anything
> * There is no command provided for correcting the problem; not clean,
> not reset, or anything.  They have to do it by hand
> * There are ample opportunities for the error to propagate and grow,
> due to the fact that these files will not be reported or updated.[1]
> 
> [1] For example, the user may first either switch to another branch or
> resets to some other commit.  Neither of those operations update these
> erroneous present-despite-SKIP_WORKTREE files.  And there will still
> be nothing that reports them.  But now the files are out-of-date with
> respect to the new commit.  Now if the user disables or changes the
> sparse checkout, they suddenly have several "changed" files in their
> working tree -- which they didn't touch.  In the best case, they run
> diff or status or something and notice the files and then correct it,
> and just get perplexed at where the changes came from.  But with
> enough users, some percentage of them are just going to commit (some
> of) those changes.  When someone else asks why they modified those
> files, they'll (correctly, as it turns out) claim they never did and
> that no program on their system did.  They might even think those
> files weren't part of the commit and claim that git modified the
> commit behind their back, which would be wrong, but there won't be any
> reasonable logs to check to prove what happened.
> 
> 
> I know this issue is out of scope for your series here, but the
> addition of testcases that purposely set up an erroneous condition
> makes it feel like we're trying to normalize that situation and not
> treat it like an error, and are perhap undercutting future attempts to
> try to fix it.  I'd rather have it explicitly stated that we're
> setting up an erroneous condition in our tests, in order to make sure
> we handle it as best as we can so far -- in a manner in line with diff
> and grep -- but also note that later solutions to this deeper issue
> may affect one or more of these commands.

I appreciate the attention to this scenario, but I also think
that a patch series whose goal is to change how we react when
in this case already needs to make a case for changing the
behavior. Having tests that exist documenting the previous
behavior form a good basis for "it did this before, but now it
will do this," which hopefully further justifies the change.

Personally, I think of _every_ test as "this is what the code
does right now" and the purpose of a test is to show that we
don't change things _unintentionally_. Every test can be changed
if there is sufficient evidence that it should.

>> @@ -838,6 +844,7 @@ test_expect_success 'sparse-index is not expanded' '
>>         init_repos &&
>>
>>         ensure_not_expanded status &&
>> +       ensure_not_expanded ls-files --sparse &&
> 
> Do we also want a
>     ensure_not_expanded ls-files &&
> ?  We don't expect ls-files to write a new index file in any scenario, right?

When the index is sparse, 'git ls-files' will expand before
listing all of the contents, to avoid listing a sparse
directory entry. One way to avoid ensure_full_index() would
be to do trickery with read_tree_at() whenever we find a
sparse directory entry and use the callback to output what
ls-files would have written. However, that's pretty much the
same amount of work as what ensure_full_index() is doing, so
there is likely little benefit to complicating the code for
this reason.

>>         ensure_not_expanded commit --allow-empty -m empty &&
>>         echo >>sparse-index/a &&
>>         ensure_not_expanded commit -a -m a &&
>> @@ -942,6 +949,46 @@ test_expect_success 'sparse index is not expanded: fetch/pull' '
>>         ensure_not_expanded pull full base
>>  '
>>
>> +test_expect_success 'ls-files' '
>> +       init_repos &&
>> +
>> +       # Behavior agrees by default. Sparse index is expanded.
>> +       test_all_match git ls-files &&
>> +
>> +       # With --sparse, the sparse index data changes behavior.
>> +       git -C sparse-index ls-files --sparse >sparse-index-out &&
>> +       grep "^folder1/\$" sparse-index-out &&
>> +       grep "^folder2/\$" sparse-index-out &&
>> +
>> +       # With --sparse and no sparse index, nothing changes.
>> +       git -C sparse-checkout ls-files --sparse >sparse-checkout-out &&
>> +       grep "^folder1/0/0/0\$" sparse-checkout-out &&
>> +       ! grep "/\$" sparse-checkout-out &&
> 
> I'd be tempted to split the test up to this point from the rest of the test.

Every instance of "init_repos" adds to the cost of this test script,
so when possible I'd err on the side of grouping them. It also keeps
test able to isolate with "--run=1,<N>".

>> +
>> +       write_script edit-content <<-\EOF &&
>> +       mkdir folder1 &&
>> +       echo content >>folder1/a
>> +       EOF
>> +       run_on_sparse ../edit-content &&
> 
> As above, since folder1/a is a tracked file, I'd rather we explicitly
> called out that we're intentionally setting up an erroneous condition.
> 
>> +
>> +       # ls-files does not notice modified files whose
>> +       # cache entries are marked SKIP_WORKTREE.
> 
> ...nor does diff, status, etc., as per my lengthy comment from the
> beginning of the email.

I think the existence of this comment points out that "this is
such a strange situation that it requires explanation." A slight
comment change such as

	# In this strange situation where a tracked file
	# exists in the worktree but is marked with the
	# SKIP_WORKTREE bit in the index, Git ignores the
	# worktree contents.

That both points out that this case is unusual, but also that
ls-files isn't the only command that does this.
> Other than my concerns about careful messages with erroneous
> conditions (and the minor question about also having a
> ensure_not_expanded without the --sparse flag), the patch looks good
> to me.
 
Thanks for your careful read!
-Stolee

  reply	other threads:[~2021-11-22 19:44 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-16 15:38 [PATCH 0/2] Sparse index: fetch, pull, ls-files Derrick Stolee via GitGitGadget
2021-11-16 15:38 ` [PATCH 1/2] fetch/pull: use the sparse index Derrick Stolee via GitGitGadget
2021-11-16 15:38 ` [PATCH 2/2] ls-files: add --sparse option Derrick Stolee via GitGitGadget
2021-11-22 18:36   ` Elijah Newren
2021-11-22 19:44     ` Derrick Stolee [this message]
2021-11-23  2:07   ` Ævar Arnfjörð Bjarmason
2021-12-08 15:14     ` Derrick Stolee
2021-12-08 15:20       ` Derrick Stolee
2021-12-08 17:04       ` Elijah Newren
2021-12-08 18:23         ` Derrick Stolee
2021-12-08 18:36           ` Elijah Newren
2021-12-08 19:06             ` Derrick Stolee
2021-12-09 12:50               ` Ævar Arnfjörð Bjarmason
2021-12-10 13:57                 ` Derrick Stolee
2021-12-10 15:13                   ` Ævar Arnfjörð Bjarmason
2021-12-13 19:16                   ` Junio C Hamano
2021-12-16 14:11                     ` Derrick Stolee
2021-11-17  9:29 ` [PATCH 0/2] Sparse index: fetch, pull, ls-files Junio C Hamano
2021-11-17 15:28   ` Derrick Stolee
2021-11-18 22:13     ` Junio C Hamano
2021-11-23  1:57 ` Ævar Arnfjörð Bjarmason
2021-12-08 19:39 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
2021-12-08 19:39   ` [PATCH v2 1/5] fetch/pull: use the sparse index Derrick Stolee via GitGitGadget
2021-12-08 19:39   ` [PATCH v2 2/5] ls-files: add --sparse option Derrick Stolee via GitGitGadget
2021-12-09  5:08     ` Elijah Newren
2021-12-10 13:51       ` Derrick Stolee
2021-12-08 19:39   ` [PATCH v2 3/5] t1092: replace 'read-cache --table' with 'ls-files --sparse' Derrick Stolee via GitGitGadget
2021-12-09  5:19     ` Elijah Newren
2021-12-08 19:39   ` [PATCH v2 4/5] t1091/t3705: remove 'test-tool read-cache --table' Derrick Stolee via GitGitGadget
2021-12-09  5:20     ` Elijah Newren
2021-12-08 19:39   ` [PATCH v2 5/5] test-read-cache: remove --table, --expand options Derrick Stolee via GitGitGadget
2021-12-09  5:23   ` [PATCH v2 0/5] Sparse index: fetch, pull, ls-files Elijah Newren
2021-12-10 15:13   ` [PATCH v3 " Derrick Stolee via GitGitGadget
2021-12-10 15:13     ` [PATCH v3 1/5] fetch/pull: use the sparse index Derrick Stolee via GitGitGadget
2021-12-10 15:13     ` [PATCH v3 2/5] ls-files: add --sparse option Derrick Stolee via GitGitGadget
2021-12-10 15:13     ` [PATCH v3 3/5] t1092: replace 'read-cache --table' with 'ls-files --sparse' Derrick Stolee via GitGitGadget
2021-12-10 15:13     ` [PATCH v3 4/5] t1091/t3705: remove 'test-tool read-cache --table' Derrick Stolee via GitGitGadget
2021-12-10 15:13     ` [PATCH v3 5/5] test-read-cache: remove --table, --expand options Derrick Stolee via GitGitGadget
2021-12-10 16:16     ` [PATCH v3 0/5] Sparse index: fetch, pull, ls-files Ævar Arnfjörð Bjarmason
2021-12-10 18:45       ` Elijah Newren
2021-12-11  2:24         ` Ævar Arnfjörð Bjarmason
2021-12-11  4:45           ` Elijah Newren
2021-12-10 18:53     ` Elijah Newren
2021-12-22 14:20     ` [PATCH v4 " Derrick Stolee via GitGitGadget
2021-12-22 14:20       ` [PATCH v4 1/5] fetch/pull: use the sparse index Derrick Stolee via GitGitGadget
2021-12-22 14:20       ` [PATCH v4 2/5] ls-files: add --sparse option Derrick Stolee via GitGitGadget
2021-12-22 14:20       ` [PATCH v4 3/5] t1092: replace 'read-cache --table' with 'ls-files --sparse' Derrick Stolee via GitGitGadget
2021-12-22 14:20       ` [PATCH v4 4/5] t1091/t3705: remove 'test-tool read-cache --table' Derrick Stolee via GitGitGadget
2021-12-22 14:20       ` [PATCH v4 5/5] test-read-cache: remove --table, --expand options Derrick Stolee via GitGitGadget
2021-12-22 19:17       ` [PATCH v4 0/5] Sparse index: fetch, pull, ls-files Elijah Newren
2021-12-22 23:56         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=191b2db8-1be4-d6a4-1cf6-eaefaf373dd2@gmail.com \
    --to=stolee@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).