From: Karsten Blees <karsten.blees@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Duy Nguyen <pclouds@gmail.com>,
git@vger.kernel.org, kusmabite@gmail.com,
Ramkumar Ramachandra <artagnon@gmail.com>,
Robert Zeh <robert.allan.zeh@gmail.com>,
finnag@pvv.org
Subject: Re: [PATCH] read_directory: avoid invoking exclude machinery on tracked files
Date: Mon, 18 Feb 2013 17:42:17 +0100 [thread overview]
Message-ID: <512259E9.6070703@gmail.com> (raw)
In-Reply-To: <7vd2w1gyok.fsf@alter.siamese.dyndns.org>
Am 15.02.2013 20:32, schrieb Junio C Hamano:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>> On Fri, Feb 15, 2013 at 11:52 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>> In the current code, we always check if a path is excluded, and when
>>> dealing with DT_REG/DT_LNK, we call treat_file():
>>>
>>> * When such a path is excluded, treat_file() returns true when we
>>> are not showing ignored directories. This causes treat_one_path()
>>> to return path_ignored, so for excluded DT_REG/DT_LNK paths when
>>> no DIR_*_IGNORED is in effect, this change is a correct
>>> optimization.
>>>
>>> * When such a path is not excluded, on the ther hand, and when we
>>> are not showing ignored directories, treat_file() just returns
>>> the value of exclude_file, which is initialized to false and is
>>> not changed in the function. This causes treat_one_path() to
>>> return path_handled. However, the new code returns path_ignored
>>> in this case.
>>>
>>> What guarantees that this change is regression free?
>>
>> If you consider read_directory_recursive alone, there is a regression.
>> The return value of r_d_r depends on path_handled/path_ignored. With
>> this patch, the return value will be different.
>
> That is exactly what was missing from the proposed log message, and
> made me ask "Do all the callers that reach this function in their
> callgraph, when they get path_ignored for a path in the index,
> behave as if the difference between path_ignored and path_handled
> does not matter?" Your answer seems to be
>
> - r-d-r returns 'how many paths in this directory match the
> criteria we are looking for', unless check_only is true. Now in
> some cases we return path_ignored not path_handled, so we may
> return a number that is greater than we used to return.
>
> - treat_directory, the only user of that return value, cares if
> r-d-r returned 0 or non-zero; and
>
> - As long as we keep returning 0 from r-d-r in cases we used to
> return 0 and non-zero in cases we used to return non-zero, exact
> number does not matter. Overall we get the same result.
>
> I think all of the above is true, but I have not convinced myself
> that r-d-r with the new code never returns 0 when we used to return
> non-zero.
>
treat_directory calls read_directory_recursive in tow cases:
1.) The directory is not in the index.
---8<---
switch (directory_exists_in_index(dirname, len-1)) {
case index_nonexistent:
if (dir->flags & DIR_SHOW_OTHER_DIRECTORIES)
break;
}
...
---8<---
The directory is not in the index if there are no tracked files in the directory. I.e. cache_name_exists will always be false in this case, so the change won't affect the result of r_d_r.
2.) The directory is in the index but is ignored.
---8<---
switch (directory_exists_in_index(dirname, len-1)) {
case index_directory:
if ((dir->flags & DIR_SHOW_OTHER_DIRECTORIES) && exclude)
break;
}
if ((dir->flags & DIR_SHOW_IGNORED) && !exclude) {
...
}
if (!(dir->flags & DIR_SHOW_IGNORED) &&
!(dir->flags & DIR_HIDE_EMPTY_DIRECTORIES))
return show_directory;
if (!read_directory_recursive(dir, dirname, len, 1, simplify))
return ignore_directory;
return show_directory;
---8<---
With exclude==true, only one r_d_r call is reachable, and only if either DIR_SHOW_IGNORED or DIR_HIDE_EMPTY_DIRECTORIES is set.
2a) DIR_SHOW_IGNORED is set: the patch already checks !(dir->flags & DIR_SHOW_IGNORED), so the result of r_d_r is not affected.
2b) DIR_HIDE_EMPTY_DIRECTORIES is set and DIR_SHOW_IGNORED is not set: the directory is already ignored, so all files in the directory should be ignored, too. It doesn't matter whether treat_one_path returns path_ignored because of the excluded() check or cache_name_exists().
Therefore, I think the patch (v0) is regression-free.
As a side note, I'm quite confused why we would ever want to evaluate .gitignore patterns on tracked files at all, as gitignore(5) clearly states "Files already tracked by git are not affected". There is 'git-ls-files --cached --ignored', although this doesn't seem to process .gitignore files but expects exclude patterns on the command line...
next prev parent reply other threads:[~2013-02-18 16:42 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-02-15 14:17 [PATCH] read_directory: avoid invoking exclude machinery on tracked files Nguyễn Thái Ngọc Duy
2013-02-15 16:52 ` Junio C Hamano
2013-02-15 18:30 ` Duy Nguyen
2013-02-15 19:32 ` Junio C Hamano
2013-02-16 3:31 ` Duy Nguyen
2013-02-18 16:42 ` Karsten Blees [this message]
2013-02-16 7:17 ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2013-02-16 18:11 ` Pete Wyckoff
2013-02-17 4:39 ` Duy Nguyen
2013-02-17 15:49 ` Pete Wyckoff
2013-02-17 23:18 ` Junio C Hamano
2013-02-25 22:01 ` [PATCH/RFC] dir.c: Make git-status --ignored even more consistent Karsten Blees
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=512259E9.6070703@gmail.com \
--to=karsten.blees@gmail.com \
--cc=artagnon@gmail.com \
--cc=finnag@pvv.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=kusmabite@gmail.com \
--cc=pclouds@gmail.com \
--cc=robert.allan.zeh@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).