From: Ben Peart <peartben@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Ben Peart <benpeart@microsoft.com>
Subject: Re: [RFC v1] Add virtual file system settings and hook proc
Date: Wed, 31 Oct 2018 16:12:53 -0400 [thread overview]
Message-ID: <dbc2eb4f-842e-f49a-256f-3a140d801bb0@gmail.com> (raw)
In-Reply-To: <xmqqsh0nyqx9.fsf@gitster-ct.c.googlers.com>
On 10/30/2018 7:07 PM, Junio C Hamano wrote:
> Ben Peart <peartben@gmail.com> writes:
>
>> diff --git a/config.c b/config.c
>> index 4051e38823..96e05ee0f1 100644
>> --- a/config.c
>> +++ b/config.c
>> ...
>> @@ -2307,6 +2311,37 @@ int git_config_get_index_threads(void)
>> return 0; /* auto */
>> }
>>
>> +int git_config_get_virtualfilesystem(void)
>> +{
>> + if (git_config_get_pathname("core.virtualfilesystem", &core_virtualfilesystem))
>> + core_virtualfilesystem = getenv("GIT_TEST_VIRTUALFILESYSTEM");
>> +
>> + if (core_virtualfilesystem && !*core_virtualfilesystem)
>> + core_virtualfilesystem = NULL;
>> +
>> + if (core_virtualfilesystem) {
>> + /*
>> + * Some git commands spawn helpers and redirect the index to a different
>> + * location. These include "difftool -d" and the sequencer
>> + * (i.e. `git rebase -i`, `git cherry-pick` and `git revert`) and others.
>> + * In those instances we don't want to update their temporary index with
>> + * our virtualization data.
>> + */
>> + char *default_index_file = xstrfmt("%s/%s", the_repository->gitdir, "index");
>> + int should_run_hook = !strcmp(default_index_file, the_repository->index_file);
>> +
>> + free(default_index_file);
>> + if (should_run_hook) {
>> + /* virtual file system relies on the sparse checkout logic so force it on */
>> + core_apply_sparse_checkout = 1;
>> + return 1;
>> + }
>> + core_virtualfilesystem = NULL;
>
> It would be a small leak if this came from config_get_pathname(),
> but if it came from $GIT_TEST_VFS env, we cannot free it X-<.
>
> A helper function called *_get_X() that does not return X as its
> return value or updating the location pointed by its *dst parameter,
> and instead only stores its finding in a global variable feels
> somewhat odd. It smells more like "find out", "probe", "check",
> etc.
>
I agree. Frankly, I think it should just return whether it should be
used or not (bool) and the hook name should be fixed. I got push back
when I did that for fsmonitor so I made this the same in an effort to
head off that same feedback.
>> diff --git a/dir.c b/dir.c
>> index 47c2fca8dc..3097b0e446 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -21,6 +21,7 @@
>> #include "ewah/ewok.h"
>> #include "fsmonitor.h"
>> #include "submodule-config.h"
>> +#include "virtualfilesystem.h"
>>
>> /*
>> * Tells read_directory_recursive how a file or directory should be treated.
>> @@ -1109,6 +1110,18 @@ int is_excluded_from_list(const char *pathname,
>> struct exclude_list *el, struct index_state *istate)
>> {
>> struct exclude *exclude;
>> +
>> + /*
>> + * The virtual file system data is used to prevent git from traversing
>> + * any part of the tree that is not in the virtual file system. Return
>> + * 1 to exclude the entry if it is not found in the virtual file system,
>> + * else fall through to the regular excludes logic as it may further exclude.
>> + */
>
> This comment will sit better immediately in front of the call to "is
> excluded from vfs?" helper function.
>
>> + if (*dtype == DT_UNKNOWN)
>> + *dtype = get_dtype(NULL, istate, pathname, pathlen);
>
> We try to defer paying cost to determine unknown *dtype as late as
> possible by having this call in last_exclude_matching_from_list(),
> and not here. If we are doing this, we probably should update the
> callpaths that call last_exclude_matching_from_list() to make the
> caller responsible for doing get_dtype() and drop the lazy finding
> of dtype from the callee. Alternatively, the new "is excluded from
> vfs" helper can learn to do the lazy get_dtype() just like the
> existing last_exclude_matching_from_list() does. I suspect the
> latter may be simpler.
In is_excluded_from_virtualfilesystem() dtype can't be lazy because it
is always needed (which is why I test and die if it isn't known). I
considered doing the test/call to get_dtype() within
is_excluded_from_virtualfilesystem() but didn't like making it dependent
on istate just so I could move the get_dtype() call in one level. It is
functionally identical so I can easily move it in if that is preferred.
>
>> + if (is_excluded_from_virtualfilesystem(pathname, pathlen, *dtype) > 0)
>> + return 1;
>> +
>> exclude = last_exclude_matching_from_list(pathname, pathlen, basename,
>> dtype, el, istate);
>> if (exclude)
>> @@ -1324,8 +1337,20 @@ struct exclude *last_exclude_matching(struct dir_struct *dir,
>> int is_excluded(struct dir_struct *dir, struct index_state *istate,
>> const char *pathname, int *dtype_p)
>> {
>> - struct exclude *exclude =
>> - last_exclude_matching(dir, istate, pathname, dtype_p);
>> + struct exclude *exclude;
>> +
>> + /*
>> + * The virtual file system data is used to prevent git from traversing
>> + * any part of the tree that is not in the virtual file system. Return
>> + * 1 to exclude the entry if it is not found in the virtual file system,
>> + * else fall through to the regular excludes logic as it may further exclude.
>> + */
>> + if (*dtype_p == DT_UNKNOWN)
>> + *dtype_p = get_dtype(NULL, istate, pathname, strlen(pathname));
>
> Exactly the same comment as above.
>
>> + if (is_excluded_from_virtualfilesystem(pathname, strlen(pathname), *dtype_p) > 0)
>> + return 1;
>> +
>> + exclude = last_exclude_matching(dir, istate, pathname, dtype_p);
>> if (exclude)
>> return exclude->flags & EXC_FLAG_NEGATIVE ? 0 : 1;
>> return 0;
>> @@ -1678,6 +1703,9 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
>> if (dtype != DT_DIR && has_path_in_index)
>> return path_none;
>>
>> + if (is_excluded_from_virtualfilesystem(path->buf, path->len, dtype) > 0)
>> + return path_excluded;
>> +
>> /*
>> * When we are looking at a directory P in the working tree,
>> * there are three cases:
>> @@ -2018,6 +2046,8 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>> /* add the path to the appropriate result list */
>> switch (state) {
>> case path_excluded:
>> + if (is_excluded_from_virtualfilesystem(path.buf, path.len, DT_DIR) > 0)
>> + break;
>> if (dir->flags & DIR_SHOW_IGNORED)
>> dir_add_name(dir, istate, path.buf, path.len);
>> else if ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
>
> I am kind-of surprised that the "damage" to dir.c need to support
> this is so isolated and small ;-)
>
When I'm expecting to carry a patch in a fork I try to structure it to
have as small of a footprint as possible (hence moving most of the code
into a separate file). Keeps the merge conflicts to a minimum. ;-)
>> ...
>> +if test "$1" != 1
>> +then
>> + echo "Unsupported core.virtualfilesystem hook version." >&2
>> + exit 1
>> +fi
>> +
>> +#find -type f -printf '%P\0'
>> +find -type d -printf '%P/\0'
>
> I am not reading (hence not commenting on) tests in this review
> message yet.
>
>> diff --git a/unpack-trees.c b/unpack-trees.c
>> index 7570df481b..ee3cda2e94 100644
>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -18,6 +18,7 @@
>> #include "fsmonitor.h"
>> #include "object-store.h"
>> #include "fetch-object.h"
>> +#include "virtualfilesystem.h"
>>
>> /*
>> * Error messages expected by scripts out of plumbing commands such as
>> @@ -1363,6 +1364,14 @@ static int clear_ce_flags_1(struct index_state *istate,
>> continue;
>> }
>>
>> + /* if it's not in the virtual file system, exit early */
>> + if (core_virtualfilesystem) {
>> + if (is_included_in_virtualfilesystem(ce->name, ce->ce_namelen) > 0)
>> + ce->ce_flags &= ~clear_mask;
>> + cache++;
>> + continue;
>> + }
>
> Earlier we saw "is it excluded?" and now we have "is it included?"
> They have different function signature (i.e. "included?" does not
> need to know the type of the entry), and I am guessing that for the
> purpose of this particular patch that may be sufficient, but I have
> to wonder if in the longer term we'd be better off to keep the
> interface to these two functions similar. Also, I wonder if we need
> both---I see below that these "is included/excluded?" helpers are
> allowed to say "I dunno", so that may be the reason why we cannot
> simply say "included is the same as !excluded".
>
Most of the logic _is_ just inverted (which is why is_excluded...() can
call is_included...() to do most of the work) but there are differences
as well. The differences are mostly about how parent directories are
handled. I tried to write it to share as much of the logic as possible
but there may be something I've missed.
Your comments are all feedback on the code - how it was implemented,
style, etc. Any thoughts on whether this is something we could/should
merge into master (after any necessary cleanup)? Would anyone else find
this interesting/helpful?
next prev parent reply other threads:[~2018-10-31 20:12 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-30 19:16 [RFC v1] Add virtual file system settings and hook proc Ben Peart
2018-10-30 23:07 ` Junio C Hamano
2018-10-31 20:12 ` Ben Peart [this message]
2018-11-05 0:02 ` Junio C Hamano
2018-11-05 20:00 ` Ben Peart
2018-10-31 19:11 ` Duy Nguyen
2018-10-31 20:53 ` Ben Peart
2018-11-04 6:34 ` Duy Nguyen
2018-11-04 21:01 ` brian m. carlson
2018-11-05 15:22 ` Duy Nguyen
2018-11-05 20:18 ` Ben Peart
2018-11-05 20:27 ` Ben Peart
2018-11-05 11:40 ` Ævar Arnfjörð Bjarmason
2018-11-05 15:26 ` Duy Nguyen
2018-11-05 20:07 ` Ben Peart
2018-11-05 21:53 ` Johannes Schindelin
2018-11-27 19:50 ` [PATCH v1] teach git to support a virtual (partially populated) work directory Ben Peart
2018-11-28 13:31 ` SZEDER Gábor
2018-11-29 14:09 ` Ben Peart
2018-12-13 19:41 ` [PATCH v2] " Ben Peart
2019-01-28 19:00 ` Ben Peart
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=dbc2eb4f-842e-f49a-256f-3a140d801bb0@gmail.com \
--to=peartben@gmail.com \
--cc=benpeart@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).