git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ben Peart <peartben@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Ben Peart <benpeart@microsoft.com>
Subject: Re: [RFC v1] Add virtual file system settings and hook proc
Date: Wed, 31 Oct 2018 16:12:53 -0400	[thread overview]
Message-ID: <dbc2eb4f-842e-f49a-256f-3a140d801bb0@gmail.com> (raw)
In-Reply-To: <xmqqsh0nyqx9.fsf@gitster-ct.c.googlers.com>



On 10/30/2018 7:07 PM, Junio C Hamano wrote:
> Ben Peart <peartben@gmail.com> writes:
> 
>> diff --git a/config.c b/config.c
>> index 4051e38823..96e05ee0f1 100644
>> --- a/config.c
>> +++ b/config.c
>> ...
>> @@ -2307,6 +2311,37 @@ int git_config_get_index_threads(void)
>>   	return 0; /* auto */
>>   }
>>   
>> +int git_config_get_virtualfilesystem(void)
>> +{
>> +	if (git_config_get_pathname("core.virtualfilesystem", &core_virtualfilesystem))
>> +		core_virtualfilesystem = getenv("GIT_TEST_VIRTUALFILESYSTEM");
>> +
>> +	if (core_virtualfilesystem && !*core_virtualfilesystem)
>> +		core_virtualfilesystem = NULL;
>> +
>> +	if (core_virtualfilesystem) {
>> +		/*
>> +		 * Some git commands spawn helpers and redirect the index to a different
>> +		 * location.  These include "difftool -d" and the sequencer
>> +		 * (i.e. `git rebase -i`, `git cherry-pick` and `git revert`) and others.
>> +		 * In those instances we don't want to update their temporary index with
>> +		 * our virtualization data.
>> +		 */
>> +		char *default_index_file = xstrfmt("%s/%s", the_repository->gitdir, "index");
>> +		int should_run_hook = !strcmp(default_index_file, the_repository->index_file);
>> +
>> +		free(default_index_file);
>> +		if (should_run_hook) {
>> +			/* virtual file system relies on the sparse checkout logic so force it on */
>> +			core_apply_sparse_checkout = 1;
>> +			return 1;
>> +		}
>> +		core_virtualfilesystem = NULL;
> 
> It would be a small leak if this came from config_get_pathname(),
> but if it came from $GIT_TEST_VFS env, we cannot free it X-<.
> 
> A helper function called *_get_X() that does not return X as its
> return value or updating the location pointed by its *dst parameter,
> and instead only stores its finding in a global variable feels
> somewhat odd.  It smells more like "find out", "probe", "check",
> etc.
> 

I agree.  Frankly, I think it should just return whether it should be 
used or not (bool) and the hook name should be fixed.  I got push back 
when I did that for fsmonitor so I made this the same in an effort to 
head off that same feedback.

>> diff --git a/dir.c b/dir.c
>> index 47c2fca8dc..3097b0e446 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -21,6 +21,7 @@
>>   #include "ewah/ewok.h"
>>   #include "fsmonitor.h"
>>   #include "submodule-config.h"
>> +#include "virtualfilesystem.h"
>>   
>>   /*
>>    * Tells read_directory_recursive how a file or directory should be treated.
>> @@ -1109,6 +1110,18 @@ int is_excluded_from_list(const char *pathname,
>>   			  struct exclude_list *el, struct index_state *istate)
>>   {
>>   	struct exclude *exclude;
>> +
>> +	/*
>> +	 * The virtual file system data is used to prevent git from traversing
>> +	 * any part of the tree that is not in the virtual file system.  Return
>> +	 * 1 to exclude the entry if it is not found in the virtual file system,
>> +	 * else fall through to the regular excludes logic as it may further exclude.
>> +	 */
> 
> This comment will sit better immediately in front of the call to "is
> excluded from vfs?" helper function.
> 
>> +	if (*dtype == DT_UNKNOWN)
>> +		*dtype = get_dtype(NULL, istate, pathname, pathlen);
> 
> We try to defer paying cost to determine unknown *dtype as late as
> possible by having this call in last_exclude_matching_from_list(),
> and not here.  If we are doing this, we probably should update the
> callpaths that call last_exclude_matching_from_list() to make the
> caller responsible for doing get_dtype() and drop the lazy finding
> of dtype from the callee.  Alternatively, the new "is excluded from
> vfs" helper can learn to do the lazy get_dtype() just like the
> existing last_exclude_matching_from_list() does.  I suspect the
> latter may be simpler.

In is_excluded_from_virtualfilesystem() dtype can't be lazy because it 
is always needed (which is why I test and die if it isn't known).  I 
considered doing the test/call to get_dtype() within 
is_excluded_from_virtualfilesystem() but didn't like making it dependent 
on istate just so I could move the get_dtype() call in one level.  It is 
functionally identical so I can easily move it in if that is preferred.

> 
>> +	if (is_excluded_from_virtualfilesystem(pathname, pathlen, *dtype) > 0)
>> +		return 1;
>> +
>>   	exclude = last_exclude_matching_from_list(pathname, pathlen, basename,
>>   						  dtype, el, istate);
>>   	if (exclude)
>> @@ -1324,8 +1337,20 @@ struct exclude *last_exclude_matching(struct dir_struct *dir,
>>   int is_excluded(struct dir_struct *dir, struct index_state *istate,
>>   		const char *pathname, int *dtype_p)
>>   {
>> -	struct exclude *exclude =
>> -		last_exclude_matching(dir, istate, pathname, dtype_p);
>> +	struct exclude *exclude;
>> +
>> +	/*
>> +	 * The virtual file system data is used to prevent git from traversing
>> +	 * any part of the tree that is not in the virtual file system.  Return
>> +	 * 1 to exclude the entry if it is not found in the virtual file system,
>> +	 * else fall through to the regular excludes logic as it may further exclude.
>> +	 */
>> +	if (*dtype_p == DT_UNKNOWN)
>> +		*dtype_p = get_dtype(NULL, istate, pathname, strlen(pathname));
> 
> Exactly the same comment as above.
> 
>> +	if (is_excluded_from_virtualfilesystem(pathname, strlen(pathname), *dtype_p) > 0)
>> +		return 1;
>> +
>> +	exclude = last_exclude_matching(dir, istate, pathname, dtype_p);
>>   	if (exclude)
>>   		return exclude->flags & EXC_FLAG_NEGATIVE ? 0 : 1;
>>   	return 0;
>> @@ -1678,6 +1703,9 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
>>   	if (dtype != DT_DIR && has_path_in_index)
>>   		return path_none;
>>   
>> +	if (is_excluded_from_virtualfilesystem(path->buf, path->len, dtype) > 0)
>> +		return path_excluded;
>> +
>>   	/*
>>   	 * When we are looking at a directory P in the working tree,
>>   	 * there are three cases:
>> @@ -2018,6 +2046,8 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>>   		/* add the path to the appropriate result list */
>>   		switch (state) {
>>   		case path_excluded:
>> +			if (is_excluded_from_virtualfilesystem(path.buf, path.len, DT_DIR) > 0)
>> +				break;
>>   			if (dir->flags & DIR_SHOW_IGNORED)
>>   				dir_add_name(dir, istate, path.buf, path.len);
>>   			else if ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
> 
> I am kind-of surprised that the "damage" to dir.c need to support
> this is so isolated and small ;-)
> 

When I'm expecting to carry a patch in a fork I try to structure it to 
have as small of a footprint as possible (hence moving most of the code 
into a separate file).  Keeps the merge conflicts to a minimum. ;-)

>> ...
>> +if test "$1" != 1
>> +then
>> +	echo "Unsupported core.virtualfilesystem hook version." >&2
>> +	exit 1
>> +fi
>> +
>> +#find -type f -printf '%P\0'
>> +find -type d -printf '%P/\0'
> 
> I am not reading (hence not commenting on) tests in this review
> message yet.
> 
>> diff --git a/unpack-trees.c b/unpack-trees.c
>> index 7570df481b..ee3cda2e94 100644
>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -18,6 +18,7 @@
>>   #include "fsmonitor.h"
>>   #include "object-store.h"
>>   #include "fetch-object.h"
>> +#include "virtualfilesystem.h"
>>   
>>   /*
>>    * Error messages expected by scripts out of plumbing commands such as
>> @@ -1363,6 +1364,14 @@ static int clear_ce_flags_1(struct index_state *istate,
>>   			continue;
>>   		}
>>   
>> +		/* if it's not in the virtual file system, exit early */
>> +		if (core_virtualfilesystem) {
>> +			if (is_included_in_virtualfilesystem(ce->name, ce->ce_namelen) > 0)
>> +				ce->ce_flags &= ~clear_mask;
>> +			cache++;
>> +			continue;
>> +		}
> 
> Earlier we saw "is it excluded?" and now we have "is it included?"
> They have different function signature (i.e. "included?" does not
> need to know the type of the entry), and I am guessing that for the
> purpose of this particular patch that may be sufficient, but I have
> to wonder if in the longer term we'd be better off to keep the
> interface to these two functions similar.  Also, I wonder if we need
> both---I see below that these "is included/excluded?" helpers are
> allowed to say "I dunno", so that may be the reason why we cannot
> simply say "included is the same as !excluded".
> 

Most of the logic _is_ just inverted (which is why is_excluded...() can 
call is_included...() to do most of the work) but there are differences 
as well.  The differences are mostly about how parent directories are 
handled.  I tried to write it to share as much of the logic as possible 
but there may be something I've missed.


Your comments are all feedback on the code - how it was implemented, 
style, etc.  Any thoughts on whether this is something we could/should 
merge into master (after any necessary cleanup)?  Would anyone else find 
this interesting/helpful?

  reply	other threads:[~2018-10-31 20:12 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-30 19:16 [RFC v1] Add virtual file system settings and hook proc Ben Peart
2018-10-30 23:07 ` Junio C Hamano
2018-10-31 20:12   ` Ben Peart [this message]
2018-11-05  0:02     ` Junio C Hamano
2018-11-05 20:00       ` Ben Peart
2018-10-31 19:11 ` Duy Nguyen
2018-10-31 20:53   ` Ben Peart
2018-11-04  6:34     ` Duy Nguyen
2018-11-04 21:01       ` brian m. carlson
2018-11-05 15:22         ` Duy Nguyen
2018-11-05 20:18           ` Ben Peart
2018-11-05 20:27         ` Ben Peart
2018-11-05 11:40       ` Ævar Arnfjörð Bjarmason
2018-11-05 15:26         ` Duy Nguyen
2018-11-05 20:07           ` Ben Peart
2018-11-05 21:53         ` Johannes Schindelin
2018-11-27 19:50 ` [PATCH v1] teach git to support a virtual (partially populated) work directory Ben Peart
2018-11-28 13:31   ` SZEDER Gábor
2018-11-29 14:09     ` Ben Peart
2018-12-13 19:41 ` [PATCH v2] " Ben Peart
2019-01-28 19:00   ` Ben Peart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=dbc2eb4f-842e-f49a-256f-3a140d801bb0@gmail.com \
    --to=peartben@gmail.com \
    --cc=benpeart@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).