From: Ben Peart <peartben@gmail.com>
To: git@vger.kernel.org
Cc: benpeart@microsoft.com, pclouds@gmail.com,
sandals@crustytoothpaste.net, avarab@gmail.com,
Johannes.Schindelin@gmx.de, szeder.dev@gmail.com
Subject: Re: [PATCH v2] teach git to support a virtual (partially populated) work directory
Date: Mon, 28 Jan 2019 14:00:09 -0500 [thread overview]
Message-ID: <09c16383-9778-29a6-80f4-3cfdbc5d180c@gmail.com> (raw)
In-Reply-To: <20181213194107.31572-1-peartben@gmail.com>
Ping. Any thoughts, comments, feedback, suggestions?
On 12/13/2018 2:41 PM, Ben Peart wrote:
> From: Ben Peart <benpeart@microsoft.com>
>
> To make git perform well on the very largest repos, we must make git
> operations O(modified) instead of O(size of repo). This takes advantage of
> the fact that the number of files a developer has modified (especially
> in very large repos) is typically a tiny fraction of the overall repo size.
>
> We accomplished this by utilizing the existing internal logic for the skip
> worktree bit and excludes to tell git to ignore all files and folders other
> than those that have been modified. This logic is driven by an external
> process that monitors writes to the repo and communicates the list of files
> and folders with changes to git via the virtual work directory hook in this
> patch.
>
> The external process maintains a list of files and folders that have been
> modified. When git runs, it requests the list of files and folders that
> have been modified via the virtual work directory hook. Git then sets/clears
> the skip-worktree bit on the cache entries and builds a hashmap of the
> modified files/folders that is used by the excludes logic to avoid scanning
> the entire repo looking for changes and untracked files.
>
> With this system, we have been able to make local git command performance on
> extremely large repos (millions of files, 1/2 million folders) entirely
> manageable (30 second checkout, 3.5 seconds status, 4 second add, 7 second
> commit, etc).
>
> On index load, clear/set the skip worktree bits based on the virtual
> work directory data. Use virtual work directory data to update skip-worktree
> bit in unpack-trees. Use virtual work directory data to exclude files and
> folders not explicitly requested.
>
> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> ---
>
> Notes:
> Base Ref: v2.20.0
> Web-Diff: https://github.com/benpeart/git/commit/acc00a41af
> Checkout: git fetch https://github.com/benpeart/git virtual-workdir-v2 && git checkout acc00a41af
>
> ### Patches
>
> Documentation/config/core.txt | 9 +
> Documentation/githooks.txt | 23 ++
> Makefile | 1 +
> cache.h | 1 +
> config.c | 32 ++-
> config.h | 1 +
> dir.c | 26 ++-
> environment.c | 1 +
> read-cache.c | 2 +
> t/t1092-virtualworkdir.sh | 390 ++++++++++++++++++++++++++++++++++
> unpack-trees.c | 23 +-
> virtualworkdir.c | 314 +++++++++++++++++++++++++++
> virtualworkdir.h | 25 +++
> 13 files changed, 840 insertions(+), 8 deletions(-)
> create mode 100755 t/t1092-virtualworkdir.sh
> create mode 100644 virtualworkdir.c
> create mode 100644 virtualworkdir.h
>
> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index d0e6635fe0..49b7699a4e 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -68,6 +68,15 @@ core.fsmonitor::
> avoiding unnecessary processing of files that have not changed.
> See the "fsmonitor-watchman" section of linkgit:githooks[5].
>
> +core.virtualWorkDir::
> + Please regard this as an experimental feature.
> + If set to true, utilize the virtual-work-dir hook to identify all
> + files and directories that are present in the working directory.
> + Git will only track and update files listed in the virtual work
> + directory. Using the virtual work directory will supersede the
> + sparse-checkout settings which will be ignored.
> + See the "virtual-work-dir" section of linkgit:githooks[6].
> +
> core.trustctime::
> If false, the ctime differences between the index and the
> working tree are ignored; useful when the inode change time
> diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
> index 959044347e..9888d504b4 100644
> --- a/Documentation/githooks.txt
> +++ b/Documentation/githooks.txt
> @@ -485,6 +485,29 @@ The exit status determines whether git will use the data from the
> hook to limit its search. On error, it will fall back to verifying
> all files and folders.
>
> +virtual-work-dir
> +~~~~~~~~~~~~~~~~
> +
> +Please regard this as an experimental feature.
> +
> +The "Virtual Work Directory" hook allows populating the working directory
> +sparsely. The virtual work directory data is typically automatically
> +generated by an external process. Git will limit what files it checks for
> +changes as well as which directories are checked for untracked files based
> +on the path names given. Git will also only update those files listed in the
> +virtual work directory.
> +
> +The hook is invoked when the configuration option core.virtualWorkDir is
> +set to true. The hook takes one argument, a version (currently 1).
> +
> +The hook should output to stdout the list of all files in the working
> +directory that git should track. The paths are relative to the root
> +of the working directory and are separated by a single NUL. Full paths
> +('dir1/a.txt') as well as directories are supported (ie 'dir1/').
> +
> +The exit status determines whether git will use the data from the
> +hook. On error, git will abort the command with an error message.
> +
> p4-pre-submit
> ~~~~~~~~~~~~~
>
> diff --git a/Makefile b/Makefile
> index 1a44c811aa..061f1ab954 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1012,6 +1012,7 @@ LIB_OBJS += utf8.o
> LIB_OBJS += varint.o
> LIB_OBJS += version.o
> LIB_OBJS += versioncmp.o
> +LIB_OBJS += virtualworkdir.o
> LIB_OBJS += walker.o
> LIB_OBJS += wildmatch.o
> LIB_OBJS += worktree.o
> diff --git a/cache.h b/cache.h
> index ca36b44ee0..39650e6efd 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -886,6 +886,7 @@ extern char *git_replace_ref_base;
> extern int fsync_object_files;
> extern int core_preload_index;
> extern int core_apply_sparse_checkout;
> +extern int core_virtualworkdir;
> extern int precomposed_unicode;
> extern int protect_hfs;
> extern int protect_ntfs;
> diff --git a/config.c b/config.c
> index ff521eb27a..fc0d51aa69 100644
> --- a/config.c
> +++ b/config.c
> @@ -1325,7 +1325,11 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
> }
>
> if (!strcmp(var, "core.sparsecheckout")) {
> - core_apply_sparse_checkout = git_config_bool(var, value);
> + /* virtual working directory relies on the sparse checkout logic so force it on */
> + if (core_virtualworkdir)
> + core_apply_sparse_checkout = 1;
> + else
> + core_apply_sparse_checkout = git_config_bool(var, value);
> return 0;
> }
>
> @@ -2315,6 +2319,32 @@ int git_config_get_index_threads(int *dest)
> return 1;
> }
>
> +int git_config_get_virtualworkdir(void)
> +{
> + git_config_get_bool("core.virtualworkdir", &core_virtualworkdir);
> + if (core_virtualworkdir) {
> + /*
> + * Some git commands spawn helpers and redirect the index to a different
> + * location. These include "difftool -d" and the sequencer
> + * (i.e. `git rebase -i`, `git cherry-pick` and `git revert`) and others.
> + * In those instances we don't want to update their temporary index with
> + * our virtualization data.
> + */
> + char *default_index_file = xstrfmt("%s/%s", the_repository->gitdir, "index");
> + int should_run_hook = !strcmp(default_index_file, the_repository->index_file);
> +
> + free(default_index_file);
> + if (should_run_hook) {
> + /* virtual working directory relies on the sparse checkout logic so force it on */
> + core_apply_sparse_checkout = 1;
> + return core_virtualworkdir;
> + }
> + core_virtualworkdir = 0;
> + }
> +
> + return core_virtualworkdir;
> +}
> +
> NORETURN
> void git_die_config_linenr(const char *key, const char *filename, int linenr)
> {
> diff --git a/config.h b/config.h
> index ee5d3fa7b4..e89590603c 100644
> --- a/config.h
> +++ b/config.h
> @@ -251,6 +251,7 @@ extern int git_config_get_untracked_cache(void);
> extern int git_config_get_split_index(void);
> extern int git_config_get_max_percent_split_change(void);
> extern int git_config_get_fsmonitor(void);
> +extern int git_config_get_virtualworkdir(void);
>
> /* This dies if the configured or default date is in the future */
> extern int git_config_get_expiry(const char *key, const char **output);
> diff --git a/dir.c b/dir.c
> index ab6477d777..987a3eb17f 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -21,6 +21,7 @@
> #include "ewah/ewok.h"
> #include "fsmonitor.h"
> #include "submodule-config.h"
> +#include "virtualworkdir.h"
>
> /*
> * Tells read_directory_recursive how a file or directory should be treated.
> @@ -1116,6 +1117,14 @@ int is_excluded_from_list(const char *pathname,
> struct exclude_list *el, struct index_state *istate)
> {
> struct exclude *exclude;
> +
> + if (core_virtualworkdir) {
> + if (*dtype == DT_UNKNOWN)
> + *dtype = get_dtype(NULL, istate, pathname, pathlen);
> + if (is_excluded_from_virtualworkdir(pathname, pathlen, *dtype) > 0)
> + return 1;
> + }
> +
> exclude = last_exclude_matching_from_list(pathname, pathlen, basename,
> dtype, el, istate);
> if (exclude)
> @@ -1331,8 +1340,16 @@ struct exclude *last_exclude_matching(struct dir_struct *dir,
> int is_excluded(struct dir_struct *dir, struct index_state *istate,
> const char *pathname, int *dtype_p)
> {
> - struct exclude *exclude =
> - last_exclude_matching(dir, istate, pathname, dtype_p);
> + struct exclude *exclude;
> +
> + if (core_virtualworkdir) {
> + if (*dtype_p == DT_UNKNOWN)
> + *dtype_p = get_dtype(NULL, istate, pathname, strlen(pathname));
> + if (is_excluded_from_virtualworkdir(pathname, strlen(pathname), *dtype_p) > 0)
> + return 1;
> + }
> +
> + exclude = last_exclude_matching(dir, istate, pathname, dtype_p);
> if (exclude)
> return exclude->flags & EXC_FLAG_NEGATIVE ? 0 : 1;
> return 0;
> @@ -1685,6 +1702,9 @@ static enum path_treatment treat_one_path(struct dir_struct *dir,
> if (dtype != DT_DIR && has_path_in_index)
> return path_none;
>
> + if (is_excluded_from_virtualworkdir(path->buf, path->len, dtype) > 0)
> + return path_excluded;
> +
> /*
> * When we are looking at a directory P in the working tree,
> * there are three cases:
> @@ -2025,6 +2045,8 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
> /* add the path to the appropriate result list */
> switch (state) {
> case path_excluded:
> + if (is_excluded_from_virtualworkdir(path.buf, path.len, DT_DIR) > 0)
> + break;
> if (dir->flags & DIR_SHOW_IGNORED)
> dir_add_name(dir, istate, path.buf, path.len);
> else if ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
> diff --git a/environment.c b/environment.c
> index 3465597707..bc0cef4506 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -69,6 +69,7 @@ enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
> char *notes_ref_name;
> int grafts_replace_parents = 1;
> int core_apply_sparse_checkout;
> +int core_virtualworkdir;
> int merge_log_config = -1;
> int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
> unsigned long pack_size_limit_cfg;
> diff --git a/read-cache.c b/read-cache.c
> index bd45dc3e24..a2c8027977 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -25,6 +25,7 @@
> #include "fsmonitor.h"
> #include "thread-utils.h"
> #include "progress.h"
> +#include "virtualworkdir.h"
>
> /* Mask for the name length in ce_flags in the on-disk index */
>
> @@ -1894,6 +1895,7 @@ static void post_read_index_from(struct index_state *istate)
> tweak_untracked_cache(istate);
> tweak_split_index(istate);
> tweak_fsmonitor(istate);
> + apply_virtualworkdir(istate);
> }
>
> static size_t estimate_cache_size_from_compressed(unsigned int entries)
> diff --git a/t/t1092-virtualworkdir.sh b/t/t1092-virtualworkdir.sh
> new file mode 100755
> index 0000000000..752049fbe3
> --- /dev/null
> +++ b/t/t1092-virtualworkdir.sh
> @@ -0,0 +1,390 @@
> +#!/bin/sh
> +
> +test_description='virtual work directory tests'
> +
> +. ./test-lib.sh
> +
> +reset_repo () {
> + rm .git/index &&
> + git -c core.virtualworkdir=false reset --hard HEAD &&
> + git -c core.virtualworkdir=false clean -fd &&
> + >untracked.txt &&
> + >dir1/untracked.txt &&
> + >dir2/untracked.txt
> +}
> +
> +test_expect_success 'setup' '
> + mkdir -p .git/hooks/ &&
> + cat >.gitignore <<-\EOF &&
> + .gitignore
> + expect*
> + actual*
> + EOF
> + >file1.txt &&
> + >file2.txt &&
> + mkdir -p dir1 &&
> + >dir1/file1.txt &&
> + >dir1/file2.txt &&
> + mkdir -p dir2 &&
> + >dir2/file1.txt &&
> + >dir2/file2.txt &&
> + git add . &&
> + git commit -m "initial" &&
> + git config --local core.virtualworkdir true
> +'
> +
> +test_expect_success 'test hook parameters and version' '
> + reset_repo &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + if test "$#" -ne 1
> + then
> + echo "$0: Exactly 1 argument expected" >&2
> + exit 2
> + fi
> +
> + if test "$1" != 1
> + then
> + echo "$0: Unsupported hook version." >&2
> + exit 1
> + fi
> + EOF
> + git status &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + exit 3
> + EOF
> + test_must_fail git status
> +'
> +
> +test_expect_success 'verify status is clean' '
> + reset_repo &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "dir2/file1.txt\0"
> + EOF
> + rm -f .git/index &&
> + git checkout -f &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "dir2/file1.txt\0"
> + printf "dir1/file1.txt\0"
> + printf "dir1/file2.txt\0"
> + EOF
> + git status >actual &&
> + cat >expected <<-\EOF &&
> + On branch master
> + nothing to commit, working tree clean
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'verify skip-worktree bit is set for absolute path' '
> + reset_repo &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "dir1/file1.txt\0"
> + EOF
> + git ls-files -v >actual &&
> + cat >expected <<-\EOF &&
> + H dir1/file1.txt
> + S dir1/file2.txt
> + S dir2/file1.txt
> + S dir2/file2.txt
> + S file1.txt
> + S file2.txt
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'verify skip-worktree bit is cleared for absolute path' '
> + reset_repo &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "dir1/file2.txt\0"
> + EOF
> + git ls-files -v >actual &&
> + cat >expected <<-\EOF &&
> + S dir1/file1.txt
> + H dir1/file2.txt
> + S dir2/file1.txt
> + S dir2/file2.txt
> + S file1.txt
> + S file2.txt
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'verify folder wild cards' '
> + reset_repo &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "dir1/\0"
> + EOF
> + git ls-files -v >actual &&
> + cat >expected <<-\EOF &&
> + H dir1/file1.txt
> + H dir1/file2.txt
> + S dir2/file1.txt
> + S dir2/file2.txt
> + S file1.txt
> + S file2.txt
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'verify folders not included are ignored' '
> + reset_repo &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "dir1/file1.txt\0"
> + printf "dir1/file2.txt\0"
> + EOF
> + mkdir -p dir1/dir2 &&
> + >dir1/a &&
> + >dir1/b &&
> + >dir1/dir2/a &&
> + >dir1/dir2/b &&
> + git add . &&
> + git ls-files -v >actual &&
> + cat >expected <<-\EOF &&
> + H dir1/file1.txt
> + H dir1/file2.txt
> + S dir2/file1.txt
> + S dir2/file2.txt
> + S file1.txt
> + S file2.txt
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'verify including one file doesnt include the rest' '
> + reset_repo &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "dir1/file1.txt\0"
> + printf "dir1/file2.txt\0"
> + printf "dir1/dir2/a\0"
> + EOF
> + mkdir -p dir1/dir2 &&
> + >dir1/a &&
> + >dir1/b &&
> + >dir1/dir2/a &&
> + >dir1/dir2/b &&
> + git add . &&
> + git ls-files -v >actual &&
> + cat >expected <<-\EOF &&
> + H dir1/dir2/a
> + H dir1/file1.txt
> + H dir1/file2.txt
> + S dir2/file1.txt
> + S dir2/file2.txt
> + S file1.txt
> + S file2.txt
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'verify files not listed are ignored by git clean -f -x' '
> + reset_repo &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "untracked.txt\0"
> + printf "dir1/\0"
> + EOF
> + mkdir -p dir3 &&
> + >dir3/untracked.txt &&
> + git clean -f -x &&
> + test_path_is_file file1.txt &&
> + test_path_is_file file2.txt &&
> + test_path_is_missing untracked.txt &&
> + test_path_is_dir dir1 &&
> + test_path_is_file dir1/file1.txt &&
> + test_path_is_file dir1/file2.txt &&
> + test_path_is_missing dir1/untracked.txt &&
> + test_path_is_file dir2/file1.txt &&
> + test_path_is_file dir2/file2.txt &&
> + test_path_is_file dir2/untracked.txt &&
> + test_path_is_dir dir3 &&
> + test_path_is_file dir3/untracked.txt
> +'
> +
> +test_expect_success 'verify files not listed are ignored by git clean -f -d -x' '
> + reset_repo &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "untracked.txt\0"
> + printf "dir1/\0"
> + printf "dir3/\0"
> + EOF
> + mkdir -p dir3 &&
> + >dir3/untracked.txt &&
> + git clean -f -d -x &&
> + test_path_is_file file1.txt &&
> + test_path_is_file file2.txt &&
> + test_path_is_missing untracked.txt &&
> + test_path_is_dir dir1 &&
> + test_path_is_file dir1/file1.txt &&
> + test_path_is_file dir1/file2.txt &&
> + test_path_is_missing dir1/untracked.txt &&
> + test_path_is_file dir2/file1.txt &&
> + test_path_is_file dir2/file2.txt &&
> + test_path_is_file dir2/untracked.txt &&
> + test ! -d dir3 &&
> + test_path_is_missing dir3/untracked.txt
> +'
> +
> +test_expect_success 'verify folder entries include all files' '
> + reset_repo &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "dir1/\0"
> + EOF
> + mkdir -p dir1/dir2 &&
> + >dir1/a &&
> + >dir1/b &&
> + >dir1/dir2/a &&
> + >dir1/dir2/b &&
> + git status -su >actual &&
> + cat >expected <<-\EOF &&
> + ?? dir1/a
> + ?? dir1/b
> + ?? dir1/dir2/a
> + ?? dir1/dir2/b
> + ?? dir1/untracked.txt
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'verify case insensitivity of virtual work directory entries' '
> + reset_repo &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "dir1/a\0"
> + printf "Dir1/Dir2/a\0"
> + printf "DIR2/\0"
> + EOF
> + mkdir -p dir1/dir2 &&
> + >dir1/a &&
> + >dir1/b &&
> + >dir1/dir2/a &&
> + >dir1/dir2/b &&
> + git -c core.ignorecase=false status -su >actual &&
> + cat >expected <<-\EOF &&
> + ?? dir1/a
> + EOF
> + test_cmp expected actual &&
> + git -c core.ignorecase=true status -su >actual &&
> + cat >expected <<-\EOF &&
> + ?? dir1/a
> + ?? dir1/dir2/a
> + ?? dir2/untracked.txt
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'on file created' '
> + reset_repo &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "dir1/file3.txt\0"
> + EOF
> + >dir1/file3.txt &&
> + git add . &&
> + git ls-files -v >actual &&
> + cat >expected <<-\EOF &&
> + S dir1/file1.txt
> + S dir1/file2.txt
> + H dir1/file3.txt
> + S dir2/file1.txt
> + S dir2/file2.txt
> + S file1.txt
> + S file2.txt
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'on file renamed' '
> + reset_repo &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "dir1/file1.txt\0"
> + printf "dir1/file3.txt\0"
> + EOF
> + mv dir1/file1.txt dir1/file3.txt &&
> + git status -su >actual &&
> + cat >expected <<-\EOF &&
> + D dir1/file1.txt
> + ?? dir1/file3.txt
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'on file deleted' '
> + reset_repo &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "dir1/file1.txt\0"
> + EOF
> + rm dir1/file1.txt &&
> + git status -su >actual &&
> + cat >expected <<-\EOF &&
> + D dir1/file1.txt
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'on file overwritten' '
> + reset_repo &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "dir1/file1.txt\0"
> + EOF
> + echo "overwritten" >dir1/file1.txt &&
> + git status -su >actual &&
> + cat >expected <<-\EOF &&
> + M dir1/file1.txt
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'on folder created' '
> + reset_repo &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "dir1/dir1/\0"
> + EOF
> + mkdir -p dir1/dir1 &&
> + git status -su >actual &&
> + cat >expected <<-\EOF &&
> + EOF
> + test_cmp expected actual &&
> + git clean -fd &&
> + test ! -d "/dir1/dir1"
> +'
> +
> +test_expect_success 'on folder renamed' '
> + reset_repo &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "dir3/\0"
> + printf "dir1/file1.txt\0"
> + printf "dir1/file2.txt\0"
> + printf "dir3/file1.txt\0"
> + printf "dir3/file2.txt\0"
> + EOF
> + mv dir1 dir3 &&
> + git status -su >actual &&
> + cat >expected <<-\EOF &&
> + D dir1/file1.txt
> + D dir1/file2.txt
> + ?? dir3/file1.txt
> + ?? dir3/file2.txt
> + ?? dir3/untracked.txt
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'folder with same prefix as file' '
> + reset_repo &&
> + >dir1.sln &&
> + write_script .git/hooks/virtual-work-dir <<-\EOF &&
> + printf "dir1/\0"
> + printf "dir1.sln\0"
> + EOF
> + git add dir1.sln &&
> + git ls-files -v >actual &&
> + cat >expected <<-\EOF &&
> + H dir1.sln
> + H dir1/file1.txt
> + H dir1/file2.txt
> + S dir2/file1.txt
> + S dir2/file2.txt
> + S file1.txt
> + S file2.txt
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_done
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 7570df481b..c6c20c9b61 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -18,6 +18,7 @@
> #include "fsmonitor.h"
> #include "object-store.h"
> #include "fetch-object.h"
> +#include "virtualworkdir.h"
>
> /*
> * Error messages expected by scripts out of plumbing commands such as
> @@ -1363,6 +1364,14 @@ static int clear_ce_flags_1(struct index_state *istate,
> continue;
> }
>
> + /* if it's not in the virtual working directory, exit early */
> + if (core_virtualworkdir) {
> + if (is_included_in_virtualworkdir(ce->name, ce->ce_namelen) > 0)
> + ce->ce_flags &= ~clear_mask;
> + cache++;
> + continue;
> + }
> +
> if (prefix->len && strncmp(ce->name, prefix->buf, prefix->len))
> break;
>
> @@ -1481,12 +1490,16 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> if (!core_apply_sparse_checkout || !o->update)
> o->skip_sparse_checkout = 1;
> if (!o->skip_sparse_checkout) {
> - char *sparse = git_pathdup("info/sparse-checkout");
> - if (add_excludes_from_file_to_list(sparse, "", 0, &el, NULL) < 0)
> - o->skip_sparse_checkout = 1;
> - else
> + if (core_virtualworkdir) {
> o->el = ⪙
> - free(sparse);
> + } else {
> + char *sparse = git_pathdup("info/sparse-checkout");
> + if (add_excludes_from_file_to_list(sparse, "", 0, &el, NULL) < 0)
> + o->skip_sparse_checkout = 1;
> + else
> + o->el = ⪙
> + free(sparse);
> + }
> }
>
> memset(&o->result, 0, sizeof(o->result));
> diff --git a/virtualworkdir.c b/virtualworkdir.c
> new file mode 100644
> index 0000000000..f2c8025bf5
> --- /dev/null
> +++ b/virtualworkdir.c
> @@ -0,0 +1,314 @@
> +#include "cache.h"
> +#include "config.h"
> +#include "dir.h"
> +#include "hashmap.h"
> +#include "run-command.h"
> +#include "virtualworkdir.h"
> +
> +#define HOOK_INTERFACE_VERSION (1)
> +
> +static struct strbuf virtual_workdir_data = STRBUF_INIT;
> +static struct hashmap virtual_workdir_hashmap;
> +static struct hashmap parent_directory_hashmap;
> +
> +struct virtualworkdir {
> + struct hashmap_entry ent; /* must be the first member! */
> + const char *pattern;
> + int patternlen;
> +};
> +
> +static unsigned int(*vwdhash)(const void *buf, size_t len);
> +static int(*vwdcmp)(const char *a, const char *b, size_t len);
> +
> +static int vwd_hashmap_cmp(const void *unused_cmp_data,
> + const void *a, const void *b, const void *key)
> +{
> + const struct virtualworkdir *vwd1 = a;
> + const struct virtualworkdir *vwd2 = b;
> +
> + return vwdcmp(vwd1->pattern, vwd2->pattern, vwd1->patternlen);
> +}
> +
> +static void get_virtual_workdir_data(struct strbuf *vwd_data)
> +{
> + struct child_process cp = CHILD_PROCESS_INIT;
> + const char *p;
> + int err;
> +
> + strbuf_init(vwd_data, 0);
> +
> + p = find_hook("virtual-work-dir");
> + if (!p)
> + die("unable to find virtual-work-dir hook");
> +
> + argv_array_push(&cp.args, p);
> + argv_array_pushf(&cp.args, "%d", HOOK_INTERFACE_VERSION);
> + cp.use_shell = 1;
> + cp.dir = get_git_work_tree();
> +
> + err = capture_command(&cp, vwd_data, 1024);
> + if (err)
> + die("unable to load virtual working directory");
> +}
> +
> +static int check_includes_hashmap(struct hashmap *map, const char *pattern, int patternlen)
> +{
> + struct strbuf sb = STRBUF_INIT;
> + struct virtualworkdir vwd;
> + char *slash;
> +
> + /* Check straight mapping */
> + strbuf_reset(&sb);
> + strbuf_add(&sb, pattern, patternlen);
> + vwd.pattern = sb.buf;
> + vwd.patternlen = sb.len;
> + hashmap_entry_init(&vwd, vwdhash(vwd.pattern, vwd.patternlen));
> + if (hashmap_get(map, &vwd, NULL)) {
> + strbuf_release(&sb);
> + return 1;
> + }
> +
> + /*
> + * Check to see if it matches a directory or any path
> + * underneath it. In other words, 'a/b/foo.txt' will match
> + * '/', 'a/', and 'a/b/'.
> + */
> + slash = strchr(sb.buf, '/');
> + while (slash) {
> + vwd.pattern = sb.buf;
> + vwd.patternlen = slash - sb.buf + 1;
> + hashmap_entry_init(&vwd, vwdhash(vwd.pattern, vwd.patternlen));
> + if (hashmap_get(map, &vwd, NULL)) {
> + strbuf_release(&sb);
> + return 1;
> + }
> + slash = strchr(slash + 1, '/');
> + }
> +
> + strbuf_release(&sb);
> + return 0;
> +}
> +
> +static void includes_hashmap_add(struct hashmap *map, const char *pattern, const int patternlen)
> +{
> + struct virtualworkdir *vwd;
> +
> + vwd = xmalloc(sizeof(struct virtualworkdir));
> + vwd->pattern = pattern;
> + vwd->patternlen = patternlen;
> + hashmap_entry_init(vwd, vwdhash(vwd->pattern, vwd->patternlen));
> + hashmap_add(map, vwd);
> +}
> +
> +static void initialize_includes_hashmap(struct hashmap *map, struct strbuf *vwd_data)
> +{
> + char *buf, *entry;
> + size_t len;
> + int i;
> +
> + /*
> + * Build a hashmap of the virtual working directory data we can use to look
> + * for cache entry matches quickly
> + */
> + vwdhash = ignore_case ? memihash : memhash;
> + vwdcmp = ignore_case ? strncasecmp : strncmp;
> + hashmap_init(map, vwd_hashmap_cmp, NULL, 0);
> +
> + entry = buf = vwd_data->buf;
> + len = vwd_data->len;
> + for (i = 0; i < len; i++) {
> + if (buf[i] == '\0') {
> + includes_hashmap_add(map, entry, buf + i - entry);
> + entry = buf + i + 1;
> + }
> + }
> +}
> +
> +/*
> + * Return 1 if the requested item is found in the virtual working directory,
> + * 0 for not found and -1 for undecided.
> + */
> +int is_included_in_virtualworkdir(const char *pathname, int pathlen)
> +{
> + if (!core_virtualworkdir)
> + return -1;
> +
> + if (!virtual_workdir_hashmap.tablesize && virtual_workdir_data.len)
> + initialize_includes_hashmap(&virtual_workdir_hashmap, &virtual_workdir_data);
> + if (!virtual_workdir_hashmap.tablesize)
> + return -1;
> +
> + return check_includes_hashmap(&virtual_workdir_hashmap, pathname, pathlen);
> +}
> +
> +static void parent_directory_hashmap_add(struct hashmap *map, const char *pattern, const int patternlen)
> +{
> + char *slash;
> + struct virtualworkdir *vwd;
> +
> + /*
> + * Add any directories leading up to the file as the excludes logic
> + * needs to match directories leading up to the files as well. Detect
> + * and prevent unnecessary duplicate entries which will be common.
> + */
> + if (patternlen > 1) {
> + slash = strchr(pattern + 1, '/');
> + while (slash) {
> + vwd = xmalloc(sizeof(struct virtualworkdir));
> + vwd->pattern = pattern;
> + vwd->patternlen = slash - pattern + 1;
> + hashmap_entry_init(vwd, vwdhash(vwd->pattern, vwd->patternlen));
> + if (hashmap_get(map, vwd, NULL))
> + free(vwd);
> + else
> + hashmap_add(map, vwd);
> + slash = strchr(slash + 1, '/');
> + }
> + }
> +}
> +
> +static void initialize_parent_directory_hashmap(struct hashmap *map, struct strbuf *vwd_data)
> +{
> + char *buf, *entry;
> + size_t len;
> + int i;
> +
> + /*
> + * Build a hashmap of the parent directories contained in the virtual
> + * file system data we can use to look for matches quickly
> + */
> + vwdhash = ignore_case ? memihash : memhash;
> + vwdcmp = ignore_case ? strncasecmp : strncmp;
> + hashmap_init(map, vwd_hashmap_cmp, NULL, 0);
> +
> + entry = buf = vwd_data->buf;
> + len = vwd_data->len;
> + for (i = 0; i < len; i++) {
> + if (buf[i] == '\0') {
> + parent_directory_hashmap_add(map, entry, buf + i - entry);
> + entry = buf + i + 1;
> + }
> + }
> +}
> +
> +static int check_directory_hashmap(struct hashmap *map, const char *pathname, int pathlen)
> +{
> + struct strbuf sb = STRBUF_INIT;
> + struct virtualworkdir vwd;
> +
> + /* Check for directory */
> + strbuf_reset(&sb);
> + strbuf_add(&sb, pathname, pathlen);
> + strbuf_addch(&sb, '/');
> + vwd.pattern = sb.buf;
> + vwd.patternlen = sb.len;
> + hashmap_entry_init(&vwd, vwdhash(vwd.pattern, vwd.patternlen));
> + if (hashmap_get(map, &vwd, NULL)) {
> + strbuf_release(&sb);
> + return 0;
> + }
> +
> + strbuf_release(&sb);
> + return 1;
> +}
> +
> +/*
> + * Return 1 for exclude, 0 for include and -1 for undecided.
> + */
> +int is_excluded_from_virtualworkdir(const char *pathname, int pathlen, int dtype)
> +{
> + if (!core_virtualworkdir)
> + return -1;
> +
> + if (dtype != DT_REG && dtype != DT_DIR && dtype != DT_LNK)
> + die(_("is_excluded_from_virtualworkdir passed unhandled dtype"));
> +
> + if (dtype == DT_REG || dtype == DT_LNK) {
> + int ret = is_included_in_virtualworkdir(pathname, pathlen);
> + if (ret > 0)
> + return 0;
> + if (ret == 0)
> + return 1;
> + return ret;
> + }
> +
> + if (dtype == DT_DIR) {
> + int ret = is_included_in_virtualworkdir(pathname, pathlen);
> + if (ret > 0)
> + return 0;
> +
> + if (!parent_directory_hashmap.tablesize && virtual_workdir_data.len)
> + initialize_parent_directory_hashmap(&parent_directory_hashmap, &virtual_workdir_data);
> + if (!parent_directory_hashmap.tablesize)
> + return -1;
> +
> + return check_directory_hashmap(&parent_directory_hashmap, pathname, pathlen);
> + }
> +
> + return -1;
> +}
> +
> +/*
> + * Update the CE_SKIP_WORKTREE bits based on the virtual working directory.
> + */
> +void apply_virtualworkdir(struct index_state *istate)
> +{
> + char *buf, *entry;
> + int i;
> +
> + if (!git_config_get_virtualworkdir())
> + return;
> +
> + if (!virtual_workdir_data.len)
> + get_virtual_workdir_data(&virtual_workdir_data);
> +
> + /* set CE_SKIP_WORKTREE bit on all entries */
> + for (i = 0; i < istate->cache_nr; i++)
> + istate->cache[i]->ce_flags |= CE_SKIP_WORKTREE;
> +
> + /* clear CE_SKIP_WORKTREE bit for everything in the virtual working directory */
> + entry = buf = virtual_workdir_data.buf;
> + for (i = 0; i < virtual_workdir_data.len; i++) {
> + if (buf[i] == '\0') {
> + int pos, len;
> +
> + len = buf + i - entry;
> +
> + /* look for a directory wild card (ie "dir1/") */
> + if (buf[i - 1] == '/') {
> + if (ignore_case)
> + adjust_dirname_case(istate, entry);
> +
> + pos = index_name_pos(istate, entry, len);
> + if (pos < 0) {
> + pos = -pos - 1;
> + while (pos < istate->cache_nr && !fspathncmp(istate->cache[pos]->name, entry, len)) {
> + istate->cache[pos]->ce_flags &= ~CE_SKIP_WORKTREE;
> + pos++;
> + }
> + }
> + } else {
> + if (ignore_case) {
> + struct cache_entry *ce = index_file_exists(istate, entry, len, ignore_case);
> + if (ce)
> + ce->ce_flags &= ~CE_SKIP_WORKTREE;
> + } else {
> + int pos = index_name_pos(istate, entry, len);
> + if (pos >= 0)
> + istate->cache[pos]->ce_flags &= ~CE_SKIP_WORKTREE;
> + }
> + }
> +
> + entry += len + 1;
> + }
> + }
> +}
> +
> +/*
> + * Free the virtual working directory data structures.
> + */
> +void free_virtualworkdir(void) {
> + hashmap_free(&virtual_workdir_hashmap, 1);
> + hashmap_free(&parent_directory_hashmap, 1);
> + strbuf_release(&virtual_workdir_data);
> +}
> diff --git a/virtualworkdir.h b/virtualworkdir.h
> new file mode 100644
> index 0000000000..139d019d44
> --- /dev/null
> +++ b/virtualworkdir.h
> @@ -0,0 +1,25 @@
> +#ifndef VIRTUALWORKDIR_H
> +#define VIRTUALWORKDIR_H
> +
> +/*
> + * Update the CE_SKIP_WORKTREE bits based on the virtual working directory.
> + */
> +void apply_virtualworkdir(struct index_state *istate);
> +
> +/*
> + * Return 1 if the requested item is found in the virtual working directory,
> + * 0 for not found and -1 for undecided.
> + */
> +int is_included_in_virtualworkdir(const char *pathname, int pathlen);
> +
> +/*
> + * Return 1 for exclude, 0 for include and -1 for undecided.
> + */
> +int is_excluded_from_virtualworkdir(const char *pathname, int pathlen, int dtype);
> +
> +/*
> + * Free the virtual working directory data structures.
> + */
> +void free_virtualworkdir(void);
> +
> +#endif
>
> base-commit: 5d826e972970a784bd7a7bdf587512510097b8c7
>
prev parent reply other threads:[~2019-01-28 19:00 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-30 19:16 [RFC v1] Add virtual file system settings and hook proc Ben Peart
2018-10-30 23:07 ` Junio C Hamano
2018-10-31 20:12 ` Ben Peart
2018-11-05 0:02 ` Junio C Hamano
2018-11-05 20:00 ` Ben Peart
2018-10-31 19:11 ` Duy Nguyen
2018-10-31 20:53 ` Ben Peart
2018-11-04 6:34 ` Duy Nguyen
2018-11-04 21:01 ` brian m. carlson
2018-11-05 15:22 ` Duy Nguyen
2018-11-05 20:18 ` Ben Peart
2018-11-05 20:27 ` Ben Peart
2018-11-05 11:40 ` Ævar Arnfjörð Bjarmason
2018-11-05 15:26 ` Duy Nguyen
2018-11-05 20:07 ` Ben Peart
2018-11-05 21:53 ` Johannes Schindelin
2018-11-27 19:50 ` [PATCH v1] teach git to support a virtual (partially populated) work directory Ben Peart
2018-11-28 13:31 ` SZEDER Gábor
2018-11-29 14:09 ` Ben Peart
2018-12-13 19:41 ` [PATCH v2] " Ben Peart
2019-01-28 19:00 ` Ben Peart [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=09c16383-9778-29a6-80f4-3cfdbc5d180c@gmail.com \
--to=peartben@gmail.com \
--cc=Johannes.Schindelin@gmx.de \
--cc=avarab@gmail.com \
--cc=benpeart@microsoft.com \
--cc=git@vger.kernel.org \
--cc=pclouds@gmail.com \
--cc=sandals@crustytoothpaste.net \
--cc=szeder.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).