git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ben Peart <peartben@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>, git@vger.kernel.org
Subject: Re: [PATCH] unpack-trees: allow missing CE_SKIP_WORKTREE objs
Date: Tue, 9 Oct 2018 10:48:34 -0400	[thread overview]
Message-ID: <7ac4ea1d-596d-cffd-337f-bf1879d982e1@gmail.com> (raw)
In-Reply-To: <20181008214816.42856-1-jonathantanmy@google.com>



On 10/8/2018 5:48 PM, Jonathan Tan wrote:
> Whenever a sparse checkout occurs, the existence of all blobs in the
> index is verified, whether or not they are included or excluded by the
> .git/info/sparse-checkout specification. This degrades performance,
> significantly in the case of a partial clone, because a lazy fetch
> occurs whenever the existence of a missing blob is checked.
> 
> At the point of invoking cache_tree_update() in unpack_trees(),
> CE_SKIP_WORKTREE is already set on all excluded blobs
> (mark_new_skip_worktree() is called twice to set CE_NEW_SKIP_WORKTREE,
> then apply_sparse_checkout() is called which copies over
> CE_NEW_SKIP_WORKTREE to CE_SKIP_WORKTREE). cache_tree_update() can use
> this information to know which blobs are excluded, and thus skip the
> checking of these.
> 
> Because cache_tree_update() is used from multiple places, this new
> behavior is guarded by a new flag WRITE_TREE_SKIP_WORKTREE_MISSING_OK.
> Implement this new flag, and teach unpack_trees() to invoke
> cache_tree_update() with this new flag.
> 

I wonder if preventing the download of all missing blobs should be 
limited to only the checkout command.  When you looked at the other 
places that call cache_tree_update(), does it make sense that they 
trigger the download of all the missing blobs?  For example, I suspect 
you would not want them all downloaded just to do a merge-recursive.

In full disclosure, we implemented this a couple of years ago [1] for 
GVFS and opted to do the logic a little differently.  We found that we 
didn't want to trigger the download of all missing blobs in 
cache_tree_update() for _any_ command that was executing in a partially 
cloned repo.  This is safe because if any individual blob is actually 
needed, it will get downloaded "on demand" already.

[1] https://github.com/Microsoft/git/commit/ec865500d98


> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>   cache-tree.c                     |  6 +++++-
>   cache-tree.h                     |  4 ++++
>   t/t1090-sparse-checkout-scope.sh | 33 ++++++++++++++++++++++++++++++++
>   unpack-trees.c                   |  1 +
>   4 files changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/cache-tree.c b/cache-tree.c
> index 5ce51468f0..340caf2d34 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -246,6 +246,8 @@ static int update_one(struct cache_tree *it,
>   	int missing_ok = flags & WRITE_TREE_MISSING_OK;
>   	int dryrun = flags & WRITE_TREE_DRY_RUN;
>   	int repair = flags & WRITE_TREE_REPAIR;
> +	int skip_worktree_missing_ok =
> +		flags & WRITE_TREE_SKIP_WORKTREE_MISSING_OK;
>   	int to_invalidate = 0;
>   	int i;
>   
> @@ -356,7 +358,9 @@ static int update_one(struct cache_tree *it,
>   		}
>   
>   		if (is_null_oid(oid) ||
> -		    (mode != S_IFGITLINK && !missing_ok && !has_object_file(oid))) {
> +		    (mode != S_IFGITLINK && !missing_ok &&
> +		     !(skip_worktree_missing_ok && ce_skip_worktree(ce)) &&
> +		     !has_object_file(oid))) {
>   			strbuf_release(&buffer);
>   			if (expected_missing)
>   				return -1;
> diff --git a/cache-tree.h b/cache-tree.h
> index 0ab6784ffe..655d487619 100644
> --- a/cache-tree.h
> +++ b/cache-tree.h
> @@ -40,6 +40,10 @@ void cache_tree_verify(struct index_state *);
>   #define WRITE_TREE_DRY_RUN 4
>   #define WRITE_TREE_SILENT 8
>   #define WRITE_TREE_REPAIR 16
> +/*
> + * Do not check for the presence of cache entry objects with CE_SKIP_WORKTREE.
> + */
> +#define WRITE_TREE_SKIP_WORKTREE_MISSING_OK 32
>   
>   /* error return codes */
>   #define WRITE_TREE_UNREADABLE_INDEX (-1)
> diff --git a/t/t1090-sparse-checkout-scope.sh b/t/t1090-sparse-checkout-scope.sh
> index 25d7c700f6..090b7fc3d3 100755
> --- a/t/t1090-sparse-checkout-scope.sh
> +++ b/t/t1090-sparse-checkout-scope.sh
> @@ -63,4 +63,37 @@ test_expect_success 'return to full checkout of master' '
>   	test "$(cat b)" = "modified"
>   '
>   
> +test_expect_success 'in partial clone, sparse checkout only fetches needed blobs' '
> +	test_create_repo server &&
> +	git clone "file://$(pwd)/server" client &&
> +
> +	test_config -C server uploadpack.allowfilter 1 &&
> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> +	echo a >server/a &&
> +	echo bb >server/b &&
> +	mkdir server/c &&
> +	echo ccc >server/c/c &&
> +	git -C server add a b c/c &&
> +	git -C server commit -m message &&
> +
> +	test_config -C client core.sparsecheckout 1 &&
> +	test_config -C client extensions.partialclone origin &&
> +	echo "!/*" >client/.git/info/sparse-checkout &&
> +	echo "/a" >>client/.git/info/sparse-checkout &&
> +	git -C client fetch --filter=blob:none origin &&
> +	git -C client checkout FETCH_HEAD &&
> +
> +	git -C client rev-list HEAD \
> +		--quiet --objects --missing=print >unsorted_actual &&
> +	(
> +		printf "?" &&
> +		git hash-object server/b &&
> +		printf "?" &&
> +		git hash-object server/c/c
> +	) >unsorted_expect &&
> +	sort unsorted_actual >actual &&
> +	sort unsorted_expect >expect &&
> +	test_cmp expect actual
> +'
> +
>   test_done
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 51bfac6aa0..39e0e7a6c7 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1635,6 +1635,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>   				o->result.cache_tree = cache_tree();
>   			if (!cache_tree_fully_valid(o->result.cache_tree))
>   				cache_tree_update(&o->result,
> +						  WRITE_TREE_SKIP_WORKTREE_MISSING_OK |
>   						  WRITE_TREE_SILENT |
>   						  WRITE_TREE_REPAIR);
>   		}
> 

  parent reply	other threads:[~2018-10-09 14:48 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-08 21:48 [PATCH] unpack-trees: allow missing CE_SKIP_WORKTREE objs Jonathan Tan
2018-10-09  9:27 ` Junio C Hamano
2018-10-09  9:30 ` Junio C Hamano
2018-10-09 14:49   ` Ben Peart
2018-10-09 14:48 ` Ben Peart [this message]
2018-10-09 18:40 ` [PATCH v2] cache-tree: skip some blob checks in partial clone Jonathan Tan
2018-10-10  1:19   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7ac4ea1d-596d-cffd-337f-bf1879d982e1@gmail.com \
    --to=peartben@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).