All of lore.kernel.org
 help / color / mirror / Atom feed
From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Rafael Silva <rafaeloliveira.cs@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: rather slow 'git repack' in 'blob:none' partial clones
Date: Tue, 13 Apr 2021 20:05:52 +0200	[thread overview]
Message-ID: <20210413180552.GI2947267@szeder.dev> (raw)
In-Reply-To: <YHTcHY+P7RuZJGab@coredump.intra.peff.net>

On Mon, Apr 12, 2021 at 07:47:41PM -0400, Jeff King wrote:
> On Mon, Apr 12, 2021 at 11:36:53PM +0200, SZEDER Gábor wrote:
> 
> > All what you wrote above makes sense to me, but I've never looked at
> > how partial clones work, so it doesn't mean anything...  In any case
> > your patch brings great speedups, but, unfortunately, the memory usage
> > remains as high as it was:
> > 
> >   $ /usr/bin/time --format=elapsed: %E  max RSS: %Mk /home/szeder/src/git/bin-wrappers/git -C git-partial.git/ gc
> >   Enumerating objects: 188450, done.
> >   Counting objects: 100% (188450/188450), done.
> >   Delta compression using up to 4 threads
> >   Compressing objects: 100% (66623/66623), done.
> >   Writing objects: 100% (188450/188450), done.
> >   Total 188450 (delta 120109), reused 188450 (delta 120109), pack-reused 0
> >   elapsed: 0:15.18  max RSS: 1888332k
> > 
> > And git.git is not all that large, I wonder how much memory would be
> > necessary to 'gc' a 'blob:none' clone of e.g. chromium?! :)
> > 
> > BTW, this high memory usage in a partial clone is not specific to
> > 'repack', 'fsck' suffers just as much:
> 
> I think the issue is in the exclude-promisor-object code paths of the
> traversal. Try this:
> 
>   [two clones of git.git, one full and one partial]
>   $ git clone --no-local --bare /path/to/git full.git
>   $ git clone --no-local --bare --filter=blob:none /path/to/git partial.git
> 
>   [full clone is quick to traverse all objects]
>   $ time git -C full.git rev-list --count --all
>   63215
>   real	0m0.369s
>   user	0m0.365s
>   sys	0m0.004s
> 
>   [partial is, too; it's the same amount of work because we're just
>    looking at the commits here]
>   $ time git -C partial.git rev-list --count --all
>   63215
>   real	0m0.373s
>   user	0m0.364s
>   sys	0m0.009s
> 
>   [but now ask it to exclude promisor objects, and it's much slower;
>   this is because is_promisor_object() opens up each tree in the pack in
>   order to see which "promised" objects it mentions]

I don't understand this: 'git rev-list --count --all' only counts
commit objects, so why should it open any trees at all?

>   $ time git -C partial.git rev-list --exclude-promisor-objects --count --all
>   0
>   real	0m11.723s
>   user	0m11.354s
>   sys	0m0.369s
> 
> And I think that is the source for the memory use, too. Without that
> option, we peak at 11MB heap. With it, 1.6GB. Oops. Looks like there
> might be some "leaks" when we parse tree objects and then leave the
> buffers connected to the structs (technically not a leak because we
> still have the pointers, but obviously having every tree in memory at
> once is not good).
> 
> The patch below drops the peak heap to 165MB. Still quite a bit more,
> but I think it's a combination of delta-base cache (96MB) plus extra
> structs for all the non-commit objects whose flags we marked.
> 
> It does seem like there's probably a good space/time tradeoff to be made
> here (e.g., caching the list of "promisor" object ids for the pack
> instead of inflating and reading all of the trees on the fly).
> 
> diff --git a/packfile.c b/packfile.c
> index 8668345d93..b79cbc8cd4 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -2247,6 +2247,7 @@ static int add_promisor_object(const struct object_id *oid,
>  			return 0;
>  		while (tree_entry_gently(&desc, &entry))
>  			oidset_insert(set, &entry.oid);
> +		free_tree_buffer(tree);
>  	} else if (obj->type == OBJ_COMMIT) {
>  		struct commit *commit = (struct commit *) obj;
>  		struct commit_list *parents = commit->parents;
> diff --git a/revision.c b/revision.c
> index 553c0faa9b..fac2577748 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -3271,8 +3271,15 @@ static int mark_uninteresting(const struct object_id *oid,
>  			      void *cb)
>  {
>  	struct rev_info *revs = cb;
> +	/*
> +	 * yikes, do we really need to parse here? maybe

Heh, a "yikes" here and a "yuck" in your previous patch...  This issue
was worth reporting :)

> +	 * lookup_unknown_object() would be sufficient, or
> +	 * even oid_object_info() followed by the correct type
> +	 */
>  	struct object *o = parse_object(revs->repo, oid);
>  	o->flags |= UNINTERESTING | SEEN;
> +	if (o->type == OBJ_TREE)
> +		free_tree_buffer((struct tree *)o);
>  	return 0;
>  }
>  
> 
> -Peff

  parent reply	other threads:[~2021-04-13 18:05 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-03  9:04 rather slow 'git repack' in 'blob:none' partial clones SZEDER Gábor
2021-04-05  1:02 ` Rafael Silva
2021-04-07 21:17   ` Jeff King
2021-04-08  0:02     ` Jonathan Tan
2021-04-08  0:35       ` Jeff King
2021-04-12  7:09     ` Rafael Silva
2021-04-12 21:36     ` SZEDER Gábor
2021-04-12 21:49       ` Bryan Turner
2021-04-12 23:51         ` Jeff King
2021-04-12 23:47       ` Jeff King
2021-04-13  7:12         ` [PATCH 0/3] low-hanging performance fruit with promisor packs Jeff King
2021-04-13  7:15           ` [PATCH 1/3] is_promisor_object(): free tree buffer after parsing Jeff King
2021-04-13 20:17             ` Junio C Hamano
2021-04-14  5:18               ` Jeff King
2021-04-13  7:16           ` [PATCH 2/3] lookup_unknown_object(): take a repository argument Jeff King
2021-04-13  7:17           ` [PATCH 3/3] revision: avoid parsing with --exclude-promisor-objects Jeff King
2021-04-13 20:22             ` Junio C Hamano
2021-04-13 18:10           ` [PATCH 0/3] low-hanging performance fruit with promisor packs SZEDER Gábor
2021-04-14 17:14           ` Jonathan Tan
2021-04-14 19:22           ` Rafael Silva
2021-04-13 18:05         ` SZEDER Gábor [this message]
2021-04-14  5:14           ` rather slow 'git repack' in 'blob:none' partial clones Jeff King
2021-04-11 10:59   ` SZEDER Gábor
2021-04-12  7:53     ` Rafael Silva
2021-04-14 19:14 ` [PATCH 0/2] prevent `repack` to unpack and delete promisor objects Rafael Silva
2021-04-14 19:14   ` [PATCH 1/2] repack: teach --no-prune-packed to skip `git prune-packed` Rafael Silva
2021-04-14 23:50     ` Jonathan Tan
2021-04-18 14:15       ` Rafael Silva
2021-04-14 19:14   ` [PATCH 2/2] repack: avoid loosening promisor pack objects in partial clones Rafael Silva
2021-04-15  1:04     ` Jonathan Tan
2021-04-15  3:51       ` Junio C Hamano
2021-04-15  9:03         ` Jeff King
2021-04-15  9:05       ` Jeff King
2021-04-18  7:12       ` Rafael Silva
2021-04-15 18:06     ` Junio C Hamano
2021-04-18  8:40       ` Rafael Silva
2021-04-14 22:10   ` [PATCH 0/2] prevent `repack` to unpack and delete promisor objects Junio C Hamano
2021-04-15  9:15   ` Jeff King
2021-04-18  8:20     ` Rafael Silva
2021-04-18 13:57   ` [PATCH v2 0/1] " Rafael Silva
2021-04-18 13:57     ` [PATCH v2 1/1] repack: avoid loosening promisor objects in partial clones Rafael Silva
2021-04-19 19:15       ` Jonathan Tan
2021-04-21 18:54         ` Rafael Silva
2021-04-19 23:09       ` Junio C Hamano
2021-04-21 19:25         ` Rafael Silva
2021-04-21 19:32     ` [PATCH v3] " Rafael Silva

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210413180552.GI2947267@szeder.dev \
    --to=szeder.dev@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=peff@peff.net \
    --cc=rafaeloliveira.cs@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.