From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: "Michael Haggerty" <mhagger@alum.mit.edu>,
"Stefan Beller" <sbeller@google.com>,
"Git Mailing List" <git@vger.kernel.org>,
"Jay Soffian" <jaysoffian@gmail.com>,
"Björn Gustavsson" <bgustavsson@gmail.com>
Subject: Re: Why is "git fetch --prune" so much slower than "git remote prune"?
Date: Fri, 6 Mar 2015 17:59:17 -0500
Message-ID: <20150306225917.GA1589@peff.net>
In-Reply-To: <CACBZZX5n5tTCSa-_A5gQzbzboF_v8a3_oVUjdjyFtKHHe8h-NA@mail.gmail.com>

On Fri, Mar 06, 2015 at 05:48:39PM +0100, Ævar Arnfjörð Bjarmason wrote:

> The --prune option to fetch added in v1.6.5-8-gf360d84 seems to be
> around 20-30x slower than the equivalent operation with git remote
> prune. I'm wondering if I'm missing something and fetch does something
> more, but it doesn't seem so.

"git fetch --prune" is "do a normal fetch, and also prune anything
necessary". "git remote prune" is "ls-remote the other side and see if
there is anything we can prune; do not touch anything else".

If your fetch is a noop (i.e., the other side has not advanced any
branches), the outcome is the same. But perhaps fetch is doing more
work to find out that it is a noop.

One way to measure that would be to see how expensive a noop "git fetch"
is. If it's expensive, then there is room to improve there; if not, then
it is the pruning itself that is expensive.
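
A minimal sketch of that measurement, assuming a clone whose remote is
named "origin". The helper names (time_command, compare_fetch_costs) are
hypothetical; only the git commands themselves come from this thread.

```python
import subprocess
import time


def time_command(cmd, cwd=None):
    """Run cmd once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def compare_fetch_costs(repo):
    """Time a noop fetch and a pruning fetch in the clone at `repo`.

    Assumption: the remote is called "origin" and has stale
    remote-tracking refs left to prune.
    """
    # Warm-up fetch so that the next plain fetch really is a noop.
    time_command(["git", "fetch", "origin"], cwd=repo)
    return {
        "noop fetch": time_command(["git", "fetch", "origin"], cwd=repo),
        "fetch --prune": time_command(
            ["git", "fetch", "--prune", "origin"], cwd=repo),
    }
```

Pruning is destructive, so timing "git remote prune origin" for
comparison needs a fresh copy of the same clone; once either command has
run, nothing is left for the other to prune.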
But just guessing (I do not have time to dig in deeper right now), and
seeing this:

> $ gprof ~/g/git/git-fetch|head -n 20
> Flat profile:
>
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>  26.42      0.33     0.33  1584583     0.00     0.00  strbuf_getwholeline
>  14.63      0.51     0.18 90601347     0.00     0.00  strbuf_grow
>  13.82      0.68     0.17  1045676     0.00     0.00  find_pack_entry_one
>   8.13      0.78     0.10  1050062     0.00     0.00  check_refname_format
>   6.50      0.86     0.08  1584675     0.00     0.00  get_sha1_hex
>   5.69      0.93     0.07  2100529     0.00     0.00  starts_with
>   3.25      0.97     0.04  1044043     0.00     0.00  refname_is_safe
>   3.25      1.01     0.04     8007     0.00     0.00  get_packed_ref_cache
>   2.44      1.04     0.03  2605595     0.00     0.00  search_ref_dir
>   2.44      1.07     0.03  1040500     0.00     0.00  peel_entry
>   1.63      1.09     0.02  2632661     0.00     0.00  get_ref_dir
>   1.63      1.11     0.02  1044043     0.00     0.00  create_ref_entry
>   1.63      1.13     0.02     8024     0.00     0.00  do_for_each_entry_in_dir
>   0.81      1.14     0.01  2155105     0.00     0.00  memory_limit_check
>   0.81      1.15     0.01  1580503     0.00     0.00  sha1_to_hex

We spend a lot of time checking refs here. Probably this comes from
writing the `packed-refs` file out 1000 times in your example, because
fetch handles each ref individually, whereas git-remote has done it in
one pass since c9e768b (remote: repack packed-refs once when deleting
multiple refs, 2014-05-23).
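
To illustrate why the per-ref approach is so much more expensive, here
is a toy cost model. It is not git's code: the refs are a plain list,
"writing" is just counting entries, and the figure of 2000 refs with
1000 of them stale is an assumption chosen for the example.

```python
def rewrite_per_deletion(refs, stale):
    """Delete refs one at a time, rewriting the whole file after each.

    Returns (remaining refs, total number of entries written).
    """
    remaining = list(refs)
    written = 0
    for name in stale:
        remaining.remove(name)
        written += len(remaining)  # one full rewrite per deleted ref
    return remaining, written


def rewrite_once(refs, stale):
    """Drop all stale refs, then rewrite the file a single time."""
    stale = set(stale)
    remaining = [name for name in refs if name not in stale]
    return remaining, len(remaining)


# Assumed sizes for illustration only.
refs = ["refs/remotes/origin/branch-%d" % i for i in range(2000)]
stale = refs[:1000]

slow_refs, slow_written = rewrite_per_deletion(refs, stale)
fast_refs, fast_written = rewrite_once(refs, stale)
print("entries written, one rewrite per ref:", slow_written)
print("entries written, single rewrite:", fast_written)
```

Both functions leave the same refs behind; only the amount of rewriting
differs, growing with (refs x deletions) in the first case and with the
number of refs in the second.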
Now that we have ref_transaction_*, I think if git-fetch fed all of the
deletes (along with the updates) into a single transaction, we would get
the same optimization for free. Maybe that is even part of some of the
pending ref_transaction work from Stefan or Michael (both cc'd). I
haven't kept up very well with what is cooking in pu.
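
The idea of feeding updates and deletes into one transaction can be
sketched as follows. This is a toy model only: ToyRefStore and
ToyTransaction are hypothetical names, not git's ref_transaction_*
functions, and locking and old-value checks are left out entirely.

```python
class ToyRefStore:
    """Stand-in for a packed-refs file; counts how often it is rewritten."""

    def __init__(self, refs):
        self.refs = dict(refs)
        self.rewrites = 0

    def write(self, refs):
        self.refs = dict(refs)
        self.rewrites += 1


class ToyTransaction:
    """Queue updates and deletions, then apply them with one rewrite."""

    def __init__(self, store):
        self.store = store
        self.queued = []

    def update(self, name, object_id):
        self.queued.append((name, object_id))

    def delete(self, name):
        self.queued.append((name, None))  # None marks a deletion

    def commit(self):
        refs = dict(self.store.refs)
        for name, object_id in self.queued:
            if object_id is None:
                refs.pop(name, None)
            else:
                refs[name] = object_id
        self.store.write(refs)  # the only rewrite, however many refs changed
        self.queued = []


store = ToyRefStore({
    "refs/remotes/origin/master": "1111",
    "refs/remotes/origin/gone": "2222",
})
tx = ToyTransaction(store)
tx.update("refs/remotes/origin/master", "3333")  # a fetched update
tx.delete("refs/remotes/origin/gone")            # a pruned ref
tx.commit()
```

After the commit the store has been rewritten once, regardless of how
many updates and deletions were queued.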
-Peff