From: Jeff King <peff@peff.net>
To: larsxschneider@gmail.com
Cc: git@vger.kernel.org
Subject: Re: [PATCH v1] convert: add test to demonstrate clean invocations
Date: Thu, 21 Jul 2016 17:37:40 -0400 [thread overview]
Message-ID: <20160721213740.GB4604@sigill.intra.peff.net> (raw)
In-Reply-To: <1469134747-26785-1-git-send-email-larsxschneider@gmail.com>
On Thu, Jul 21, 2016 at 10:59:07PM +0200, larsxschneider@gmail.com wrote:
> From: Lars Schneider <larsxschneider@gmail.com>
>
> If three files processed by a filter are added and committed to a
> repository then I expect three clean invocations per Git operation.
> Apparently Git invokes the clean process 12 times.
>
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> ---
>
> It's pretty late here and I hope do not miss something obvious... but
> can anyone explain to me why the clean operation is executed 12 times
> for 3 files?
Hmm. If I run this under gdb and backtrace when we hit apply_filter(), I
see:
> + rm -f run.count &&
> + git add . &&
> + test_line_count = 3 run.count
This "git add" does invoke apply_filter() 3 times, which I'd expect.
> + rm -f run.count &&
> + git commit . -m "test commit" &&
> + test_line_count = 3 run.count
This invokes apply_filter() six times. So something funny is going on
already (I do get 12 dots, so checking apply_filter() seems to only
cover half the invocations).
Three of them are for re-adding the three files to the index again,
since "git commit ." is asking us to do so. I'm surprised, though; I
would have thought we could avoid doing so just based on the stat
information. Maybe it's a racy-index thing because the files' mtimes are
likely to be the same as the index?
Indeed, if I stick a "sleep 1" between modifying the files and calling
"git add", then the "git commit" invocation drops to only 6 invocations
of the filter. So that explains half of it (though I'm still confused
why index refreshing requires 6 and not 3; I guess it may be because
"git commit ." works in a separate index, and we first refresh that
index, and then refresh the "real" index again afterwards, when we could
theoretically just copy the entries).
The next three are to show the final diffstat after the commit
completes. It looks like the "should we reuse the worktree file?"
optimization in diff_populate_filespec() does not take into account
whether we will have to convert the contents, and it probably should.
The point is that sometimes mmap-ing the file contents is more efficient
than zlib inflating the object from disk. But if we have to exec an
extra process and read the whole object contents into a strbuf, that is
probably not a win.
Adding a "return 0" at the top of reuse_worktree_file() drops our 6 to
3 (but obviously it should actually check the attributes).
So of the 12 invocations:
- 3 must be for refreshing the index again, because the way the test
is written causes us to err on the side of caution when the mtimes
are the same (and also means that even if your test is fixed to
pass, it would racily fail when the system is under load)
- 3 are for the double-refresh when "git commit ---only" is used (and
"git commit ." implies "--only". Seems like there is room for
optimization there.
- 3 are for the tree-diff reusing the worktree files. This should be
dropped.
- The other 3, I'm not sure.
-Peff
next prev parent reply other threads:[~2016-07-21 21:37 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-07-21 20:59 [PATCH v1] convert: add test to demonstrate clean invocations larsxschneider
2016-07-21 21:37 ` Jeff King [this message]
2016-07-22 15:27 ` [PATCH] diff: do not reuse worktree files that need "clean" conversion Jeff King
2016-07-22 17:44 ` Junio C Hamano
2016-07-22 18:10 ` Jeff King
2016-07-22 19:30 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160721213740.GB4604@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=larsxschneider@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).