git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Martin von Zweigbergk <martinvonz@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [RFC] git checkout $tree -- $path always rewrites files
Date: Thu, 13 Nov 2014 13:30:34 -0500	[thread overview]
Message-ID: <20141113183033.GA24107@peff.net> (raw)
In-Reply-To: <xmqqbnoge1ci.fsf@gitster.dls.corp.google.com>

On Sun, Nov 09, 2014 at 09:21:49AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote:
> >
> >> I think that has direct linkage; what you have in mind I think is
> >> http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935
> >
> > Thanks for that link.
> 
> It was one of the items in the "git blame leftover bits" list
> (websearch for that exact phrase), so I didn't have to do any
> digging just for this thread ;-)
> 
> But I made a huge typo above.  s/I think/I do not think/;

Oh. That might explain some of my confusion. :)

> The original observation you made in this thread is that when "git
> checkout $tree - $pathspec", whose defintion is to "grab paths in
> the $tree that match $pathspec and put them in the index, and then
> overwrite the working tree paths with the contents of these paths
> that are updated in the index with the contents taken from the
> $tree", unnecessarily rewrites the working tree paths even when they
> already had contents that were up to date.  That is what I called an
> "implementation glitch".
> 
> The old thread is a different topic.
> [...]

Right, I do agree that these things don't need to be linked. The reason
I ended up dealing with the deletion thing is that one obvious way to
implement "do not touch entries which have not changed" is by running a
diff between $tree and the index. But doing a diff means we have to
reconsider all of the code surrounding deletions (and handling of
unmerged entries in the pathspec but not in $tree). I tried a few
variants and had trouble making it work (getting caught up either in the
"make sure each pathspec matched" code, or the "treat unmerged entries
specially" behavior).

In the patch below, I ended up retaining the existing
read_tree_recursive code, and just specially handling the replacement of
index entries (which is itself sort of a diff, but it fits nicely into
the existing scheme).

> I'd prefer that these two to be treated separately.

Yeah, that makes sense after reading your emails. What I was really
unclear on was whether the handling of deletion was a bug or a design
choice, and it is the latter (if it were the former, we would not need a
transition plan :) ).

Anyway, here is the patch.

-- >8 --
Subject: checkout $tree: do not throw away unchanged index entries

When we "git checkout $tree", we pull paths from $tree into
the index, and then check the resulting entries out to the
worktree. Our method for the first step is rather
heavy-handed, though; it clobbers the entire existing index
entry, even if the content is the same. This means we lose
our stat information, leading checkout_entry to later
rewrite the entire file with identical content.

Instead, let's see if we have the identical entry already in
the index, in which case we leave it in place. That lets
checkout_entry do the right thing. Our tests cover two
interesting cases:

  1. We make sure that a file which has no changes is not
     rewritten.

  2. We make sure that we do update a file that is unchanged
     in the index (versus $tree), but has working tree
     changes. We keep the old index entry, and
     checkout_entry is able to realize that our stat
     information is out of date.

Signed-off-by: Jeff King <peff@peff.net>
---
Note that the test refreshes the index manually (because we are tweaking
the timestamp of file2). In normal use this should not be necessary
(i.e., your entries should generally be uptodate). I did wonder if
checkout should be refreshing the index itself, but it would a bunch of
extra lstats in the common case.

 builtin/checkout.c        | 31 +++++++++++++++++++++++++------
 t/t2022-checkout-paths.sh | 17 +++++++++++++++++
 2 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 5410dac..67cab4e 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -65,21 +65,40 @@ static int post_checkout_hook(struct commit *old, struct commit *new,
 static int update_some(const unsigned char *sha1, const char *base, int baselen,
 		const char *pathname, unsigned mode, int stage, void *context)
 {
-	int len;
+	struct strbuf buf = STRBUF_INIT;
 	struct cache_entry *ce;
+	int pos;
 
 	if (S_ISDIR(mode))
 		return READ_TREE_RECURSIVE;
 
-	len = baselen + strlen(pathname);
-	ce = xcalloc(1, cache_entry_size(len));
+	strbuf_add(&buf, base, baselen);
+	strbuf_addstr(&buf, pathname);
+
+	/*
+	 * If the entry is the same as the current index, we can leave the old
+	 * entry in place. Whether it is UPTODATE or not, checkout_entry will
+	 * do the right thing.
+	 */
+	pos = cache_name_pos(buf.buf, buf.len);
+	if (pos >= 0) {
+		struct cache_entry *old = active_cache[pos];
+		if (create_ce_mode(mode) == old->ce_mode &&
+		    !hashcmp(old->sha1, sha1)) {
+			old->ce_flags |= CE_UPDATE;
+			strbuf_release(&buf);
+			return 0;
+		}
+	}
+
+	ce = xcalloc(1, cache_entry_size(buf.len));
 	hashcpy(ce->sha1, sha1);
-	memcpy(ce->name, base, baselen);
-	memcpy(ce->name + baselen, pathname, len - baselen);
+	memcpy(ce->name, buf.buf, buf.len);
 	ce->ce_flags = create_ce_flags(0) | CE_UPDATE;
-	ce->ce_namelen = len;
+	ce->ce_namelen = buf.len;
 	ce->ce_mode = create_ce_mode(mode);
 	add_cache_entry(ce, ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
+	strbuf_release(&buf);
 	return 0;
 }
 
diff --git a/t/t2022-checkout-paths.sh b/t/t2022-checkout-paths.sh
index 8e3545d..f46d049 100755
--- a/t/t2022-checkout-paths.sh
+++ b/t/t2022-checkout-paths.sh
@@ -61,4 +61,21 @@ test_expect_success 'do not touch unmerged entries matching $path but not in $tr
 	test_cmp expect.next0 actual.next0
 '
 
+test_expect_success 'do not touch files that are already up-to-date' '
+	git reset --hard &&
+	echo one >file1 &&
+	echo two >file2 &&
+	git add file1 file2 &&
+	git commit -m base &&
+	echo modified >file1 &&
+	test-chmtime =1000000000 file2 &&
+	git update-index -q --refresh &&
+	git checkout HEAD -- file1 file2 &&
+	echo one >expect &&
+	test_cmp expect file1 &&
+	echo "1000000000	file2" >expect &&
+	test-chmtime -v +0 file2 >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.1.2.596.g7379948

  reply	other threads:[~2014-11-13 18:30 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-07  8:13 [RFC] git checkout $tree -- $path always rewrites files Jeff King
2014-11-07  8:38 ` Jeff King
2014-11-07 10:13   ` Duy Nguyen
2014-11-07 16:51     ` Junio C Hamano
2014-11-07 19:15     ` Jeff King
2014-11-07 17:14 ` Junio C Hamano
2014-11-07 19:17   ` Jeff King
     [not found]     ` <CANiSa6hufp=80TaesNpo1CxCbwVq3LPXvYaUSbcmzPE5pj_GGw@mail.gmail.com>
2014-11-08  7:10       ` Martin von Zweigbergk
     [not found]         ` <CAPc5daWdzrHr8Rdksr3HycMRQu0=Ji7h=BPYjzZj7MH6Ko0VgQ@mail.gmail.com>
2014-11-08  8:03           ` Martin von Zweigbergk
2014-11-08  8:30           ` Jeff King
2014-11-08  8:45             ` Jeff King
2014-11-09 18:37               ` Junio C Hamano
2014-11-08 16:19             ` Martin von Zweigbergk
2014-11-09  9:42               ` Jeff King
2014-11-09 17:21             ` Junio C Hamano
2014-11-13 18:30               ` Jeff King [this message]
2014-11-13 19:15                 ` Junio C Hamano
2014-11-13 19:26                   ` Jeff King
2014-11-13 20:03                     ` Jeff King
2014-11-13 21:18                       ` Junio C Hamano
2014-11-13 21:37                         ` Jeff King
2014-11-14  5:44               ` David Aguilar
2014-11-14 19:27                 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141113183033.GA24107@peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=martinvonz@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).