All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [RFC] git checkout $tree -- $path always rewrites files
Date: Fri, 7 Nov 2014 03:13:24 -0500	[thread overview]
Message-ID: <20141107081324.GA19845@peff.net> (raw)

I noticed that "git checkout $tree -- $path" will _always_ unlink and
write a new copy of each matching path, even if they are up-to-date with
the index and the content in $tree is the same.

Here's a simple reproduction:

    # some repo with a file in it
    git init
    echo content >foo && git add foo && git commit -m foo
    
    # give the file a known timestamp (it doesn't matter what)
    perl -e 'utime(123, 123, @ARGV)' foo
    git update-index --refresh
    
    # now check out an identical copy
    git checkout HEAD -- foo
    
    # and check whether it was updated
    perl -le 'print ((stat)[9]) for @ARGV' foo

which yields a modern timestamp, showing that the file was rewritten.

If you use

    git checkout -- foo

instead to checkout from the index, that works fine (the final line
prints 123). Similarly, if you load the new index and checkout
separately, like:

    git read-tree -m $tree
    git checkout -- foo

that also works. Of course it is not quite the same; read-tree loads the
whole index from $tree, whereas checkout would only touch a subset of
the entries. And that's the culprit; checkout does not use unpack-trees
to only touch the entries that need updating. It uses read_tree_recursive
with a pathspec, and just replaces any index entries that match.

Below is a patch which makes it do what I expect by silently copying
flags and stat-data from the old entry, but I feel like it is going in
the wrong direction. Besides causing a few tests to fail (which I
suspect is because I implemented it at the wrong layer), it really feels
like this should be using unpack-trees or something similar.

After all, that is what "git reset $tree -- foo" does, and it gets this
case right (in fact, you can replace the read-tree above with it). It
looks like it actually does a pathspec-limited index-diff rather than
unpack-trees, but the end effect is the same (and while checkout could
just pass "update" to unpack-trees, we need to handle checking out
directly from the index anyway, so I do not think it is a big deal to
keep the index update as a separate step).

Is there a reason that we don't use this diff technique for checkout?

Anyway, here is the (wrong) patch I came up with initially.

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 29b0f6d..802e556 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -273,19 +273,13 @@ static int checkout_paths(const struct checkout_opts *opts,
 		ce->ce_flags &= ~CE_MATCHED;
 		if (!opts->ignore_skipworktree && ce_skip_worktree(ce))
 			continue;
-		if (opts->source_tree && !(ce->ce_flags & CE_UPDATE))
-			/*
-			 * "git checkout tree-ish -- path", but this entry
-			 * is in the original index; it will not be checked
-			 * out to the working tree and it does not matter
-			 * if pathspec matched this entry.  We will not do
-			 * anything to this entry at all.
-			 */
-			continue;
+
 		/*
-		 * Either this entry came from the tree-ish we are
-		 * checking the paths out of, or we are checking out
-		 * of the index.
+		 * If we are checking out from a tree-ish, we already
+		 * did our pathspec matching.  However, since we then
+		 * drop the CE_UPDATE flag from any paths that do not
+		 * need updating, we don't know which ones were not
+		 * mentioned and which ones were simply up-to-date.
 		 *
 		 * If it comes from the tree-ish, we already know it
 		 * matches the pathspec and could just stamp
diff --git a/read-cache.c b/read-cache.c
index 8f3e9eb..fb2240b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -962,8 +962,13 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 
 	/* existing match? Just replace it. */
 	if (pos >= 0) {
-		if (!new_only)
+		if (!new_only) {
+			struct cache_entry *old = istate->cache[pos];
+			if (!is_null_sha1(ce->sha1) &&
+			    !hashcmp(old->sha1, ce->sha1))
+				copy_cache_entry(ce, old);
 			replace_index_entry(istate, pos, ce);
+		}
 		return 0;
 	}
 	pos = -pos-1;

             reply	other threads:[~2014-11-07  8:13 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-07  8:13 Jeff King [this message]
2014-11-07  8:38 ` [RFC] git checkout $tree -- $path always rewrites files Jeff King
2014-11-07 10:13   ` Duy Nguyen
2014-11-07 16:51     ` Junio C Hamano
2014-11-07 19:15     ` Jeff King
2014-11-07 17:14 ` Junio C Hamano
2014-11-07 19:17   ` Jeff King
     [not found]     ` <CANiSa6hufp=80TaesNpo1CxCbwVq3LPXvYaUSbcmzPE5pj_GGw@mail.gmail.com>
2014-11-08  7:10       ` Martin von Zweigbergk
     [not found]         ` <CAPc5daWdzrHr8Rdksr3HycMRQu0=Ji7h=BPYjzZj7MH6Ko0VgQ@mail.gmail.com>
2014-11-08  8:03           ` Martin von Zweigbergk
2014-11-08  8:30           ` Jeff King
2014-11-08  8:45             ` Jeff King
2014-11-09 18:37               ` Junio C Hamano
2014-11-08 16:19             ` Martin von Zweigbergk
2014-11-09  9:42               ` Jeff King
2014-11-09 17:21             ` Junio C Hamano
2014-11-13 18:30               ` Jeff King
2014-11-13 19:15                 ` Junio C Hamano
2014-11-13 19:26                   ` Jeff King
2014-11-13 20:03                     ` Jeff King
2014-11-13 21:18                       ` Junio C Hamano
2014-11-13 21:37                         ` Jeff King
2014-11-14  5:44               ` David Aguilar
2014-11-14 19:27                 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141107081324.GA19845@peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.