[RFC] git checkout $tree -- $path always rewrites files

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC] git checkout $tree -- $path always rewrites files
@ 2014-11-07  8:13 Jeff King
  2014-11-07  8:38 ` Jeff King
  2014-11-07 17:14 ` Junio C Hamano
  0 siblings, 2 replies; 23+ messages in thread
From: Jeff King @ 2014-11-07  8:13 UTC (permalink / raw)
  To: git

I noticed that "git checkout $tree -- $path" will _always_ unlink and
write a new copy of each matching path, even if they are up-to-date with
the index and the content in $tree is the same.

Here's a simple reproduction:

    # some repo with a file in it
    git init
    echo content >foo && git add foo && git commit -m foo

    # give the file a known timestamp (it doesn't matter what)
    perl -e 'utime(123, 123, @ARGV)' foo
    git update-index --refresh

    # now check out an identical copy
    git checkout HEAD -- foo

    # and check whether it was updated
    perl -le 'print ((stat)[9]) for @ARGV' foo

which yields a modern timestamp, showing that the file was rewritten.

If you use

    git checkout -- foo

instead to checkout from the index, that works fine (the final line
prints 123). Similarly, if you load the new index and checkout
separately, like:

    git read-tree -m $tree
    git checkout -- foo

that also works. Of course it is not quite the same; read-tree loads the
whole index from $tree, whereas checkout would only touch a subset of
the entries. And that's the culprit; checkout does not use unpack-trees
to only touch the entries that need updating. It uses read_tree_recursive
with a pathspec, and just replaces any index entries that match.

Below is a patch which makes it do what I expect by silently copying
flags and stat-data from the old entry, but I feel like it is going in
the wrong direction. Besides causing a few tests to fail (which I
suspect is because I implemented it at the wrong layer), it really feels
like this should be using unpack-trees or something similar.

After all, that is what "git reset $tree -- foo" does, and it gets this
case right (in fact, you can replace the read-tree above with it). It
looks like it actually does a pathspec-limited index-diff rather than
unpack-trees, but the end effect is the same (and while checkout could
just pass "update" to unpack-trees, we need to handle checking out
directly from the index anyway, so I do not think it is a big deal to
keep the index update as a separate step).

Is there a reason that we don't use this diff technique for checkout?

Anyway, here is the (wrong) patch I came up with initially.

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 29b0f6d..802e556 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -273,19 +273,13 @@ static int checkout_paths(const struct checkout_opts *opts,
 		ce->ce_flags &= ~CE_MATCHED;
 		if (!opts->ignore_skipworktree && ce_skip_worktree(ce))
 			continue;
-		if (opts->source_tree && !(ce->ce_flags & CE_UPDATE))
-			/*
-			 * "git checkout tree-ish -- path", but this entry
-			 * is in the original index; it will not be checked
-			 * out to the working tree and it does not matter
-			 * if pathspec matched this entry.  We will not do
-			 * anything to this entry at all.
-			 */
-			continue;
+
 		/*
-		 * Either this entry came from the tree-ish we are
-		 * checking the paths out of, or we are checking out
-		 * of the index.
+		 * If we are checking out from a tree-ish, we already
+		 * did our pathspec matching.  However, since we then
+		 * drop the CE_UPDATE flag from any paths that do not
+		 * need updating, we don't know which ones were not
+		 * mentioned and which ones were simply up-to-date.
 		 *
 		 * If it comes from the tree-ish, we already know it
 		 * matches the pathspec and could just stamp
diff --git a/read-cache.c b/read-cache.c
index 8f3e9eb..fb2240b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -962,8 +962,13 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e

 	/* existing match? Just replace it. */
 	if (pos >= 0) {
-		if (!new_only)
+		if (!new_only) {
+			struct cache_entry *old = istate->cache[pos];
+			if (!is_null_sha1(ce->sha1) &&
+			    !hashcmp(old->sha1, ce->sha1))
+				copy_cache_entry(ce, old);
 			replace_index_entry(istate, pos, ce);
+		}
 		return 0;
 	}
 	pos = -pos-1;

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-07  8:13 [RFC] git checkout $tree -- $path always rewrites files Jeff King
@ 2014-11-07  8:38 ` Jeff King
  2014-11-07 10:13   ` Duy Nguyen
  2014-11-07 17:14 ` Junio C Hamano
  1 sibling, 1 reply; 23+ messages in thread
From: Jeff King @ 2014-11-07  8:38 UTC (permalink / raw)
  To: git

On Fri, Nov 07, 2014 at 03:13:24AM -0500, Jeff King wrote:

> I noticed that "git checkout $tree -- $path" will _always_ unlink and
> write a new copy of each matching path, even if they are up-to-date with
> the index and the content in $tree is the same.

By the way, one other thing I wondered while looking at this code: when
we checkout a working tree file, we unlink the old one and write the new
one in-place. Is there a particular reason we do this versus writing to
a temporary file and renaming it into place?  That would give
simultaneous readers a more atomic view.

I suspect the answer is something like: you cannot always do a rename,
because you might have a typechange, directory becoming a file, or vice
versa; so anyone relying on an atomic view during a checkout operation
is already Doing It Wrong.  Handling a content-change of an existing
path would complicate the code, so we do not bother.

But I would be curious to hear confirmation of that.

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-07  8:38 ` Jeff King
@ 2014-11-07 10:13   ` Duy Nguyen
  2014-11-07 16:51     ` Junio C Hamano
  2014-11-07 19:15     ` Jeff King
  0 siblings, 2 replies; 23+ messages in thread
From: Duy Nguyen @ 2014-11-07 10:13 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

On Fri, Nov 7, 2014 at 3:38 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Nov 07, 2014 at 03:13:24AM -0500, Jeff King wrote:
>
>> I noticed that "git checkout $tree -- $path" will _always_ unlink and
>> write a new copy of each matching path, even if they are up-to-date with
>> the index and the content in $tree is the same.
>
> By the way, one other thing I wondered while looking at this code: when
> we checkout a working tree file, we unlink the old one and write the new
> one in-place. Is there a particular reason we do this versus writing to
> a temporary file and renaming it into place?  That would give
> simultaneous readers a more atomic view.
>
> I suspect the answer is something like: you cannot always do a rename,
> because you might have a typechange, directory becoming a file, or vice
> versa; so anyone relying on an atomic view during a checkout operation
> is already Doing It Wrong.  Handling a content-change of an existing
> path would complicate the code, so we do not bother.

Not a confirmation, but it looks like Linus did it just to make sure
he had new permissions right, in e447947 (Be much more liberal about
the file mode bits. - 2005-04-16).
-- 
Duy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-07 10:13   ` Duy Nguyen
@ 2014-11-07 16:51     ` Junio C Hamano
  2014-11-07 19:15     ` Jeff King
  1 sibling, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2014-11-07 16:51 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Jeff King, Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> On Fri, Nov 7, 2014 at 3:38 PM, Jeff King <peff@peff.net> wrote:
>> On Fri, Nov 07, 2014 at 03:13:24AM -0500, Jeff King wrote:
>>
>>> I noticed that "git checkout $tree -- $path" will _always_ unlink and
>>> write a new copy of each matching path, even if they are up-to-date with
>>> the index and the content in $tree is the same.
>>
>> By the way, one other thing I wondered while looking at this code: when
>> we checkout a working tree file, we unlink the old one and write the new
>> one in-place. Is there a particular reason we do this versus writing to
>> a temporary file and renaming it into place?  That would give
>> simultaneous readers a more atomic view.
>>
>> I suspect the answer is something like: you cannot always do a rename,
>> because you might have a typechange, directory becoming a file, or vice
>> versa; so anyone relying on an atomic view during a checkout operation
>> is already Doing It Wrong.  Handling a content-change of an existing
>> path would complicate the code, so we do not bother.
>
> Not a confirmation, but it looks like Linus did it just to make sure
> he had new permissions right, in e447947 (Be much more liberal about
> the file mode bits. - 2005-04-16).

I think you are referring to the "... to get the new one with the
right permissions" comment in that patch, but I do not think that
affects the choice between (1) unlink and write anew to the final
and (2) create a new temporary and rename.  Either way, you do not
update the existing file in-place and try to checkout the permission
bits with chmod(2).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-07  8:13 [RFC] git checkout $tree -- $path always rewrites files Jeff King
  2014-11-07  8:38 ` Jeff King
@ 2014-11-07 17:14 ` Junio C Hamano
  2014-11-07 19:17   ` Jeff King
  1 sibling, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2014-11-07 17:14 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> Is there a reason that we don't use this diff technique for checkout?

I suspect that the reasons are probably mixture of these:

 (1) the code path may descend from checkout-index and predates the
     in-core diff machinery;

 (2) in the context of checkout-index, it was more desirable to be
     able to say "I want the contents on the filesystem, even if you
     think I already have it there, as you as a new software are
     likely to be wrong and I know better"; or

 (3) it was easier to code that way ;-)

I do not see there is any reason not to do what you suggest.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-07 10:13   ` Duy Nguyen
  2014-11-07 16:51     ` Junio C Hamano
@ 2014-11-07 19:15     ` Jeff King
  1 sibling, 0 replies; 23+ messages in thread
From: Jeff King @ 2014-11-07 19:15 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

On Fri, Nov 07, 2014 at 05:13:47PM +0700, Duy Nguyen wrote:

> > By the way, one other thing I wondered while looking at this code: when
> > we checkout a working tree file, we unlink the old one and write the new
> > one in-place. Is there a particular reason we do this versus writing to
> > a temporary file and renaming it into place?  That would give
> > simultaneous readers a more atomic view.
> >
> > I suspect the answer is something like: you cannot always do a rename,
> > because you might have a typechange, directory becoming a file, or vice
> > versa; so anyone relying on an atomic view during a checkout operation
> > is already Doing It Wrong.  Handling a content-change of an existing
> > path would complicate the code, so we do not bother.
> 
> Not a confirmation, but it looks like Linus did it just to make sure
> he had new permissions right, in e447947 (Be much more liberal about
> the file mode bits. - 2005-04-16).

Thanks for digging that up. I think that only gives us half the story,
though. That explains why we would unlink/open instead of relying on
just open(O_TRUNC). But I think opening a new tempfile would work the
same as the current code in that respect.

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-07 17:14 ` Junio C Hamano
@ 2014-11-07 19:17   ` Jeff King
       [not found]     ` <CANiSa6hufp=80TaesNpo1CxCbwVq3LPXvYaUSbcmzPE5pj_GGw@mail.gmail.com>
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2014-11-07 19:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Nov 07, 2014 at 09:14:42AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Is there a reason that we don't use this diff technique for checkout?
> 
> I suspect that the reasons are probably mixture of these:
> 
>  (1) the code path may descend from checkout-index and predates the
>      in-core diff machinery;
> 
>  (2) in the context of checkout-index, it was more desirable to be
>      able to say "I want the contents on the filesystem, even if you
>      think I already have it there, as you as a new software are
>      likely to be wrong and I know better"; or
> 
>  (3) it was easier to code that way ;-)
> 
> I do not see there is any reason not to do what you suggest.

OK. It's not very much code (and can mostly be lifted from git-reset),
so I may take a stab at it.

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
       [not found]     ` <CANiSa6hufp=80TaesNpo1CxCbwVq3LPXvYaUSbcmzPE5pj_GGw@mail.gmail.com>
@ 2014-11-08  7:10       ` Martin von Zweigbergk
       [not found]         ` <CAPc5daWdzrHr8Rdksr3HycMRQu0=Ji7h=BPYjzZj7MH6Ko0VgQ@mail.gmail.com>
  0 siblings, 1 reply; 23+ messages in thread
From: Martin von Zweigbergk @ 2014-11-08  7:10 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: git

Trying again from plain old gmail which I think does not send a
multipart content.

On Fri, Nov 7, 2014 at 11:06 PM, Martin von Zweigbergk
<martinvonz@gmail.com> wrote:
> Is this also related to "git checkout $rev ." not removing removed files?
> What you say about the difference in implementation between checkout and
> reset sounds vaguely familiar from when I looked at it some years ago.
> Curiously, I just talked to Jonathan about this over lunch yesterday. I
> think we would both be happy to get that oddity of checkout fixed. If what
> you mention here is indeed related to fixing that, it does complicate things
> with regards to backwards compatibility.
>
>
> On Fri Nov 07 2014 at 11:24:09 AM Jeff King <peff@peff.net> wrote:
>>
>> On Fri, Nov 07, 2014 at 09:14:42AM -0800, Junio C Hamano wrote:
>>
>> > Jeff King <peff@peff.net> writes:
>> >
>> > > Is there a reason that we don't use this diff technique for checkout?
>> >
>> > I suspect that the reasons are probably mixture of these:
>> >
>> >  (1) the code path may descend from checkout-index and predates the
>> >      in-core diff machinery;
>> >
>> >  (2) in the context of checkout-index, it was more desirable to be
>> >      able to say "I want the contents on the filesystem, even if you
>> >      think I already have it there, as you as a new software are
>> >      likely to be wrong and I know better"; or
>> >
>> >  (3) it was easier to code that way ;-)
>> >
>> > I do not see there is any reason not to do what you suggest.
>>
>> OK. It's not very much code (and can mostly be lifted from git-reset),
>> so I may take a stab at it.
>>
>> -Peff
>> --
>> To unsubscribe from this list: send the line "unsubscribe git" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
       [not found]         ` <CAPc5daWdzrHr8Rdksr3HycMRQu0=Ji7h=BPYjzZj7MH6Ko0VgQ@mail.gmail.com>
@ 2014-11-08  8:03           ` Martin von Zweigbergk
  2014-11-08  8:30           ` Jeff King
  1 sibling, 0 replies; 23+ messages in thread
From: Martin von Zweigbergk @ 2014-11-08  8:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Git Mailing List

I'm not sure I had seen that particular thread, but it seems like
we're all in agreement on that topic. I'm keeping my fingers crossed
that Jeff will have time to tackle that this time :-)

On Fri, Nov 7, 2014 at 11:35 PM, Junio C Hamano <gitster@pobox.com> wrote:
> I think that has direct linkage; what you have in mind I think is
> http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935
>
> What is on thread here is an implementation glitch, not semantic one.
> Checking out a "directory" as opposed to "set of paths that match the
> pathspec" was a deliberate design choice, not an implementation glitch.
>
> Pardon HTML, misspellings and grammos, typed on a tablet.
>
> On Nov 7, 2014 11:10 PM, "Martin von Zweigbergk" <martinvonz@gmail.com>
> wrote:
>>
>> Trying again from plain old gmail which I think does not send a
>> multipart content.
>>
>> On Fri, Nov 7, 2014 at 11:06 PM, Martin von Zweigbergk
>> <martinvonz@gmail.com> wrote:
>> > Is this also related to "git checkout $rev ." not removing removed
>> > files?
>> > What you say about the difference in implementation between checkout and
>> > reset sounds vaguely familiar from when I looked at it some years ago.
>> > Curiously, I just talked to Jonathan about this over lunch yesterday. I
>> > think we would both be happy to get that oddity of checkout fixed. If
>> > what
>> > you mention here is indeed related to fixing that, it does complicate
>> > things
>> > with regards to backwards compatibility.
>> >
>> >
>> > On Fri Nov 07 2014 at 11:24:09 AM Jeff King <peff@peff.net> wrote:
>> >>
>> >> On Fri, Nov 07, 2014 at 09:14:42AM -0800, Junio C Hamano wrote:
>> >>
>> >> > Jeff King <peff@peff.net> writes:
>> >> >
>> >> > > Is there a reason that we don't use this diff technique for
>> >> > > checkout?
>> >> >
>> >> > I suspect that the reasons are probably mixture of these:
>> >> >
>> >> >  (1) the code path may descend from checkout-index and predates the
>> >> >      in-core diff machinery;
>> >> >
>> >> >  (2) in the context of checkout-index, it was more desirable to be
>> >> >      able to say "I want the contents on the filesystem, even if you
>> >> >      think I already have it there, as you as a new software are
>> >> >      likely to be wrong and I know better"; or
>> >> >
>> >> >  (3) it was easier to code that way ;-)
>> >> >
>> >> > I do not see there is any reason not to do what you suggest.
>> >>
>> >> OK. It's not very much code (and can mostly be lifted from git-reset),
>> >> so I may take a stab at it.
>> >>
>> >> -Peff
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe git" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
       [not found]         ` <CAPc5daWdzrHr8Rdksr3HycMRQu0=Ji7h=BPYjzZj7MH6Ko0VgQ@mail.gmail.com>
  2014-11-08  8:03           ` Martin von Zweigbergk
@ 2014-11-08  8:30           ` Jeff King
  2014-11-08  8:45             ` Jeff King
                               ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Jeff King @ 2014-11-08  8:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin von Zweigbergk, Git Mailing List

On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote:

> I think that has direct linkage; what you have in mind I think is
> http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935

Thanks for that link.

I did spend a few hours on this topic earlier today, and got very
confused trying to figure out what the deletion behavior _should_ be,
and whether I was breaking it.  For some reason I had zero recollection
of a conversation from last year that I was obviously a major part of. I
think I am getting old. :)

The end of that thread concludes that a diff-based approach is not going
to work, because we need to update the working tree even for files not
mentioned by the diff. I do not think that is a show-stopper, though.
It just means that we need to load the new index as one step (done now
with read_tree_recursive, but ideally using diff), and then walk over
the whole resulting index applying our pathspec again (instead of
relying on CE_UPDATE flags).

This turns out not to be a big deal, because the existing code is
already doing most of that second pathspec application anyway. It does
it because read_tree_recursive is not smart enough to update the "seen"
bits for the pathspec. But now we would have another reason to do it
this way. :)

So just to be clear, the behavior we want is that:

  echo foo >some-new-path
  git add some-new-path
  git checkout HEAD -- .

will delete some-new-path (whereas the current code turns it into an
untracked file). What should:

  git checkout HEAD -- some-new-path

do in that case? With the current code, it actually barfs, complaining
that nothing matched some-new-path (because it is not part of HEAD, and
therefore we don't consider it at all), and aborts the whole operation.
I think we would want to delete some-new-path in that case, too.

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-08  8:30           ` Jeff King
@ 2014-11-08  8:45             ` Jeff King
  2014-11-09 18:37               ` Junio C Hamano
  2014-11-08 16:19             ` Martin von Zweigbergk
  2014-11-09 17:21             ` Junio C Hamano
  2 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2014-11-08  8:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin von Zweigbergk, Git Mailing List

On Sat, Nov 08, 2014 at 03:30:40AM -0500, Jeff King wrote:

> So just to be clear, the behavior we want is that:
> 
>   echo foo >some-new-path
>   git add some-new-path
>   git checkout HEAD -- .
> 
> will delete some-new-path (whereas the current code turns it into an
> untracked file). What should:
> 
>   git checkout HEAD -- some-new-path
> 
> do in that case? With the current code, it actually barfs, complaining
> that nothing matched some-new-path (because it is not part of HEAD, and
> therefore we don't consider it at all), and aborts the whole operation.
> I think we would want to delete some-new-path in that case, too.

Also, t2022.3 has me very confused.

It is explicitly checking that if we have "subdir/foo" unmerged in the
index, and we "git checkout $tree -- subdir", and $tree does not mention
"foo", that we _leave_ foo in place.

That seems very counter-intuitive to me. If you asked to make "subdir"
look like $tree, then we should clobber it. That change comes from
e721c15 (checkout: avoid unnecessary match_pathspec calls, 2013-03-27),
where it is mentioned as a _bugfix_. That in turn references 0a1283b
(checkout $tree $path: do not clobber local changes in $path not in
$tree, 2011-09-30), which explicitly goes against the goal we are
talking about here. It is not "make my index and working tree look like
$tree" at all.

So now I'm doubly confused about what we want to do.

If we want to retain that behavior, I think we can still cover these
cases by marking items missing from $tree as "to remove" during the
diff/"update the index" phase, and then being more gentle with "to
remove" files (e.g., not clobbering changed worktree files unless "-f"
is given).

I am not sure that provides a sane user experience, though. Why is it OK
to clobber local changes to a file if we are replacing it with other
content, but _not_ if we are replacing it with nothing?  Either the
content we are losing is valuable or not, but it has nothing to do with
what we are replacing. And Junio argued in the thread linked elsewhere
that the point of "git checkout $tree -- $path" is to clobber what is in
$path, which I would agree with.

I think the argument made in 0a1283b is that "git checkout $tree $path"
is not "make $path like $tree", but rather "pick bits of $path out of
$tree". Which would mean this whole deletion thing we are talking about
is completely contrary to that.

So which is it?

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-08  8:30           ` Jeff King
  2014-11-08  8:45             ` Jeff King
@ 2014-11-08 16:19             ` Martin von Zweigbergk
  2014-11-09  9:42               ` Jeff King
  2014-11-09 17:21             ` Junio C Hamano
  2 siblings, 1 reply; 23+ messages in thread
From: Martin von Zweigbergk @ 2014-11-08 16:19 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git Mailing List

First of all, thanks again for spending time on this.

On Sat, Nov 8, 2014 at 12:30 AM, Jeff King <peff@peff.net> wrote:
> On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote:
>
> So just to be clear, the behavior we want is that:
>
>   echo foo >some-new-path
>   git add some-new-path
>   git checkout HEAD -- .
>
> will delete some-new-path (whereas the current code turns it into an
> untracked file).

Yes, I think that's what I would expect.

> What should:
>
>   git checkout HEAD -- some-new-path
>
> do in that case? With the current code, it actually barfs, complaining
> that nothing matched some-new-path (because it is not part of HEAD, and
> therefore we don't consider it at all), and aborts the whole operation.
> I think we would want to delete some-new-path in that case, too.

I don't think we'd want it to be deleted. I would view 'git reset
--hard' as the role model here, and that command (without paths) would
not remove the file. And applying it to a path should not change the
behavior, just restrict it to the paths, right?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-08 16:19             ` Martin von Zweigbergk
@ 2014-11-09  9:42               ` Jeff King
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2014-11-09  9:42 UTC (permalink / raw)
  To: Martin von Zweigbergk; +Cc: Junio C Hamano, Git Mailing List

On Sat, Nov 08, 2014 at 08:19:21AM -0800, Martin von Zweigbergk wrote:

> > What should:
> >
> >   git checkout HEAD -- some-new-path
> >
> > do in that case? With the current code, it actually barfs, complaining
> > that nothing matched some-new-path (because it is not part of HEAD, and
> > therefore we don't consider it at all), and aborts the whole operation.
> > I think we would want to delete some-new-path in that case, too.
> 
> I don't think we'd want it to be deleted. I would view 'git reset
> --hard' as the role model here, and that command (without paths) would
> not remove the file. And applying it to a path should not change the
> behavior, just restrict it to the paths, right?

Are you sure about "git reset" here? If I do:

  git init
  echo content >file && git add file && git commit -m base
  echo modified >file
  echo new >some-new-path
  git add file some-new-path
  git reset --hard

then we delete some-new-path (it is not untracked, because the index
knows about it). That makes sense to me. I.e., we treat it with the same
"preciousness" whether it is named explicitly or not.

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-08  8:30           ` Jeff King
  2014-11-08  8:45             ` Jeff King
  2014-11-08 16:19             ` Martin von Zweigbergk
@ 2014-11-09 17:21             ` Junio C Hamano
  2014-11-13 18:30               ` Jeff King
  2014-11-14  5:44               ` David Aguilar
  2 siblings, 2 replies; 23+ messages in thread
From: Junio C Hamano @ 2014-11-09 17:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Martin von Zweigbergk, Git Mailing List

Jeff King <peff@peff.net> writes:

> On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote:
>
>> I think that has direct linkage; what you have in mind I think is
>> http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935
>
> Thanks for that link.

It was one of the items in the "git blame leftover bits" list
(websearch for that exact phrase), so I didn't have to do any
digging just for this thread ;-)

But I made a huge typo above.  s/I think/I do not think/;

The original observation you made in this thread is that when "git
checkout $tree - $pathspec", whose defintion is to "grab paths in
the $tree that match $pathspec and put them in the index, and then
overwrite the working tree paths with the contents of these paths
that are updated in the index with the contents taken from the
$tree", unnecessarily rewrites the working tree paths even when they
already had contents that were up to date.  That is what I called an
"implementation glitch".

The old thread is a different topic.  It is about changing the
semantics of the operation to "make paths in the index and in the
$tree identical, and then update the working tree paths to also
match, all with respect to the $pathspec".  This, as Martin noted,
needs careful debate on the merit and transition plan if we decide
that it is worth doing.  The "do ignore paths that are not in $tree"
is a deliberate design choice.

I'd prefer that these two to be treated separately.  That is, even
if our plan did not involve changing the semantics of the operation,
we would want to fix the implementation glitch.  Compare the $tree
with the index using $pathspec, adjust the index by adding paths
that are missing from the index that are in $tree, but not removing
the entries in the index only because they are not in $tree (note:
when the index has a path A, the $tree has a path A/B, and the
$pathspec says A, we would end up removing A to make room for A/B,
and that should be allowed---it does not fall into the "only because
the path is not in $tree".  In such a scenario, we remove A not
because $tree does not have A but because A/B that the $tree has is
what we were asked to materialize).  And after updating the index
that way, do an equivalent of "git checkout -- $pathspec".  The
entries that were the same between the $tree and the index will have
the up-to-dateness kept and will not unnecessarily rewrite an
unmodified path that way, while things that are modified with the
operation will be overwritten, I would think.

And with that machinery in place, we could start thinking about
updating the semantics.  It will be a small change to the loop that
goes over the result from diff_index() and modifying the code that
used to do a "not remove only because not in $tree" to do a "remove
if not in $tree".

> So just to be clear, the behavior we want is that:
>
>   echo foo >some-new-path
>   git add some-new-path
>   git checkout HEAD -- .
>
> will delete some-new-path (whereas the current code turns it into an
> untracked file).

With the updated semantics proposed in the old thread, yes, that is
what should happen.

>   git checkout HEAD -- some-new-path
>
> do in that case?

Likewise.  And if some-new-path were a directory, with existing path
O and new path N both in the index but only the former in HEAD, the
operation would revert some-new-path/O to that of HEAD and remove
some-new-path/N.  That is the only logical thing we could do if we
were to take the updated sematics.

That is one of the reasons why I am not 100% convinced that the
proposed updated semantics is better, even though I was fairly
positive in the old discussion and also I kept the topic in the
"leftover bits" list.  The above command is a fairly common way to
say "I started refactoring the existing path some-path/O and
sprinkled its original contents spread into new files A, B and C in
the same directory.  Now I no longer have O in the working tree, but
let me double check by grabbing it out of the state recoded in the
commit".  You expect that "git checkout HEAD -- some-path" would not
lose A, B or C, knowing "some-path" only had O.  That expectation
would even be stronger if you are used to the current semantics, but
that is something we could fix, if we decide that the proposed
updated semantics is better, with a careful transition plan.

It might be less risky if the updated semantics were to make the
paths that are originally in the index but not in $tree untracked
(as opposed to "reset --hard" emulation where they will be lost)
unless they need to be removed to make room for D/F conflict issues,
but I haven't thought it through.

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-08  8:45             ` Jeff King
@ 2014-11-09 18:37               ` Junio C Hamano
  0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2014-11-09 18:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Martin von Zweigbergk, Git Mailing List

Jeff King <peff@peff.net> writes:

> On Sat, Nov 08, 2014 at 03:30:40AM -0500, Jeff King wrote:
>
>> So just to be clear, the behavior we want is that:
>> 
>>   echo foo >some-new-path
>>   git add some-new-path
>>   git checkout HEAD -- .
>> 
>> will delete some-new-path (whereas the current code turns it into an
>> untracked file). What should:
>> 
>>   git checkout HEAD -- some-new-path
>> 
>> do in that case? With the current code, it actually barfs, complaining
>> that nothing matched some-new-path (because it is not part of HEAD, and
>> therefore we don't consider it at all), and aborts the whole operation.
>> I think we would want to delete some-new-path in that case, too.
>
> Also, t2022.3 has me very confused.
>
> It is explicitly checking that if we have "subdir/foo" unmerged in the
> index, and we "git checkout $tree -- subdir", and $tree does not mention
> "foo", that we _leave_ foo in place.

The definition of how "checkout $tree -- $pathspec" should behave
logically leads to that, I think.  Grabbing everything that matches
the "subdir" pathspec from $tree, adding them to the index and
checking them out will not touch subdir/foo that does not appear in
that $tree.

With the proposed updated semantics, it would behave differently, so
it is natural that we have tests that protect the traditional
definition of how it should behave and we will have to visit them
and update their expectation if we decide that the proposed updated
semantics is what we want.

> That seems very counter-intuitive to me. If you asked to make "subdir"
> look like $tree, then we should clobber it.

Yes.  But the existing expectation is different ;-)

If the $tree has 'subdir/foo', we would want "checkout $tree --
subdir/foo" to keep working as a way for the user to declare the
resolution of the ongoing merge.  With the proposed change in
sematics, "git checkout $tree -- subdir" will become a way to say
"Everything inside subdir/ I'd resolve to the state recorded in
$tree" during such a conflicted merge, and it will lose paths not in
the $tree.

Which may be a good thing, but existing users who are used to the
traditional behaviour will find it confusing and even risky (i.e. "I
am checking OUT; never expected it to lose any path").  A counter
argument that I find sensible, of course, is "You told us to check
out subdir/, and the state recorded for subdir/ in $tree does not
have 'foo' in it.  That is the state you asked us to check out".

The argument for "checkout $tree -- subdir/foo" is essentially the
same.  The state recorded in $tree for subdir/foo is non-existence,
and that is checked out with the proposed new semantics.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-09 17:21             ` Junio C Hamano
@ 2014-11-13 18:30               ` Jeff King
  2014-11-13 19:15                 ` Junio C Hamano
  2014-11-14  5:44               ` David Aguilar
  1 sibling, 1 reply; 23+ messages in thread
From: Jeff King @ 2014-11-13 18:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin von Zweigbergk, Git Mailing List

On Sun, Nov 09, 2014 at 09:21:49AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote:
> >
> >> I think that has direct linkage; what you have in mind I think is
> >> http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935
> >
> > Thanks for that link.
> 
> It was one of the items in the "git blame leftover bits" list
> (websearch for that exact phrase), so I didn't have to do any
> digging just for this thread ;-)
> 
> But I made a huge typo above.  s/I think/I do not think/;

Oh. That might explain some of my confusion. :)

> The original observation you made in this thread is that when "git
> checkout $tree - $pathspec", whose defintion is to "grab paths in
> the $tree that match $pathspec and put them in the index, and then
> overwrite the working tree paths with the contents of these paths
> that are updated in the index with the contents taken from the
> $tree", unnecessarily rewrites the working tree paths even when they
> already had contents that were up to date.  That is what I called an
> "implementation glitch".
> 
> The old thread is a different topic.
> [...]

Right, I do agree that these things don't need to be linked. The reason
I ended up dealing with the deletion thing is that one obvious way to
implement "do not touch entries which have not changed" is by running a
diff between $tree and the index. But doing a diff means we have to
reconsider all of the code surrounding deletions (and handling of
unmerged entries in the pathspec but not in $tree). I tried a few
variants and had trouble making it work (getting caught up either in the
"make sure each pathspec matched" code, or the "treat unmerged entries
specially" behavior).

In the patch below, I ended up retaining the existing
read_tree_recursive code, and just specially handling the replacement of
index entries (which is itself sort of a diff, but it fits nicely into
the existing scheme).

> I'd prefer that these two to be treated separately.

Yeah, that makes sense after reading your emails. What I was really
unclear on was whether the handling of deletion was a bug or a design
choice, and it is the latter (if it were the former, we would not need a
transition plan :) ).

Anyway, here is the patch.

-- >8 --
Subject: checkout $tree: do not throw away unchanged index entries

When we "git checkout $tree", we pull paths from $tree into
the index, and then check the resulting entries out to the
worktree. Our method for the first step is rather
heavy-handed, though; it clobbers the entire existing index
entry, even if the content is the same. This means we lose
our stat information, leading checkout_entry to later
rewrite the entire file with identical content.

Instead, let's see if we have the identical entry already in
the index, in which case we leave it in place. That lets
checkout_entry do the right thing. Our tests cover two
interesting cases:

  1. We make sure that a file which has no changes is not
     rewritten.

  2. We make sure that we do update a file that is unchanged
     in the index (versus $tree), but has working tree
     changes. We keep the old index entry, and
     checkout_entry is able to realize that our stat
     information is out of date.

Signed-off-by: Jeff King <peff@peff.net>
---
Note that the test refreshes the index manually (because we are tweaking
the timestamp of file2). In normal use this should not be necessary
(i.e., your entries should generally be uptodate). I did wonder if
checkout should be refreshing the index itself, but it would a bunch of
extra lstats in the common case.

 builtin/checkout.c        | 31 +++++++++++++++++++++++++------
 t/t2022-checkout-paths.sh | 17 +++++++++++++++++
 2 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 5410dac..67cab4e 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -65,21 +65,40 @@ static int post_checkout_hook(struct commit *old, struct commit *new,
 static int update_some(const unsigned char *sha1, const char *base, int baselen,
 		const char *pathname, unsigned mode, int stage, void *context)
 {
-	int len;
+	struct strbuf buf = STRBUF_INIT;
 	struct cache_entry *ce;
+	int pos;
 
 	if (S_ISDIR(mode))
 		return READ_TREE_RECURSIVE;
 
-	len = baselen + strlen(pathname);
-	ce = xcalloc(1, cache_entry_size(len));
+	strbuf_add(&buf, base, baselen);
+	strbuf_addstr(&buf, pathname);
+
+	/*
+	 * If the entry is the same as the current index, we can leave the old
+	 * entry in place. Whether it is UPTODATE or not, checkout_entry will
+	 * do the right thing.
+	 */
+	pos = cache_name_pos(buf.buf, buf.len);
+	if (pos >= 0) {
+		struct cache_entry *old = active_cache[pos];
+		if (create_ce_mode(mode) == old->ce_mode &&
+		    !hashcmp(old->sha1, sha1)) {
+			old->ce_flags |= CE_UPDATE;
+			strbuf_release(&buf);
+			return 0;
+		}
+	}
+
+	ce = xcalloc(1, cache_entry_size(buf.len));
 	hashcpy(ce->sha1, sha1);
-	memcpy(ce->name, base, baselen);
-	memcpy(ce->name + baselen, pathname, len - baselen);
+	memcpy(ce->name, buf.buf, buf.len);
 	ce->ce_flags = create_ce_flags(0) | CE_UPDATE;
-	ce->ce_namelen = len;
+	ce->ce_namelen = buf.len;
 	ce->ce_mode = create_ce_mode(mode);
 	add_cache_entry(ce, ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
+	strbuf_release(&buf);
 	return 0;
 }
 
diff --git a/t/t2022-checkout-paths.sh b/t/t2022-checkout-paths.sh
index 8e3545d..f46d049 100755
--- a/t/t2022-checkout-paths.sh
+++ b/t/t2022-checkout-paths.sh
@@ -61,4 +61,21 @@ test_expect_success 'do not touch unmerged entries matching $path but not in $tr
 	test_cmp expect.next0 actual.next0
 '
 
+test_expect_success 'do not touch files that are already up-to-date' '
+	git reset --hard &&
+	echo one >file1 &&
+	echo two >file2 &&
+	git add file1 file2 &&
+	git commit -m base &&
+	echo modified >file1 &&
+	test-chmtime =1000000000 file2 &&
+	git update-index -q --refresh &&
+	git checkout HEAD -- file1 file2 &&
+	echo one >expect &&
+	test_cmp expect file1 &&
+	echo "1000000000	file2" >expect &&
+	test-chmtime -v +0 file2 >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.1.2.596.g7379948

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-13 18:30               ` Jeff King
@ 2014-11-13 19:15                 ` Junio C Hamano
  2014-11-13 19:26                   ` Jeff King
  0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2014-11-13 19:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Martin von Zweigbergk, Git Mailing List

Jeff King <peff@peff.net> writes:

> On Sun, Nov 09, 2014 at 09:21:49AM -0800, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> > On Fri, Nov 07, 2014 at 11:35:59PM -0800, Junio C Hamano wrote:
>> >
>> >> I think that has direct linkage; what you have in mind I think is
>> >> http://thread.gmane.org/gmane.comp.version-control.git/234903/focus=234935
>> >
>> > Thanks for that link.
>> 
>> It was one of the items in the "git blame leftover bits" list
>> (websearch for that exact phrase), so I didn't have to do any
>> digging just for this thread ;-)
>> 
>> But I made a huge typo above.  s/I think/I do not think/;
>
> Oh. That might explain some of my confusion. :)

Yeah, tells me to never type on a tablet X-<.

>> I'd prefer that these two to be treated separately.
>
> Yeah, that makes sense after reading your emails. What I was really
> unclear on was whether the handling of deletion was a bug or a design
> choice, and it is the latter (if it were the former, we would not need a
> transition plan :) ).

Yeah, I think we agree to refrain from saying if that design choice
was a good one or bad one at least for now.

> Subject: checkout $tree: do not throw away unchanged index entries
>
> When we "git checkout $tree", we pull paths from $tree into
> the index, and then check the resulting entries out to the
> worktree. Our method for the first step is rather
> heavy-handed, though; it clobbers the entire existing index
> entry, even if the content is the same. This means we lose
> our stat information, leading checkout_entry to later
> rewrite the entire file with identical content.
>
> Instead, let's see if we have the identical entry already in
> the index, in which case we leave it in place. That lets
> checkout_entry do the right thing. Our tests cover two
> interesting cases:
>
>   1. We make sure that a file which has no changes is not
>      rewritten.
>
>   2. We make sure that we do update a file that is unchanged
>      in the index (versus $tree), but has working tree
>      changes. We keep the old index entry, and
>      checkout_entry is able to realize that our stat
>      information is out of date.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Note that the test refreshes the index manually (because we are tweaking
> the timestamp of file2). In normal use this should not be necessary
> (i.e., your entries should generally be uptodate). I did wonder if
> checkout should be refreshing the index itself, but it would a bunch of
> extra lstats in the common case.
>
>  builtin/checkout.c        | 31 +++++++++++++++++++++++++------
>  t/t2022-checkout-paths.sh | 17 +++++++++++++++++
>  2 files changed, 42 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 5410dac..67cab4e 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -65,21 +65,40 @@ static int post_checkout_hook(struct commit *old, struct commit *new,
>  static int update_some(const unsigned char *sha1, const char *base, int baselen,
>  		const char *pathname, unsigned mode, int stage, void *context)
>  {
> ...
>  }

Makes sense, including the use of strbuf (otherwise you would
allocate ce and then discard when it turns out that it is not
needed, which is probably with the same allocation pressure, but
looks uglier).

> diff --git a/t/t2022-checkout-paths.sh b/t/t2022-checkout-paths.sh
> index 8e3545d..f46d049 100755
> --- a/t/t2022-checkout-paths.sh
> +++ b/t/t2022-checkout-paths.sh
> @@ -61,4 +61,21 @@ test_expect_success 'do not touch unmerged entries matching $path but not in $tr
>  	test_cmp expect.next0 actual.next0
>  '
>  
> +test_expect_success 'do not touch files that are already up-to-date' '
> +	git reset --hard &&
> +	echo one >file1 &&
> +	echo two >file2 &&
> +	git add file1 file2 &&
> +	git commit -m base &&
> +	echo modified >file1 &&
> +	test-chmtime =1000000000 file2 &&

Is the idea behind the hardcoded timestamp that this is sufficiently
old (Sep 2001) that we will not get in trouble comparing with the
real timestamp we get from the filesystem (which will definitely newer
than that anyway) no matter when we run this test (unless you have a
time-machine, that is)?

> +	git update-index -q --refresh &&
> +	git checkout HEAD -- file1 file2 &&
> +	echo one >expect &&
> +	test_cmp expect file1 &&
> +	echo "1000000000	file2" >expect &&
> +	test-chmtime -v +0 file2 >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-13 19:15                 ` Junio C Hamano
@ 2014-11-13 19:26                   ` Jeff King
  2014-11-13 20:03                     ` Jeff King
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2014-11-13 19:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin von Zweigbergk, Git Mailing List

On Thu, Nov 13, 2014 at 11:15:27AM -0800, Junio C Hamano wrote:

> > diff --git a/builtin/checkout.c b/builtin/checkout.c
> > index 5410dac..67cab4e 100644
> > --- a/builtin/checkout.c
> > +++ b/builtin/checkout.c
> > @@ -65,21 +65,40 @@ static int post_checkout_hook(struct commit *old, struct commit *new,
> >  static int update_some(const unsigned char *sha1, const char *base, int baselen,
> >  		const char *pathname, unsigned mode, int stage, void *context)
> >  {
> > ...
> >  }
> 
> Makes sense, including the use of strbuf (otherwise you would
> allocate ce and then discard when it turns out that it is not
> needed, which is probably with the same allocation pressure, but
> looks uglier).

Exactly. Constructing it in ce->name does save you an allocation/memcpy
in the case that we actually use the new entry, but I thought it would
look weirder. It probably doesn't matter much either way, so I tried to
write the most obvious thing.

(I also experimented with using make_cache_entry at one point, which
requires the strbuf; some of my thinking on what looks reasonable may be
left over from that approach).

> > +test_expect_success 'do not touch files that are already up-to-date' '
> > +	git reset --hard &&
> > +	echo one >file1 &&
> > +	echo two >file2 &&
> > +	git add file1 file2 &&
> > +	git commit -m base &&
> > +	echo modified >file1 &&
> > +	test-chmtime =1000000000 file2 &&
> 
> Is the idea behind the hardcoded timestamp that this is sufficiently
> old (Sep 2001) that we will not get in trouble comparing with the
> real timestamp we get from the filesystem (which will definitely newer
> than that anyway) no matter when we run this test (unless you have a
> time-machine, that is)?

I didn't actually calculate what the timestamp was. The important thing
is that it is not the timestamp that your system would give to the file
if git-checkout opened and rewrote it. :)

I initially used "123", but was worried that would cause weird
portability problems on systems. So I opted for something closer to
"normal", but in the past. I think it is fine (modulo time machines),
but I'd be happy to put in some more obvious sentinel, too.

And the worst case if you did have a time machine is that we might
accidentally declare a buggy git to be correct (racily!). I can live
with that, but I guess you could use a relative value (like "-10000")
instead of a fixed sentinel (but then you'd have to record it for the
"expect" check).

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-13 19:26                   ` Jeff King
@ 2014-11-13 20:03                     ` Jeff King
  2014-11-13 21:18                       ` Junio C Hamano
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2014-11-13 20:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin von Zweigbergk, Git Mailing List

On Thu, Nov 13, 2014 at 02:26:55PM -0500, Jeff King wrote:

> > Makes sense, including the use of strbuf (otherwise you would
> > allocate ce and then discard when it turns out that it is not
> > needed, which is probably with the same allocation pressure, but
> > looks uglier).
> 
> Exactly. Constructing it in ce->name does save you an allocation/memcpy
> in the case that we actually use the new entry, but I thought it would
> look weirder. It probably doesn't matter much either way, so I tried to
> write the most obvious thing.

Actually, it is not that bad:

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 5410dac..5a78758 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -67,6 +67,7 @@ static int update_some(const unsigned char *sha1, const char *base, int baselen,
 {
 	int len;
 	struct cache_entry *ce;
+	int pos;
 
 	if (S_ISDIR(mode))
 		return READ_TREE_RECURSIVE;
@@ -79,6 +80,23 @@ static int update_some(const unsigned char *sha1, const char *base, int baselen,
 	ce->ce_flags = create_ce_flags(0) | CE_UPDATE;
 	ce->ce_namelen = len;
 	ce->ce_mode = create_ce_mode(mode);
+
+	/*
+	 * If the entry is the same as the current index, we can leave the old
+	 * entry in place. Whether it is UPTODATE or not, checkout_entry will
+	 * do the right thing.
+	 */
+	pos = cache_name_pos(ce->name, ce->ce_namelen);
+	if (pos >= 0) {
+		struct cache_entry *old = active_cache[pos];
+		if (ce->ce_mode == old->ce_mode &&
+		    !hashcmp(ce->sha1, old->sha1)) {
+			old->ce_flags |= CE_UPDATE;
+			free(ce);
+			return 0;
+		}
+	}
+
 	add_cache_entry(ce, ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
 	return 0;
 }

and in some ways more readable, as you form the whole thing, and then as
the final step either add it, or realize that what is there is fine (I'd
almost wonder if it could be a flag to add_cache_entry).

-Peff

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-13 20:03                     ` Jeff King
@ 2014-11-13 21:18                       ` Junio C Hamano
  2014-11-13 21:37                         ` Jeff King
  0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2014-11-13 21:18 UTC (permalink / raw)
  To: Jeff King; +Cc: Martin von Zweigbergk, Git Mailing List

Jeff King <peff@peff.net> writes:

> On Thu, Nov 13, 2014 at 02:26:55PM -0500, Jeff King wrote:
>
>> > Makes sense, including the use of strbuf (otherwise you would
>> > allocate ce and then discard when it turns out that it is not
>> > needed, which is probably with the same allocation pressure, but
>> > looks uglier).
>> 
>> Exactly. Constructing it in ce->name does save you an allocation/memcpy
>> in the case that we actually use the new entry, but I thought it would
>> look weirder. It probably doesn't matter much either way, so I tried to
>> write the most obvious thing.
>
> Actually, it is not that bad:

Yeah, actually it does look better; want me to squash it into the
patch before queuing?

>
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 5410dac..5a78758 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -67,6 +67,7 @@ static int update_some(const unsigned char *sha1, const char *base, int baselen,
>  {
>  	int len;
>  	struct cache_entry *ce;
> +	int pos;
>  
>  	if (S_ISDIR(mode))
>  		return READ_TREE_RECURSIVE;
> @@ -79,6 +80,23 @@ static int update_some(const unsigned char *sha1, const char *base, int baselen,
>  	ce->ce_flags = create_ce_flags(0) | CE_UPDATE;
>  	ce->ce_namelen = len;
>  	ce->ce_mode = create_ce_mode(mode);
> +
> +	/*
> +	 * If the entry is the same as the current index, we can leave the old
> +	 * entry in place. Whether it is UPTODATE or not, checkout_entry will
> +	 * do the right thing.
> +	 */
> +	pos = cache_name_pos(ce->name, ce->ce_namelen);
> +	if (pos >= 0) {
> +		struct cache_entry *old = active_cache[pos];
> +		if (ce->ce_mode == old->ce_mode &&
> +		    !hashcmp(ce->sha1, old->sha1)) {
> +			old->ce_flags |= CE_UPDATE;
> +			free(ce);
> +			return 0;
> +		}
> +	}
> +
>  	add_cache_entry(ce, ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
>  	return 0;
>  }
>
> and in some ways more readable, as you form the whole thing, and then as
> the final step either add it, or realize that what is there is fine (I'd
> almost wonder if it could be a flag to add_cache_entry).
>
> -Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-13 21:18                       ` Junio C Hamano
@ 2014-11-13 21:37                         ` Jeff King
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2014-11-13 21:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin von Zweigbergk, Git Mailing List

On Thu, Nov 13, 2014 at 01:18:43PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Thu, Nov 13, 2014 at 02:26:55PM -0500, Jeff King wrote:
> >
> >> > Makes sense, including the use of strbuf (otherwise you would
> >> > allocate ce and then discard when it turns out that it is not
> >> > needed, which is probably with the same allocation pressure, but
> >> > looks uglier).
> >> 
> >> Exactly. Constructing it in ce->name does save you an allocation/memcpy
> >> in the case that we actually use the new entry, but I thought it would
> >> look weirder. It probably doesn't matter much either way, so I tried to
> >> write the most obvious thing.
> >
> > Actually, it is not that bad:
> 
> Yeah, actually it does look better; want me to squash it into the
> patch before queuing?

Yeah, if you like it, too, then let's go with it. Thanks.

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-09 17:21             ` Junio C Hamano
  2014-11-13 18:30               ` Jeff King
@ 2014-11-14  5:44               ` David Aguilar
  2014-11-14 19:27                 ` Junio C Hamano
  1 sibling, 1 reply; 23+ messages in thread
From: David Aguilar @ 2014-11-14  5:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Martin von Zweigbergk, Git Mailing List

On Sun, Nov 09, 2014 at 09:21:49AM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > So just to be clear, the behavior we want is that:
> >
> >   echo foo >some-new-path
> >   git add some-new-path
> >   git checkout HEAD -- .
> >
> > will delete some-new-path (whereas the current code turns it into an
> > untracked file).
> 
> With the updated semantics proposed in the old thread, yes, that is
> what should happen.
> 
> >   git checkout HEAD -- some-new-path
> >
> > do in that case?
> 
> Likewise.  And if some-new-path were a directory, with existing path
> O and new path N both in the index but only the former in HEAD, the
> operation would revert some-new-path/O to that of HEAD and remove
> some-new-path/N.  That is the only logical thing we could do if we
> were to take the updated sematics.
> 
> That is one of the reasons why I am not 100% convinced that the
> proposed updated semantics is better, even though I was fairly
> positive in the old discussion and also I kept the topic in the
> "leftover bits" list.  The above command is a fairly common way to
> say "I started refactoring the existing path some-path/O and
> sprinkled its original contents spread into new files A, B and C in
> the same directory.  Now I no longer have O in the working tree, but
> let me double check by grabbing it out of the state recoded in the
> commit".  You expect that "git checkout HEAD -- some-path" would not
> lose A, B or C, knowing "some-path" only had O.  That expectation
> would even be stronger if you are used to the current semantics, but
> that is something we could fix, if we decide that the proposed
> updated semantics is better, with a careful transition plan.
> 
> It might be less risky if the updated semantics were to make the
> paths that are originally in the index but not in $tree untracked
> (as opposed to "reset --hard" emulation where they will be lost)
> unless they need to be removed to make room for D/F conflict issues,
> but I haven't thought it through.


Git has always been really careful to not lose data.

One way to avoid the problem of changing existing semantics is
to make the new semantics accessible behind a flag, e.g.
"git checkout --hard HEAD -- some-new-path".
-- 
David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] git checkout $tree -- $path always rewrites files
  2014-11-14  5:44               ` David Aguilar
@ 2014-11-14 19:27                 ` Junio C Hamano
  0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2014-11-14 19:27 UTC (permalink / raw)
  To: David Aguilar; +Cc: Jeff King, Martin von Zweigbergk, Git Mailing List

David Aguilar <davvid@gmail.com> writes:

> On Sun, Nov 09, 2014 at 09:21:49AM -0800, Junio C Hamano wrote:
>> 
>> It might be less risky if the updated semantics were to make the
>> paths that are originally in the index but not in $tree untracked
>> (as opposed to "reset --hard" emulation where they will be lost)
>> unless they need to be removed to make room for D/F conflict issues,
>> but I haven't thought it through.
>
> Git has always been really careful to not lose data.
>
> One way to avoid the problem of changing existing semantics is
> to make the new semantics accessible behind a flag, e.g.
> "git checkout --hard HEAD -- some-new-path".

Yup, but you seem to be behind by a few exchanges, as we tentatively
decided that we won't talk about changing the semantics and concentrate
on fixing the implementation glitches only at least for now ;-)

I find that "--hard" is not a very good name for the new mode.
There will be different kinds of "more than what we usually do"
modes of operations discovered over time in the coming years, and it
is better to be more specific to denote "in what way we are doing it
harder" (I think the difference the proposed new mode has is to also
checkout absense of the paths).

But in this particular case, making the paths that are absent in $tree
we are checking out of into untracked paths (instead of removing) is
a right balance of safety---it is similar to "git reset HEAD" (no
"--hard") after adding a new path which leaves the file in the
working tree.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-11-14 19:27 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-07  8:13 [RFC] git checkout $tree -- $path always rewrites files Jeff King
2014-11-07  8:38 ` Jeff King
2014-11-07 10:13   ` Duy Nguyen
2014-11-07 16:51     ` Junio C Hamano
2014-11-07 19:15     ` Jeff King
2014-11-07 17:14 ` Junio C Hamano
2014-11-07 19:17   ` Jeff King
     [not found]     ` <CANiSa6hufp=80TaesNpo1CxCbwVq3LPXvYaUSbcmzPE5pj_GGw@mail.gmail.com>
2014-11-08  7:10       ` Martin von Zweigbergk
     [not found]         ` <CAPc5daWdzrHr8Rdksr3HycMRQu0=Ji7h=BPYjzZj7MH6Ko0VgQ@mail.gmail.com>
2014-11-08  8:03           ` Martin von Zweigbergk
2014-11-08  8:30           ` Jeff King
2014-11-08  8:45             ` Jeff King
2014-11-09 18:37               ` Junio C Hamano
2014-11-08 16:19             ` Martin von Zweigbergk
2014-11-09  9:42               ` Jeff King
2014-11-09 17:21             ` Junio C Hamano
2014-11-13 18:30               ` Jeff King
2014-11-13 19:15                 ` Junio C Hamano
2014-11-13 19:26                   ` Jeff King
2014-11-13 20:03                     ` Jeff King
2014-11-13 21:18                       ` Junio C Hamano
2014-11-13 21:37                         ` Jeff King
2014-11-14  5:44               ` David Aguilar
2014-11-14 19:27                 ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).