suggestion: git status = restored

git.vger.kernel.org archive mirror

* suggestion: git status = restored
@ 2011-03-25 17:59 Neal Kreitzinger
  2011-03-29 14:58 ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Neal Kreitzinger @ 2011-03-25 17:59 UTC
  To: git

We deleted (git-rm) a file from the repo by mistake.  Several commits later 
we restored it (git-checkout, git-commit).  Git status shows "added" for 
this file.  IMHO, it seems like git status should be "restored" or 
"unremoved", etc, for this file.  Git detects renames and copies so it seems 
like it could detect restores.

v/r,
Neal 

* Re: suggestion: git status = restored
  2011-03-25 17:59 suggestion: git status = restored Neal Kreitzinger
@ 2011-03-29 14:58 ` Jeff King
  2011-03-29 17:32   ` Neal Kreitzinger
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2011-03-29 14:58 UTC
  To: Neal Kreitzinger; +Cc: git

On Fri, Mar 25, 2011 at 12:59:34PM -0500, Neal Kreitzinger wrote:

> We deleted (git-rm) a file from the repo by mistake.  Several commits later 
> we restored it (git-checkout, git-commit).  Git status shows "added" for 
> this file.  IMHO, it seems like git status should be "restored" or 
> "unremoved", etc, for this file.  Git detects renames and copies so it seems 
> like it could detect restores.

I am mildly negative on the idea, though I think it is mostly just
because I would not find that information useful at all.

But what gives me pause is that it is adding a totally new dimension to
git-status. Currently status is about three things:

  1. What's in your index, and how does it differ from what's in HEAD.

  2. What's in your working tree, and how does it differ from what's in
     your index.

  3. What untracked files are in your working tree.

So it is only about HEAD, the index, and the working tree, and we only
have to look at those things. We detect copies and renames, yes, but
only in the diffs between those points.
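As a minimal sketch (not from the thread; it assumes only that `git` is on PATH), the three comparisons above can be reproduced with plumbing commands in a throwaway repository. It replays the original report: a file is committed, removed, and later checked out again from an older commit. The user name and e-mail are placeholders.

```shell
# Sketch (assumes git on PATH): replay the report in a scratch repository,
# then run the three comparisons "git status" is built from.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"     # placeholder identity

echo 'old program' > foo
git add foo && git commit -q -m 'add foo'
git rm -q foo && git commit -q -m 'remove foo'
echo 'other work' > bar
git add bar && git commit -q -m 'add bar'

# Bring foo back from two commits ago; this also stages it.
git checkout HEAD~2 -- foo

# 1. Index vs. HEAD ("Changes to be committed").
git diff-index --cached --name-status HEAD
# 2. Working tree vs. index ("Changes not staged for commit").
git diff-files --name-status
# 3. Untracked files.
git ls-files --others --exclude-standard
```

The first command should report `foo` with status `A`. None of the three comparisons looks at the commit that deleted `foo`, so "added" is all they can say.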

But what you are proposing requires looking backwards in history to see
if we used to have something like the thing that has been added. So that
introduces a few questions:

  1. What are we claiming to have "used to have"? Some arbitrary content
     at the same path, or similar content at the same path, or similar
     content at any path?

  2. Which history do we look at? Do we start traversing backwards from
     HEAD? If so, how far back do we go (you probably don't want to go
     to the roots, which is expensive)? Is it useful to see similar
     files on other branches (e.g., instead of "you are adding foo,
     which is being resurrected from 'HEAD~20: remove foo'", you would
     find out that "you are adding foo, which has also been added on
     branch 'topic'").

  3. How expensive is the test going to end up? For generating a commit
     template or running "git status", it's probably OK. But keep in
     mind also that people run "git status --porcelain" to generate
     their shell prompt. So it needs to either be really fast, or it
     needs to be easy to turn it off in some cases.
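Taking the narrowest reading of question 1 (same path) and of question 2 (walk back from HEAD only), the lookup could be sketched with ordinary `git log` path limiting. `last_deletion` is a hypothetical helper name, not a git command. `--diff-filter=D` keeps only commits that deleted the path and `-1` should stop the walk at the first match; a path that was never deleted still costs a walk to the root, which is the concern in question 3.

```shell
# Hypothetical helper (not a git command): print the abbreviated ID and
# subject of the most recent commit reachable from HEAD that deleted the
# given path. Prints nothing if no such commit exists.
last_deletion() {
    # --diff-filter=D: only commits in which "$1" was deleted.
    # -1: stop after the first (most recent) match.
    git log -1 --diff-filter=D --format='%h %s' HEAD -- "$1"
}
```

In a repository where `foo` was removed by a commit titled "remove foo", `last_deletion foo` prints that commit's short ID followed by `remove foo`.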

-Peff

* Re: suggestion: git status = restored
  2011-03-29 14:58 ` Jeff King
@ 2011-03-29 17:32   ` Neal Kreitzinger
  2011-03-29 18:51     ` Junio C Hamano
                       ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Neal Kreitzinger @ 2011-03-29 17:32 UTC
  To: Jeff King; +Cc: Neal Kreitzinger, git

On 3/29/2011 9:58 AM, Jeff King wrote:
> On Fri, Mar 25, 2011 at 12:59:34PM -0500, Neal Kreitzinger wrote:
>
>> We deleted (git-rm) a file from the repo by mistake.  Several commits later
>> we restored it (git-checkout, git-commit).  Git status shows "added" for
>> this file.  IMHO, it seems like git status should be "restored" or
>> "unremoved", etc, for this file.  Git detects renames and copies so it seems
>> like it could detect restores.
>
> I am mildly negative on the idea, though I think it is mostly just
> because I would not find that information useful at all.
>
> But what gives me pause is that it is adding a totally new dimension to
> git-status. Currently status is about three things:
>
>    1. What's in your index, and how does it differ from what's in HEAD.
>
>    2. What's in your working tree, and how does it differ from what's in
>       your index.
>
>    3. What untracked files are in your working tree.
>
> So it is only about HEAD, the index, and the working tree, and we only
> have to look at those things. We detect copies and renames, yes, but
> only in the diffs between those points.
>
> But what you are proposing requires looking backwards in history to see
> if we used to have something like the thing that has been added. So that
> introduces a few questions:
>
>    1. What are we claiming to have "used to have"? Some arbitrary content
>       at the same path, or similar content at the same path, or similar
>       content at any path?
>
>    2. Which history do we look at? Do we start traversing backwards from
>       HEAD? If so, how far back do we go (you probably don't want to go
>       to the roots, which is expensive)? Is it useful to see similar
>       files on other branches (e.g., instead of "you are adding foo,
>       which is being resurrected from 'HEAD~20: remove foo'", you would
>       find out that "you are adding foo, which has also been added on
>       branch 'topic'").
>
>    3. How expensive is the test going to end up? For generating a commit
>       template or running "git status", it's probably OK. But keep in
>       mind also that people run "git status --porcelain" to generate
>       their shell prompt. So it needs to either be really fast, or it
>       needs to be easy to turn it off in some cases.
>
> -Peff

I see your point about the current worktree/index/HEAD.  I'm not a git 
developer, but my idea is based on the concept that the sha-1 of the 
content already exists in the object store regardless of its path(s). 
I'm talking about identical blob sha-1's, not "similar" content.  It 
seems like the loose object directory would be easy enough to check for
duplicate blob sha-1's, but the pack would probably be more
difficult (I'm not sure how you could do that).  If this capability
doesn't fit well into fast default behavior, maybe there could be an 
option to --find-restores-harder.
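The property relied on here can be seen directly: a blob's ID depends only on its bytes, never on its path. The sketch below recomputes that ID the way git does for SHA-1 repositories (SHA-1 over a `blob <size>` header, a NUL byte, then the content) without calling git. `blob_id` is a hypothetical name, and the sketch assumes GNU `sha1sum`; other systems may need `shasum` instead.

```shell
# Hypothetical helper: compute the blob ID git would assign to a file's
# content. The file's path is never part of the hashed input.
# Assumes sha1sum(1) from GNU coreutils.
blob_id() {
    # Byte count; tr strips the padding some wc implementations add.
    size=$(wc -c < "$1" | tr -d ' ')
    # Header "blob <size>", a NUL byte, then the raw content.
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}
```

Two files with identical bytes, for example a `.gitignore` containing `*.o` in both `src/` and `lib/`, get the same ID from `blob_id`, and `git hash-object` on either file should print that same value.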

That being said, I see how it may not be feasible for git-status to do 
that extra work.  Git-status runs against "what you just did" so 
hopefully I should know in my mind that I just checked something out to 
restore it.  However, for analyzing history it would be nice for git-log 
or git-diff to be able to do that extra work of finding restores when asked.

In our workflow it would be useful because we have a utility directory 
of mostly obsolete programs that need to be deleted to eliminate noise,
but we're sure some of them will get restored once we realize they're 
still needed.  An interrogation command with --name-status 
--find-restores-harder would give an accurate picture of what was really 
added (new content) and what was simply restored (old content revived).
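A rough, same-path-only approximation of such a history view can be assembled from existing plumbing. `classify_adds` is a hypothetical helper, and `--find-restores-harder` is not a real git option. The sketch assumes a non-merge commit and ignores content entirely: an added path counts as "restored" if any earlier commit deleted that path, whatever its bytes were.

```shell
# Hypothetical helper: for a non-merge commit, label each added path as
# "restored" (an ancestor commit deleted the same path) or "new".
classify_adds() {
    commit=$1
    # Paths added by this commit relative to its parent.
    git diff-tree -r --no-commit-id --name-only --diff-filter=A "$commit" |
    while IFS= read -r path; do
        # Most recent deletion of the same path before this commit.
        del=$(git log -1 --diff-filter=D --format=%h "$commit^" -- "$path")
        if [ -n "$del" ]; then
            echo "restored $path (deleted in $del)"
        else
            echo "new $path"
        fi
    done
}
```

For a commit that re-adds a previously deleted `foo` together with a brand-new `bar`, `classify_adds HEAD` would print `new bar` and a `restored foo` line naming the deleting commit.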

v/r,
neal

* Re: suggestion: git status = restored
  2011-03-29 17:32   ` Neal Kreitzinger
@ 2011-03-29 18:51     ` Junio C Hamano
  2011-03-29 18:56     ` Matthieu Moy
  2011-03-29 21:28     ` Jeff King
  2 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2011-03-29 18:51 UTC
  To: Neal Kreitzinger; +Cc: Jeff King, Neal Kreitzinger, git

Neal Kreitzinger <nkreitzinger@gmail.com> writes:

> I see your point about the current worktree/index/HEAD.  I'm not a git
> developer, but my idea is based on the concept that the sha-1 of the
> content already exists in the object store regardless of its
> path(s). I'm talking about identical blob sha-1's, not "similar"
> content.

One thing you seem to be missing is that you would need to prove that a
commit that had that blob existed among the ancestors of the commit you are
standing on in order to call that restore.  You cannot restore something
that you didn't lose, and if you never had it in your history, there is no
way you lost it in the first place.  And that means you have to run around
in the history potentially digging down to the root.
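The ancestry check described here can be sketched with `git rev-list --objects`, which lists every commit, tree and blob reachable from the given commit. `blob_in_history` is a hypothetical helper name. Absence can only be established after enumerating everything, so a blob that was never in history costs a full traversal to the root, which is the expense noted above.

```shell
# Hypothetical helper: succeed if blob "$1" is reachable from HEAD at any
# path. "git rev-list --objects HEAD" prints "<sha>" for commits and
# "<sha> <path>" for trees and blobs; only the first field is compared.
blob_in_history() {
    git rev-list --objects HEAD | cut -d' ' -f1 | grep -qx "$1"
}
```

Usage, inside a repository: `blob_in_history "$(git hash-object foo)" && echo "this content of foo was committed before"`. A freshly `git add`ed blob exists in the object store but is not reachable from HEAD, so it is correctly not reported.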

Also, a file that happens to have the same content is not necessarily a
"restore".  If you are using a boilerplate to start a new file in a
verbose language, you may "git add" the initial state of such a file
before you start adding your own lines (perhaps adding a real method
implementation to a class), and then run another "git add" to record your
changes.  It wouldn't be surprising if such an initial snapshot for
different paths were identical.

A more trivial example would be a .gitignore file that has '*.o'; that can
appear in src/ and then in lib/ but the project may not want to have it at
the toplevel of the source tree.

* Re: suggestion: git status = restored
  2011-03-29 17:32   ` Neal Kreitzinger
  2011-03-29 18:51     ` Junio C Hamano
@ 2011-03-29 18:56     ` Matthieu Moy
  2011-03-29 21:28     ` Jeff King
  2 siblings, 0 replies; 6+ messages in thread
From: Matthieu Moy @ 2011-03-29 18:56 UTC
  To: Neal Kreitzinger; +Cc: git, Neal Kreitzinger

Neal Kreitzinger <nkreitzinger@gmail.com> writes:

> I see your point about the current worktree/index/HEAD.  I'm not a git
> developer, but my idea is based on the concept that the sha-1 of the
> content already exists in the object store regardless of its path(s).

That wouldn't work: the blob is added to the object store at the time of
"git add", so at the time of "git status", it has to exist in the object
store, whether it's "restored" or "totally new".

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

* Re: suggestion: git status = restored
  2011-03-29 17:32   ` Neal Kreitzinger
  2011-03-29 18:51     ` Junio C Hamano
  2011-03-29 18:56     ` Matthieu Moy
@ 2011-03-29 21:28     ` Jeff King
  2 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2011-03-29 21:28 UTC
  To: Neal Kreitzinger; +Cc: Neal Kreitzinger, git

On Tue, Mar 29, 2011 at 12:32:13PM -0500, Neal Kreitzinger wrote:

> I see your point about the current worktree/index/HEAD.  I'm not a
> git developer, but my idea is based on the concept that the sha-1 of
> the content already exists in the object store regardless of its
> path(s). I'm talking about identical blob sha-1's, not "similar"
> content.  It seems like the loose object directory would be easy
> enough to check for duplicate blob sha-1's, but the pack would
> probably be more difficult (I'm not sure how you could do that).  If
> this capability doesn't fit well into fast default behavior, maybe
> there could be an option to --find-restores-harder.

Ah, I see. Yes, that is extremely cheap to calculate for loose or packed
objects (see has_sha1_file in sha1_file.c). But by the time you run
status, it is too late. When you "git add" the file, it will write the
sha1 into the object db. So by definition, if you are tracking a file
for commit, it will exist in the object db. You could check the
timestamp on the object file to see if it has been around "for a while",
but that is very hack-ish and may or may not return useful results.
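The "too late" point can be checked in a scratch repository (the sketch assumes `git` on PATH). `git hash-object` without `-w` only computes an ID, `git cat-file -e` exits with status 0 when an object exists (roughly a command-line counterpart of the existence check mentioned above), and `git add` stores the blob.

```shell
# Sketch: "git add" writes the blob, so by the time "git status" runs the
# object always exists, whether the content is restored or brand new.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

echo 'restored or brand new?' > foo
sha=$(git hash-object foo)       # computes the ID only; -w would store it

if git cat-file -e "$sha" 2>/dev/null; then
    echo "before add: present"
else
    echo "before add: absent"
fi

git add foo                      # stores the blob under .git/objects

if git cat-file -e "$sha"; then
    echo "after add: present"
fi
```

Run as-is in a fresh repository, it prints `before add: absent` followed by `after add: present`.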

> That being said, I see how it may not be feasible for git-status to
> do that extra work.  Git-status runs against "what you just did" so
> hopefully I should know in my mind that I just checked something out
> to restore it.  However, for analyzing history it would be nice for
> git-log or git-diff to be able to do that extra work of finding
> restores when asked.
> 
> In our workflow it would be useful because we have a utility
> directory of mostly obsolete programs that need to be deleted to
> eliminate noise, but we're sure some of them will get restored once
> we realize they're still needed.  An interrogation command with
> --name-status --find-restores-harder would give an accurate picture
> of what was really added (new content) and what was simply restored
> (old content revived).

I think you just want:

  git log -1 -- "$file"

to see if any commits had that path previously. Or if you really care
about finding the same content somewhere in history at any path, you can
look for the blobs with something like:

  git rev-list HEAD |
  git diff-tree -r -m --stdin |
  perl -e '
    # Make an index of blob sha1s pointing back to the file
    # they name.
    foreach my $file (@ARGV) {
      my $sha1 = `git hash-object $file`;
      chomp $sha1;
      $files{$sha1}->{file} = $file;
    }

    # Now look at the traversal history, noting the first time
    # we hit each blob, and remember its commit.
    while (<STDIN>) {
      if (/^[0-9a-f]{40}$/) {
        $commit = $&;
      }
      else {
        while (/[0-9a-f]{40}/g) {
          next unless exists $files{$&};
          next if exists $files{$&}->{commit};
          $files{$&}->{commit} = $commit;
        }
      }
    }

    # And then report the result, which is the most recent commit
    # that blob was found in, either being deleted, added, or modified.
    foreach my $v (sort { $a->{file} cmp $b->{file} } values(%files)) {
      if ($v->{commit}) {
        print "$v->{file} $v->{commit}\n";
      }
      else {
        print "$v->{file} was never mentioned\n";
      }
    }
  ' `git diff-index HEAD --name-only --diff-filter=A`

-Peff
