git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: Neal Kreitzinger <nkreitzinger@gmail.com>
Cc: Neal Kreitzinger <neal@rsss.com>, git@vger.kernel.org
Subject: Re: suggestion: git status = restored
Date: Tue, 29 Mar 2011 17:28:44 -0400
Message-ID: <20110329212844.GA23510@sigill.intra.peff.net>
In-Reply-To: <4D92179D.6050102@gmail.com>

On Tue, Mar 29, 2011 at 12:32:13PM -0500, Neal Kreitzinger wrote:

> I see your point about the current worktree/index/HEAD.  I'm not a
> git developer, but my idea is based on the concept that the sha-1 of
> the content already exists in the object store regardless of its
> path(s). I'm talking about identical blob sha-1's, not "similar"
> content.  It seems like the loose object directory would be easy
> enough to check for duplicate blob sha-1's, but the pack would
> probably be more difficult (I'm not sure how you could do that).  If
> this capability doesn't fit well into fast default behavior, maybe
> there could be an option to --find-restores-harder.

Ah, I see. Yes, that is extremely cheap to calculate for loose or packed
objects (see has_sha1_file in sha1_file.c). But by the time you run
status, it is too late. When you "git add" the file, it will write the
blob into the object db. So by definition, if you are tracking a file
for commit, its blob will exist in the object db. You could check the
timestamp on the object file to see if it has been around "for a while",
but that is very hack-ish and may or may not return useful results.
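
To make the timing issue concrete, here is a rough sketch of the existence
check done from the command line instead of via has_sha1_file. It only
tells you something useful if you run it before "git add"; afterwards the
answer is always "yes". The function name blob_exists is made up for
illustration; git hash-object (without -w) and git cat-file -e are the
real plumbing commands.

```shell
#!/bin/sh
# Sketch: does this file's content already exist as a blob in the object db?
# blob_exists is a hypothetical helper name, not part of git.
blob_exists() {
  # Without -w, hash-object only computes the object id; nothing is written.
  sha1=$(git hash-object -- "$1") || return 2
  # cat-file -e exits 0 if the object exists, whether loose or packed.
  git cat-file -e "$sha1" 2>/dev/null
}

# Report on every file named on the command line (does nothing with no args).
for f in "$@"; do
  if blob_exists "$f"; then
    echo "$f: content already in object db"
  else
    echo "$f: content not seen before"
  fi
done
```

Note that a hit only means the blob is somewhere in the object db; it may
be unreachable from any commit, so this is a hint rather than proof of a
restore.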

> That being said, I see how it may not be feasible for git-status to
> do that extra work.  Git-status runs against "what you just did" so
> hopefully I should know in my mind that I just checked something out
> to restore it.  However, for analyzing history it would be nice for
> git-log or git-diff to be able to do that extra work of finding
> restores when asked.
> 
> In our workflow it would be useful because we have a utility
> directory of mostly obsolete programs that needs to be deleted to
> eliminate noise, but we're sure some of them will get restored once
> we realize they're still needed.  An interrogation command with
> --name-status --find-restores-harder would give an accurate picture
> of what was really added (new content) and what was simply restored
> (old content revived).

I think you just want:

  git log -1 -- "$file"

to see if any commits had that path previously. Or if you really care
about finding the same content somewhere in history at any path, you can
look for the blobs with something like:

  git rev-list HEAD |
  git diff-tree -r -m --stdin |
  perl -e '
    # Make an index of blob sha1s pointing back to the file
    # they name.
    foreach my $file (@ARGV) {
      my $sha1 = `git hash-object $file`;
      chomp $sha1;
      $files{$sha1}->{file} = $file;
    }

    # Now look at the traversal history, noting the first time
    # we hit each blob, and remember its commit.
    while (<STDIN>) {
      if (/^[0-9a-f]{40}$/) {
        $commit = $&;
      }
      else {
        while (/[0-9a-f]{40}/g) {
          next unless exists $files{$&};
          next if exists $files{$&}->{commit};
          $files{$&}->{commit} = $commit;
        }
      }
    }

    # And then report the result, which is the most recent commit
    # that blob was found in, either being deleted, added, or modified.
    foreach my $v (sort { $a->{file} cmp $b->{file} } values(%files)) {
      if ($v->{commit}) {
        print "$v->{file} $v->{commit}\n";
      }
      else {
        print "$v->{file} was never mentioned\n";
      }
    }
  ' `git diff-index HEAD --name-only --diff-filter=A`
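
If the simpler per-path check is enough, something along these lines
would run the "git log -1" lookup for every added file. This is only a
sketch: the function names are invented, and I am assuming you want to
look at what is staged (hence --cached), with paths that contain no
characters git would quote in its output.

```shell
#!/bin/sh
# Sketch: for each path staged as "added" relative to HEAD, show the most
# recent commit that touched that same path, if any.
# Both function names are hypothetical helpers, not git commands.
last_commit_for_path() {
  # Prints the full commit id, or nothing if no commit ever touched the path.
  git log -1 --format=%H -- "$1"
}

report_restored_paths() {
  git diff-index --cached --name-only --diff-filter=A HEAD |
  while IFS= read -r path; do
    commit=$(last_commit_for_path "$path")
    if [ -n "$commit" ]; then
      printf '%s previously touched by %s\n' "$path" "$commit"
    else
      printf '%s is new\n' "$path"
    fi
  done
}
```

Call report_restored_paths from inside the repository after staging. For
a path that was deleted earlier, the commit reported will usually be the
one that deleted it, since that is the most recent commit touching the
path. This matches on path only, so it will not notice old content
coming back under a different name.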

-Peff

Thread overview:
2011-03-25 17:59 suggestion: git status = restored Neal Kreitzinger
2011-03-29 14:58 ` Jeff King
2011-03-29 17:32   ` Neal Kreitzinger
2011-03-29 18:51     ` Junio C Hamano
2011-03-29 18:56     ` Matthieu Moy
2011-03-29 21:28     ` Jeff King [this message]
