From: Junio C Hamano <junkio@cox.net>
To: git@vger.kernel.org
Subject: Re: updated design for the diff-raw format.
Date: Sat, 21 May 2005 16:16:00 -0700
Message-ID: <7vr7g0dvbj.fsf@assigned-by-dhcp.cox.net>
In-Reply-To: <7vwtpsdvgm.fsf@assigned-by-dhcp.cox.net> (Junio C. Hamano's message of "Sat, 21 May 2005 16:12:57 -0700")

(first of the replayed exchange)

To: Linus Torvalds <torvalds@osdl.org>
Subject: Re: [PATCH 3/3] Diff overhaul, adding the other half of copy
 detection.
From: Junio C Hamano <junkio@cox.net>
Date: Sat, 21 May 2005 10:56:06 -0700
Message-ID: <7v4qcwihu1.fsf@assigned-by-dhcp.cox.net>

GIT_DIFF_OPTS=--unified=0 works for me as well (GNU diffutils
2.8.1).

Now I think I am done with diff, except for one thing.  It is
quite an incompatible change, so I do not know how well it would
work in practice.  I am not even advocating this.  It is more
like me thinking aloud.

The diff-raw format we have been dealing with (sorry about the '\t'
vs ' ' gotcha again) is internally enhanced by diff-core.  It
first introduces entries for unmodified paths; '*' entries that
have the same mode/sha1 on both sides of the from->to pair are
such entries, and that is what the change in [PATCH 3/3] is about.

    *100644->100644 blob 233a250...->66818b4... file0
    *100755->100755 blob fc77389...->7b72d3d... file1
    +100644 blob 233a250... file2
    -100755 blob fc77389... file3
    *100644->100644 blob 233a250...->233a250... file4
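
As an illustration of the '*'/'+'/'-' records above, here is a minimal Python sketch (not git's C code) that parses one line of this format and tells an unmodified '*' entry apart from a modified one.  It assumes whitespace-separated fields with the path last, and treats abbreviated sha1s such as "233a250..." as opaque strings; the RawEntry and parse_old_raw names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RawEntry:
    op: str                   # '+' (created), '-' (deleted) or '*' (both sides present)
    src_mode: Optional[str]   # None when there is no src side
    dst_mode: Optional[str]   # None when there is no dst side
    kind: str                 # 'blob' or 'tree', as spelled out in the record
    src_sha1: Optional[str]
    dst_sha1: Optional[str]
    path: str

    def is_unmodified(self) -> bool:
        # An unmodified path is a '*' entry whose mode and sha1 match on both sides.
        return (self.op == '*'
                and self.src_mode == self.dst_mode
                and self.src_sha1 == self.dst_sha1)


def parse_old_raw(line: str) -> RawEntry:
    """Parse one record such as '*100644->100644 blob 233a250...->66818b4... file0'."""
    op, rest = line[0], line[1:]
    # maxsplit=3 keeps everything after the third field together as the path.
    modes, kind, sha1s, path = rest.split(None, 3)
    if op == '*':
        src_mode, dst_mode = modes.split('->')
        src_sha1, dst_sha1 = sha1s.split('->')
    elif op == '+':
        src_mode, src_sha1, dst_mode, dst_sha1 = None, None, modes, sha1s
    elif op == '-':
        src_mode, src_sha1, dst_mode, dst_sha1 = modes, sha1s, None, None
    else:
        raise ValueError(f"unknown operation letter {op!r}")
    return RawEntry(op, src_mode, dst_mode, kind, src_sha1, dst_sha1, path)


records = [
    "*100644->100644 blob 233a250...->66818b4... file0",
    "*100755->100755 blob fc77389...->7b72d3d... file1",
    "+100644 blob 233a250... file2",
    "-100755 blob fc77389... file3",
    "*100644->100644 blob 233a250...->233a250... file4",
]
print([e.path for e in map(parse_old_raw, records) if e.is_unmodified()])  # ['file4']
```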

Then diff-core internally extends the format so that everything
looks like this (the '+', '-' and '*' letters and the blob/tree
word are gone, and each record acquires a second path).

    100644->100644 233a250...->66818b4... file0 file0
    100755->100755 fc77389...->7b72d3d... file1 file1
    ______->100644 _______...->233a250... file2 file2
    100755->______ fc77389...->_______... file3 file3
    100644->100644 233a250...->233a250... file4 file4

Internally, the "______" fields above are represented by a
separate flag (file_valid), which denotes the absence of the src
or dst side.
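
A minimal Python model of that normalized record, assuming each side carries a path, mode, sha1 and the file_valid flag mentioned above.  The FileSpec/FilePair names and the one/two field names are borrowed loosely from git's C structures for readability; this is a sketch, not a transcription of diff-core.  It turns an old-style '+', '-' or '*' record into a pair and prints it in the internal layout shown above.

```python
from dataclasses import dataclass


@dataclass
class FileSpec:
    path: str
    mode: str = ""
    sha1: str = ""
    file_valid: bool = False   # False means "this side does not exist"


@dataclass
class FilePair:
    one: FileSpec   # src side
    two: FileSpec   # dst side


def to_pair(op: str, modes: str, sha1s: str, path: str) -> FilePair:
    """Normalize one old-style record; both sides start out with the same path."""
    if op == '*':
        src_mode, dst_mode = modes.split('->')
        src_sha1, dst_sha1 = sha1s.split('->')
        return FilePair(FileSpec(path, src_mode, src_sha1, True),
                        FileSpec(path, dst_mode, dst_sha1, True))
    if op == '+':   # created: there is no src side
        return FilePair(FileSpec(path), FileSpec(path, modes, sha1s, True))
    if op == '-':   # deleted: there is no dst side
        return FilePair(FileSpec(path, modes, sha1s, True), FileSpec(path))
    raise ValueError(f"unknown operation letter {op!r}")


def show(p: FilePair) -> str:
    """Render a pair in the internal layout, using underscores for an invalid side."""
    def mode(s: FileSpec) -> str:
        return s.mode if s.file_valid else "______"

    def sha1(s: FileSpec) -> str:
        return s.sha1 if s.file_valid else "_______..."

    return (f"{mode(p.one)}->{mode(p.two)} "
            f"{sha1(p.one)}->{sha1(p.two)} {p.one.path} {p.two.path}")


print(show(to_pair('+', '100644', '233a250...', 'file2')))
# ______->100644 _______...->233a250... file2 file2
```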

diff-core is all about manipulating this type of list, turning
one such list into a different list.

For example, a rename-edit of file3 into file2 is detected by the
diffcore-rename module, and these entries:

    ______->100644 _______...->233a250... file2 file2
    100755->______ fc77389...->_______... file3 file3

become:

    100755->100644 fc77389...->233a250... file3 file2
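
The rename step above can be sketched in Python as "pair up one deletion with one creation and splice them into a single record".  The real diffcore-rename scores candidates by content similarity and picks the best match; this sketch instead takes a caller-supplied similar() predicate (hypothetical) and greedily uses the first match, so treat it as an outline of the list transformation, not of the similarity estimator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class FileSpec:
    path: str
    mode: str = ""
    sha1: str = ""
    file_valid: bool = False   # False means "this side does not exist"


@dataclass
class FilePair:
    one: FileSpec   # src side
    two: FileSpec   # dst side


def merge_renames(pairs: List[FilePair],
                  similar: Callable[[FileSpec, FileSpec], bool]) -> List[FilePair]:
    """Replace each matched (deleted, created) couple with one src->dst pair."""
    created = [i for i, p in enumerate(pairs) if not p.one.file_valid and p.two.file_valid]
    deleted = [i for i, p in enumerate(pairs) if p.one.file_valid and not p.two.file_valid]
    source_for: Dict[int, int] = {}   # index of a creation -> index of its deletion
    taken: Set[int] = set()           # deletions already used as a rename source
    for c in created:
        for d in deleted:
            if d not in taken and similar(pairs[d].one, pairs[c].two):
                source_for[c] = d
                taken.add(d)
                break                 # greedy: first acceptable source wins
    out: List[FilePair] = []
    for i, p in enumerate(pairs):
        if i in taken:
            continue                  # this deletion is absorbed by a rename
        if i in source_for:
            # src side comes from the deleted path, dst side from the created one.
            out.append(FilePair(pairs[source_for[i]].one, p.two))
        else:
            out.append(p)
    return out


pairs = [
    FilePair(FileSpec("file2"), FileSpec("file2", "100644", "233a250...", True)),
    FilePair(FileSpec("file3", "100755", "fc77389...", True), FileSpec("file3")),
]
merged = merge_renames(pairs, lambda src, dst: (src.path, dst.path) == ("file3", "file2"))
print([(p.one.path, p.two.path) for p in merged])  # [('file3', 'file2')]
```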

What diffcore-pickaxe does can also be explained cleanly with
this model: it takes such a list and works as a "grep" on it.
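
A minimal Python sketch of pickaxe as a filter over the same list.  The blobs dictionary (sha1 to contents) is a hypothetical stand-in for reading objects from the repository, and the keep-criterion, that the search string occurs a different number of times on the src and dst sides, follows how current git documents -S; whether the 2005 code used exactly this test is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FileSpec:
    path: str
    mode: str = ""
    sha1: str = ""
    file_valid: bool = False   # False means "this side does not exist"


@dataclass
class FilePair:
    one: FileSpec   # src side
    two: FileSpec   # dst side


def occurrences(spec: FileSpec, needle: str, blobs: Dict[str, str]) -> int:
    # A missing side (file_valid is False) contains nothing.
    return blobs[spec.sha1].count(needle) if spec.file_valid else 0


def pickaxe(pairs: List[FilePair], needle: str, blobs: Dict[str, str]) -> List[FilePair]:
    """Keep only the pairs whose two sides contain `needle` a different number of times."""
    return [p for p in pairs
            if occurrences(p.one, needle, blobs) != occurrences(p.two, needle, blobs)]


blobs = {"aaa": "foo bar", "bbb": "bar", "ccc": "foo"}
pairs = [
    FilePair(FileSpec("f1", "100644", "aaa", True), FileSpec("f1", "100644", "bbb", True)),
    FilePair(FileSpec("f2", "100644", "ccc", True), FileSpec("f2", "100644", "ccc", True)),
    FilePair(FileSpec("f3"), FileSpec("f3", "100644", "ccc", True)),
]
print([p.two.path for p in pickaxe(pairs, "foo", blobs)])  # ['f1', 'f3']
```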

Once we start to think of it this way, it becomes quite tempting
to change the diff-raw format to actually match the above
concept.  That is: (1) drop the operation letter +/-/*
(inferable by looking at both sides of '->'); (2) drop blob/tree
(inferable from the mode); (3) give two paths (usually they are
the same); and (4) perhaps replace '->' with the ordinary column
separator.  Like this:

    100644 100644 233a250... 66818b4... file0 file0
    100755 100755 fc77389... 7b72d3d... file1 file1
    ______ 100644 _______... 233a250... file2 file2
    100755 ______ fc77389... _______... file3 file3
    100644 100644 233a250... 233a250... file4 file4
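
To check that dropping the operation letter and blob/tree loses no information, here is a Python sketch that writes a record in the proposed six-column layout and then recovers both from it.  It assumes "______" marks a missing side, as in the example, and that a directory mode (S_ISDIR, e.g. 040000) means tree while anything else means blob; it also splits on whitespace, so it would not handle paths containing spaces.  All function names are hypothetical.

```python
import stat
from typing import Optional, Tuple

NO_MODE = "______"       # placeholder for a missing side's mode
NO_SHA1 = "_______..."   # placeholder for a missing side's (abbreviated) sha1


def format_proposed(src_mode: Optional[str], dst_mode: Optional[str],
                    src_sha1: Optional[str], dst_sha1: Optional[str],
                    src_path: str, dst_path: str) -> str:
    """Render one record in the proposed layout; None marks a missing side."""
    return " ".join([src_mode or NO_MODE, dst_mode or NO_MODE,
                     src_sha1 or NO_SHA1, dst_sha1 or NO_SHA1,
                     src_path, dst_path])


def infer_op(src_mode: str, dst_mode: str) -> str:
    """Recover the dropped +/-/* letter from which sides are present."""
    if src_mode == NO_MODE:
        return '+'
    if dst_mode == NO_MODE:
        return '-'
    return '*'


def infer_kind(mode: str) -> str:
    """Recover the dropped blob/tree word from an octal mode string."""
    return "tree" if stat.S_ISDIR(int(mode, 8)) else "blob"


def parse_proposed(line: str) -> Tuple[str, str, str, str]:
    """Return (op, kind, src_path, dst_path) for one proposed-format record."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    src_mode, dst_mode, _src_sha1, _dst_sha1, src_path, dst_path = fields
    # At least one side must exist; use its mode to decide blob vs tree.
    present = dst_mode if dst_mode != NO_MODE else src_mode
    return infer_op(src_mode, dst_mode), infer_kind(present), src_path, dst_path


print(format_proposed(None, "100644", None, "233a250...", "file2", "file2"))
# ______ 100644 _______... 233a250... file2 file2
print(parse_proposed("100755 ______ fc77389... _______... file3 file3"))
# ('-', 'blob', 'file3', 'file3')
```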

Again, I am not even advocating this.  It is more like me
still thinking aloud.