From: "jamesmikedupont@googlemail.com" <jamesmikedupont@googlemail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>, git@vger.kernel.org
Subject: Re: Introduction and Wikipedia and Git Blame
Date: Fri, 16 Oct 2009 20:00:17 +0200
Message-ID: <ee9cc730910161100r71818303v343f555151db4dcc@mail.gmail.com>
In-Reply-To: <7vbpk7w9qx.fsf@alter.siamese.dyndns.org>
On Fri, Oct 16, 2009 at 7:04 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>> Then I would make modified "texts" from the blob of the file in the
>> current revision and its parent revision, by inserting newlines after
>> every single byte (probably replacing the original newlines by other
>> values, such as \x01).
>>
>> The reason for this touchup is that the diff machinery in Git only handles
>> line-based diffs.
>>
>> Then you can parse the hunk headers, adjust the offsets accordingly,...
>
> I would agree that text converted to "byte-per-line" format would be the
> easiest way to re-use the diff engine, but if you go one more step, you
> can even reuse the blame engine as well. You convert the text into
> "byte-in-hex-and-lf" (e.g. "AB C\n" becomes "41\n42\n20\n43\n0a\n") and
> feed it into existing blame and have it produce script-readable output,
> instead of feeding that to your reinvention of blame using diff engine.
>
> You would need to postprocess the computed result (either by diff or
> blame) to lay out the final text output in either case anyway, and making
> the existing blame engine do the work for you would be a better approach,
> I think.
Could you tell me what the basic algorithm of the blame engine is?
I will have to start reading the code.
How can it tell the author of a given line? I like the idea of one
line per char; even the newlines would be encoded that way. If it is a
Unicode char, it might be multibyte and so span several lines.
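To make the encoding concrete, here is a minimal Python sketch of the "byte-in-hex-and-lf" conversion described in the quoted text. The function names to_hex_lines and from_hex_lines are made up for illustration; nothing like them exists in Git.

```python
def to_hex_lines(data: bytes) -> str:
    """Encode each byte as two lowercase hex digits followed by a newline."""
    return "".join(f"{b:02x}\n" for b in data)


def from_hex_lines(text: str) -> bytes:
    """Invert to_hex_lines: one byte per non-empty line."""
    return bytes(int(line, 16) for line in text.splitlines() if line)


if __name__ == "__main__":
    # The example from the quoted text: "AB C\n".
    print(to_hex_lines(b"AB C\n"), end="")
    # A non-ASCII character takes one line per UTF-8 byte.
    print(to_hex_lines("é".encode("utf-8")), end="")
```

The newline itself becomes the line "0a", and a multibyte character such as "é" (c3 a9 in UTF-8) simply takes two lines.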
The script would get the blame per byte and then recode that into
something visible.
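A rough sketch of that recoding step, assuming the per-byte blame has already been collected as one author name per byte. The function group_blame and the author names are hypothetical, and how the per-byte list is obtained from blame is left open here.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def group_blame(data: bytes, authors: Sequence[str]) -> List[Tuple[str, bytes]]:
    """Merge consecutive bytes blamed on the same author into runs."""
    if len(authors) != len(data):
        raise ValueError("need exactly one author per byte")
    runs = []
    # zip pairs each author with the integer value of the matching byte.
    for author, pairs in groupby(zip(authors, data), key=lambda p: p[0]):
        runs.append((author, bytes(b for _, b in pairs)))
    return runs


if __name__ == "__main__":
    # Hypothetical blame result: two bytes by "alice", three by "bob".
    for author, chunk in group_blame(b"AB C\n", ["alice"] * 2 + ["bob"] * 3):
        print(author, repr(chunk.decode("utf-8", errors="replace")))
```

A run boundary can fall inside a multibyte character, so a real script would have to decide how to display such split characters.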
od, the octal dump utility, comes to mind:
od -t x1 -w1 will output the file one byte per line.
Now what about the ability to just pipe the file through some tool and
then run blame on that? The tool would start each line with the byte
offset, and blame would emit the blame for that offset along with the
text that follows it.
So for example, with two bytes per line:
od -w2 somefile :
///////////////////////////////
Offset value
======= ======
0052752 065347
0052754 030356
0052756 035741
0052760 136302
0052762 035346
Here we see the lines are 0052762 - 0052760 = 2 bytes apart.
And then if you want wider diffs:
od somefile
////////////////////////////////////////////
Offset values
======= ====== ====== ====== ====== ====== ====== ====== ======
0074520 051754 162613 057705 155520 047032 043654 175550 062704
0074540 164400 060340 123434 030350 040457 136010 042270 170525
0074560 165053 124677 125776 031370 000006 102076 060060 052434
0074600 176452 140240 074007 130113 100424 020010 130773 103467
0074620 052776 052421 021544 101357 120035 107562 072641 053636
Here we see the lines are 0074540 - 0074520 = 20 (octal) apart, i.e. 16 bytes.
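The offset arithmetic can be checked with a small Python sketch. It assumes od's default octal offsets; od_rows is a made-up helper that only mimics the offset column, not od's value formatting.

```python
from typing import Iterator, Tuple


def od_rows(data: bytes, width: int = 16, start: int = 0) -> Iterator[Tuple[str, bytes]]:
    """Yield (octal offset, chunk) pairs, `width` bytes per row."""
    for i in range(0, len(data), width):
        yield f"{start + i:07o}", data[i:i + width]


if __name__ == "__main__":
    # Offsets are octal, so parse them with base 8 before subtracting.
    print(int("0052762", 8) - int("0052760", 8))
    print(int("0074540", 8) - int("0074520", 8))
    for offset, chunk in od_rows(bytes(40), width=16):
        print(offset, len(chunk))
```

Whatever width the filter uses, the difference between two consecutive offsets gives the number of bytes covered by one line.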
That way the blame tool would not be concerned with the formatting or
content; users could write whatever filters they want, and blame would
only expect a byte offset...
That way, we could write something like this :
grep -b x Test.xml
0:<?xml version="1.0" encoding="UTF-8"?>
39:<gpx
107: xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
then we would get blames for those byte offsets, very simple.
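For illustration, the offsets that grep -b prints can be reproduced with a short Python sketch. The helper grep_b is hypothetical, and the sample data is just the first two lines from the output above.

```python
import io
from typing import Iterator, Tuple


def grep_b(data: bytes, needle: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (byte offset of line start, line) for lines containing needle."""
    offset = 0
    # Iterating a BytesIO splits on b"\n" only and keeps the terminator.
    for line in io.BytesIO(data):
        if needle in line:
            yield offset, line.rstrip(b"\n")
        offset += len(line)


if __name__ == "__main__":
    sample = b'<?xml version="1.0" encoding="UTF-8"?>\n<gpx\n'
    for offset, line in grep_b(sample, b"x"):
        print(f"{offset}:{line.decode('utf-8')}")
```

The second line starts at offset 39 because the first line is 38 bytes plus its newline.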
We could reduce this down to: make blame take a list of byte positions.
grep -b '' Test.gpx would be the standard behavior, emitting the blame per line.
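Until blame accepts byte positions itself, one possible workaround is to map each byte position back to a line number first. This is only a sketch under that assumption; line_starts and line_for_offset are made-up names.

```python
from bisect import bisect_right
from typing import List


def line_starts(data: bytes) -> List[int]:
    """Byte offset at which each line of data begins."""
    starts = [0]
    for i, b in enumerate(data):
        # A newline starts a new line unless it is the last byte.
        if b == 0x0A and i + 1 < len(data):
            starts.append(i + 1)
    return starts


def line_for_offset(starts: List[int], offset: int) -> int:
    """1-based number of the line that contains byte `offset`."""
    return bisect_right(starts, offset)


if __name__ == "__main__":
    sample = b'<?xml version="1.0" encoding="UTF-8"?>\n<gpx\n'
    starts = line_starts(sample)
    for pos in (0, 38, 39, 42):
        print(pos, "-> line", line_for_offset(starts, pos))
```

Each resulting line number n could then be handed to the existing line-based interface, for example git blame -L n,n.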
mike