From mboxrd@z Thu Jan 1 00:00:00 1970 From: "jamesmikedupont@googlemail.com" Subject: Re: Introduction and Wikipedia and Git Blame Date: Fri, 16 Oct 2009 20:00:17 +0200 Message-ID: References: <7vbpk7w9qx.fsf@alter.siamese.dyndns.org> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Johannes Schindelin , git@vger.kernel.org To: Junio C Hamano X-From: git-owner@vger.kernel.org Fri Oct 16 20:00:33 2009 Return-path: Envelope-to: gcvg-git-2@lo.gmane.org Received: from vger.kernel.org ([209.132.176.167]) by lo.gmane.org with esmtp (Exim 4.50) id 1Myr6K-0003kJ-Lj for gcvg-git-2@lo.gmane.org; Fri, 16 Oct 2009 20:00:29 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751640AbZJPSAQ convert rfc822-to-quoted-printable (ORCPT ); Fri, 16 Oct 2009 14:00:16 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751167AbZJPSAP (ORCPT ); Fri, 16 Oct 2009 14:00:15 -0400 Received: from mail-fx0-f218.google.com ([209.85.220.218]:64976 "EHLO mail-fx0-f218.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751088AbZJPSAO convert rfc822-to-8bit (ORCPT ); Fri, 16 Oct 2009 14:00:14 -0400 Received: by fxm18 with SMTP id 18so2707448fxm.37 for ; Fri, 16 Oct 2009 11:00:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=uVQ+eQYkW78+qbbBgb+eFPoRCRUNBfiK/WRRmpu8etw=; b=Xd1fM00ZwMUns6NG8XTY00kMHsRLsxINT9OMc+N6HLDFOruKfJQIcMFIfR3fs5rxvc oDFWqm5TDBGxc0ZWZiBA7QjlPDV44ADqGgM/jOXlEQfjYdLUszOGvG626gljkS9W+x6a EZ7C3MGtaxk1yDzLOTcOH5KfnfXs3PBTzrcBo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=googlemail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=F4FKRqbC9cyMYyv46N8GKg7FdoPoXocupVSx8eydfCieryQ8ylO2qwpkB4zkOWAlkZ 4YssnP+mDo2+TIkU5oYJo2Wd8b35FVZPOydE81geDQmvjoOWTdpgdxLnC3YepkD7pD3D QYyT2NjVmP2VLQATqR+Am20hPZy/ns/r5gWNk= Received: by 10.204.174.209 with SMTP id u17mr1625698bkz.7.1255716017629; Fri, 16 Oct 2009 11:00:17 -0700 (PDT) In-Reply-To: <7vbpk7w9qx.fsf@alter.siamese.dyndns.org> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Fri, Oct 16, 2009 at 7:04 PM, Junio C Hamano wro= te: > Johannes Schindelin writes: > >> Then I would make modified "texts" from the blob of the file in the >> current revision and its parent revision, by inserting newlines afte= r >> every single byte (probably replacing the original newlines by other >> values, such as \x01). >> >> The reason for this touchup is that the diff machinery in Git only h= andles >> line-based diffs. >> >> Then you can parse the hunk headers, adjust the offsets accordingly,= =2E.. > > I would agree that text converted to "byte-per-line" format would be = the > easiest way to re-use the diff engine, but if you go one more step, y= ou > can even reusel the blame engine as well. =A0You convert the text int= o > "byte-in-hex-and-lf" (e.g. "AB C\n" becomes "41\n42\n20\n43\n0a\n") a= nd > feed it into existing blame and have it produce script-readable outpu= t, > instead of feeding that to your reinvention of blame using diff engin= e. > > You would need to postprocess the computed result (either by diff or > blame) to lay out the final text output in either case anyway, and ma= king > the existing blame engine do the work for you would be a better appro= ach, > I think. Please can you tell me what is the basic algorithm of the blame engine? I will have to start reading code How can it tell the author a given line and I like the idea of one line per char, even the newlines would be encoded that way. If it is a unicode char, it might be multibyte. The script would get the blame per byte and then recode that into something visible. od the octal dump utility comes to mind, od x1 -w1 will output the file in one byte widths. Now what about the ability to just pipe the file via some tool and then run blame on that. It would just start the line with the byte offset and blame would emit the blame for that offset and emit the text that is following it. so for example : od x1 -w1 somefile : /////////////////////////////// Offset value =3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D 0052752 065347 0052754 030356 0052756 035741 0052760 136302 0052762 035346 Here we see the lines are 0052760 - 0052762 =3D2 apart. and then if you want wider diffs : od some file //////////////////////////////////////////// Offset values =3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D =3D=3D=3D=3D= =3D=3D =3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D =3D=3D=3D= =3D=3D=3D =3D=3D=3D=3D=3D=3D 0074520 051754 162613 057705 155520 047032 043654 175550 062704 0074540 164400 060340 123434 030350 040457 136010 042270 170525 0074560 165053 124677 125776 031370 000006 102076 060060 052434 0074600 176452 140240 074007 130113 100424 020010 130773 103467 0074620 052776 052421 021544 101357 120035 107562 072641 053636 Here we see the lines are 0074520 - 0074540 =3D 20 apart. That way the blame tool will not be concerned with the formatting or content, the users can write filters like they want, and blame would only expect a byte offset... That way, we could write something like this : grep -b x Test.xml 0: 39: