From: "Shawn O. Pearce" <spearce@spearce.org>
To: Nasser Grainawi <nasser@codeaurora.org>
Cc: Robin Rosenberg <robin.rosenberg@dewire.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: [JGIT] patch-id
Date: Thu, 8 Oct 2009 09:28:05 -0700
Message-ID: <20091008162805.GE9261@spearce.org>
In-Reply-To: <4AC136CC.8040300@codeaurora.org>
Nasser Grainawi <nasser@codeaurora.org> wrote:
> I'm trying to add a public getPatchId method to the jgit Patch class [...]
>
> It seems Patch does some statistical number gathering, but at no point does
> it store a 'slimmed-down' version of a patch.
It parses the patch to create FileHeader objects, one for each
file mentioned in the script. Within each FileHeader there is a
HunkHeader object, one for each hunk present in the patch. Within
each HunkHeader there is an EditList composed of Edit instances;
each Edit instance denotes a contiguous line range within that hunk.
Edit instances come in one of 3 forms:
  INSERT:  a run of + lines with no - lines
  DELETE:  a run of - lines with no + lines
  REPLACE: a mixture of - and + lines
and their type is actually determined by the line numbers attached
to them. An INSERT has the same starting and ending line number on
the A side, but on the B side the ending line number is at least
one higher than the starting number. DELETE is the reverse, and
REPLACE has both ending numbers higher than the starting number.
IIRC Edit uses 0-based offsets, so line 3 is actually position 2.
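
The rule above can be sketched in a few lines of Java. This is a
stand-in written for illustration, not JGit's Edit class: the name
EditKindSketch, the classify() signature, and the EMPTY case are all
assumptions, and the ranges are taken to be 0-based and end-exclusive.

```java
// Hypothetical stand-in for illustration; this is not JGit's Edit class.
final class EditKindSketch {
    enum Kind { INSERT, DELETE, REPLACE, EMPTY }

    /** Classifies a region from its 0-based, end-exclusive line ranges. */
    static Kind classify(int beginA, int endA, int beginB, int endB) {
        boolean aEmpty = beginA == endA; // no lines taken from the A side
        boolean bEmpty = beginB == endB; // no lines added on the B side
        if (aEmpty && bEmpty) {
            return Kind.EMPTY;           // assumption: nothing changed at all
        }
        if (aEmpty) {
            return Kind.INSERT;          // only + lines
        }
        if (bEmpty) {
            return Kind.DELETE;          // only - lines
        }
        return Kind.REPLACE;             // a mixture of - and + lines
    }

    public static void main(String[] args) {
        // One line inserted before what is line 3 (position 2) on both sides.
        System.out.println(classify(2, 2, 2, 3)); // INSERT
        System.out.println(classify(2, 3, 2, 2)); // DELETE
        System.out.println(classify(2, 4, 2, 3)); // REPLACE
    }
}
```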
These HunkHeader and Edit instances are only available on a text
patch; binary patches use a different representation for the
binary delta. Combined diff patches (--cc format) also lack these
HunkHeader/Edit instances as we don't have a generic n-way patch
parser yet.
> I had the idea to just iterate
> over the FileHeader's and get the byte buffer of each, but I don't think
> those buffers have the parsed data.
The HunkHeader and Edit instances really don't have the actual
line data available to them; they only have the line numbers.
To generate a patch ID you'd need to get the line data too.
Worse, IIRC the patch ID generation in C git favors a 3 line context.
In theory you could modify FileHeader or HunkHeader to produce
a RawText that uses the underlying byte[] returned by getBuffer()
as the backing store, but create a specialized IntList which has the
actual file line numbers mapped to the positions in the patch script.
To do that you'd need to re-walk the patch, like the toEditList()
method in HunkHeader does.
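
A rough sketch of that re-walk, using only the JDK: it scans the body
of one hunk and records, for each B-side line number, the byte offset
in the buffer where that line's text begins. The class and method
names are hypothetical, a plain Map stands in for the specialized
IntList, and the buffer is assumed to hold just the hunk body (the
lines after the "@@" header), with startB taken from that header.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of JGit.
final class HunkLineMapSketch {
    /**
     * Maps 0-based B-side line numbers to the offset in buf where the
     * line's text starts, just past the ' ' or '+' marker.
     *
     * @param buf    hunk body only, one marker character per line
     * @param startB 1-based B-side start line from the "@@" header
     */
    static Map<Integer, Integer> mapNewLines(byte[] buf, int startB) {
        Map<Integer, Integer> map = new LinkedHashMap<>();
        int b = startB - 1; // switch to 0-based line numbers
        int ptr = 0;
        while (ptr < buf.length) {
            byte marker = buf[ptr];
            if (marker == ' ' || marker == '+') {
                map.put(b, ptr + 1);
                b++;
            }
            // '-' lines exist only on the A side, and a '\' line is the
            // "No newline at end of file" note; neither advances b.
            ptr = nextLine(buf, ptr);
        }
        return map;
    }

    /** Returns the offset just past the next '\n', or buf.length. */
    static int nextLine(byte[] buf, int ptr) {
        while (ptr < buf.length && buf[ptr] != '\n') {
            ptr++;
        }
        return Math.min(ptr + 1, buf.length);
    }

    public static void main(String[] args) {
        byte[] body = " a\n-b\n+B\n+C\n d\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(mapNewLines(body, 10)); // {9=1, 10=7, 11=10, 12=13}
    }
}
```

The same loop could track the A side by counting ' ' and '-' lines
instead, which is what a patch ID would need for the removed lines.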
Given that RawText you could feed it through something like
DiffFormatter to create a patch with 3 lines of context, and hash
the relevant bits.
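
For the hashing step, a minimal sketch might look like the following.
It is not byte-compatible with what C git produces; it only shows the
general idea of skipping the parts that carry line numbers or blob IDs
and ignoring whitespace before feeding the rest to SHA-1. The class
name PatchHashSketch and the choice of which lines to skip are
assumptions, and both patches are assumed to already have the same
amount of context.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; the result will not match `git patch-id` output.
final class PatchHashSketch {
    /** Hashes a text patch with line numbers and whitespace ignored. */
    static String hash(String patch) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
        for (String line : patch.split("\n")) {
            if (line.startsWith("@@")) {
                continue; // hunk header: carries the line numbers
            }
            if (line.startsWith("index ")) {
                continue; // blob IDs differ whenever the base differs
            }
            // Drop all whitespace so reindented patches hash the same.
            String stripped = line.replaceAll("\\s+", "");
            md.update(stripped.getBytes(StandardCharsets.UTF_8));
        }
        StringBuilder hex = new StringBuilder();
        for (byte x : md.digest()) {
            hex.append(String.format("%02x", x & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String p1 = "--- a/f\n+++ b/f\n@@ -1,3 +1,3 @@\n a\n-b\n+c\n d\n";
        String p2 = "--- a/f\n+++ b/f\n@@ -10,3 +12,3 @@\n a\n-b  \n+c\n d\n";
        // Same change at a different position, with trailing whitespace.
        System.out.println(hash(p1).equals(hash(p2))); // true
    }
}
```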
But... that seems like a lot of work.
Also, there is a class in Gerrit Code Review called EditList (not
to be confused with JGit's EditList class!) that really should be
moved back over to JGit. It has some useful routines for walking
through a patch as a series of iterations.
> Short of that, suggestions for how to go about acquiring/storing a parsed
> representation of the data with maximal existing code re-use would be
> appreciated.
I'm coming up short on suggestions right now. I'm not seeing an
easy path to this without writing a bit of code. I think you really
just need to walk the patch... :-\
--
Shawn.