git.vger.kernel.org archive mirror
From: Chico Sokol <chico.sokol@gmail.com>
To: Shawn Pearce <spearce@spearce.org>
Cc: John Szakmeister <john@szakmeister.net>, git <git@vger.kernel.org>
Subject: Re: Reading commit objects
Date: Wed, 22 May 2013 11:20:44 -0300
Message-ID: <CABx5MBS9YgNmZD_tumMJ-MJVjHbRFCKbCjs9AZ347-OCwqO7qQ@mail.gmail.com>
In-Reply-To: <CAJo=hJtqACW+CR5FkmDfwyK1Wg3Kcppy6DbW7P=On_qJyvsYvQ@mail.gmail.com>

I'm not criticizing JGit, guys. It simply doesn't fit our needs.
We're not interested in mapping git commands to Java, and we don't have
the same RAM limitations.

I know the JGit team is doing a great job, and we don't intend to build
a library that complete.

Are you contributors to JGit? Can you point me to the code that
unpacks git objects? The closest I could get was this class:
https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java

There seem to be both a standard and a non-standard format for the
object, judging by the comments in this method:
https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/file/UnpackedObject.java#L272

I suspect that Java's default Inflater class expects the object to be
in the standard format.

What does the following comment mean? What is the "experimental
pack-based" format? Is there any documentation of its spec?

We must determine if the buffer contains the standard
zlib-deflated stream or the experimental format based
on the in-pack object format. Compare the header byte
for each format:
RFC1950 zlib w/ deflate : 0www1000 : 0 <= www <= 7
Experimental pack-based : Stttssss : ttt = 1,2,3,4
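For reference, the two header patterns in that comment can be told apart by looking at the first bytes of the object file. The Java below is a minimal sketch of that check, not JGit's code: the class and method names are hypothetical, and adding the RFC 1950 FCHECK test ((CMF * 256 + FLG) % 31 == 0) is our own assumption for resolving overlaps, since a first byte such as 0x18 matches both 0www1000 and Stttssss.

```java
public class LooseObjectFormat {
    // Standard RFC 1950 zlib header using deflate: the first byte is 0www1000
    // (CM = 8, CINFO = www <= 7), and (CMF * 256 + FLG) must be a multiple of 31.
    static boolean isStandardZlib(byte[] hdr) {
        if (hdr.length < 2) {
            return false;
        }
        int cmf = hdr[0] & 0xff;
        int flg = hdr[1] & 0xff;
        boolean patternOk = (cmf & 0x8f) == 0x08;
        boolean fcheckOk = ((cmf << 8) | flg) % 31 == 0;
        return patternOk && fcheckOk;
    }

    // Experimental pack-style header byte Stttssss: S = continuation bit,
    // ttt = object type, ssss = low bits of the object size.
    static boolean looksExperimental(byte[] hdr) {
        if (hdr.length < 1) {
            return false;
        }
        int type = ((hdr[0] & 0xff) >> 4) & 0x07;
        // Git's pack type codes 1..4 are commit, tree, blob, tag.
        return type >= 1 && type <= 4;
    }

    // Test the standard format first, because some first bytes fit both patterns.
    static String classify(byte[] hdr) {
        if (isStandardZlib(hdr)) {
            return "zlib";
        }
        if (looksExperimental(hdr)) {
            return "experimental";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(classify(new byte[] {0x78, (byte) 0x9c})); // prints "zlib"
        System.out.println(classify(new byte[] {0x1b, 0x05}));        // prints "experimental"
    }
}
```

Only the standard branch can be handed straight to java.util.zip.Inflater; the experimental branch would need its own header decoding first.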


--
Chico Sokol


On Wed, May 22, 2013 at 2:59 AM, Shawn Pearce <spearce@spearce.org> wrote:
> On Tue, May 21, 2013 at 3:18 PM, Chico Sokol <chico.sokol@gmail.com> wrote:
>> Ok, we discovered that the commit object actually contains the tree
>> object's sha1, by reading its contents with python zlib library.
>>
>> So the bug must be with our java code (we're building a java lib).
>>
>> Is there any non-standard issue in git's zlib compression? We're
>> decompressing its contents with java default zlib api, so it should
>> work normally, here's our code, that's printing that wrong output:
>>
>> import java.io.File;
>> import java.io.FileInputStream;
>> import java.util.zip.InflaterInputStream;
>> import org.apache.commons.io.IOUtils;
>> ...
>> File obj = new File(".git/objects/25/0f67ef017fcb97b5371a302526872cfcadad21");
>> InflaterInputStream inflaterInputStream = new InflaterInputStream(new FileInputStream(obj));
>> System.out.println(IOUtils.readLines(inflaterInputStream));
> ...
>>>> Currently, we're trying to parse commit objects. After decompressing
>>>> the contents of a commit object file we got the following output:
>>>>
>>>> commit 191
>>>> author Francisco Sokol <chico.sokol@gmail.com> 1369140112 -0300
>>>> committer Francisco Sokol <chico.sokol@gmail.com> 1369140112 -0300
>>>>
>>>> first commit
>
> Your code is broken. IOUtils is probably corrupting what you get back.
> After inflating the stream you should see the object type ("commit"),
> space, its length in bytes as a base 10 string, and then a NUL ('\0').
> Following that is the tree line, and parent(s) if any. I wonder if
> IOUtils discarded the remainder of the line after the NUL and did not
> consider the tree line.
>
> And you wonder why JGit code is confusing. We can't rely on "standard
> Java APIs" to do the right thing, because commonly used libraries have
> made assumptions that disagree with the way Git works.
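Following the format Shawn describes, here is a minimal sketch of reading a loose object without a line-based helper like IOUtils.readLines(). It inflates the whole stream into a byte array, splits at the first NUL, checks the declared length, and then collects the commit header lines (tree, parent, author, committer) up to the first blank line. The class and method names are hypothetical, the commit built in main() is made-up test data, and Java 9+ is assumed for InputStream.readAllBytes().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class LooseObjectReader {
    // Parsed loose object: "<type> <size>\0<content>".
    static final class GitObject {
        final String type;
        final byte[] content;

        GitObject(String type, byte[] content) {
            this.type = type;
            this.content = content;
        }
    }

    // Inflate a loose object and split off its header. Working on raw bytes
    // means the NUL terminator is found explicitly instead of being hidden
    // inside a "line" by a text reader.
    static GitObject read(InputStream raw) throws IOException {
        byte[] data;
        try (InflaterInputStream in = new InflaterInputStream(raw)) {
            data = in.readAllBytes();
        }
        int nul = -1;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0) {
                nul = i;
                break;
            }
        }
        if (nul < 0) {
            throw new IOException("missing NUL after object header");
        }
        String header = new String(data, 0, nul, StandardCharsets.US_ASCII);
        int sp = header.indexOf(' ');
        if (sp < 0) {
            throw new IOException("malformed header: " + header);
        }
        String type = header.substring(0, sp);
        int size = Integer.parseInt(header.substring(sp + 1)); // base-10 length
        int bodyLen = data.length - nul - 1;
        if (size != bodyLen) {
            throw new IOException("size mismatch: " + size + " != " + bodyLen);
        }
        byte[] content = new byte[size];
        System.arraycopy(data, nul + 1, content, 0, size);
        return new GitObject(type, content);
    }

    // Commit header lines come before the first blank line; the message follows it.
    static List<String> commitHeaders(byte[] content) {
        String text = new String(content, StandardCharsets.UTF_8);
        List<String> headers = new ArrayList<>();
        for (String line : text.split("\n", -1)) {
            if (line.isEmpty()) {
                break;
            }
            headers.add(line);
        }
        return headers;
    }

    public static void main(String[] args) throws IOException {
        // Made-up commit so the example runs without a repository;
        // the tree id is a placeholder, not a real object.
        String body = "tree 0123456789abcdef0123456789abcdef01234567\n"
                + "author A U Thor <author@example.com> 1369140112 -0300\n"
                + "committer A U Thor <author@example.com> 1369140112 -0300\n"
                + "\n"
                + "first commit\n";
        byte[] bodyBytes = body.getBytes(StandardCharsets.UTF_8);
        byte[] header = ("commit " + bodyBytes.length + "\0").getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(compressed)) {
            out.write(header);
            out.write(bodyBytes);
        }
        GitObject obj = read(new ByteArrayInputStream(compressed.toByteArray()));
        System.out.println(obj.type);
        commitHeaders(obj.content).forEach(System.out::println);
    }
}
```

To read a real object, pass a FileInputStream for a file under .git/objects to read(). This sketch only covers loose objects in the standard zlib format; it does not handle packfiles or the experimental format.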

