git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: John Szakmeister <john@szakmeister.net>
Cc: git@vger.kernel.org
Subject: Re: Zero padded file modes...
Date: Thu, 5 Sep 2013 11:36:46 -0400
Message-ID: <20130905153646.GA12372@sigill.intra.peff.net>
In-Reply-To: <CAEBDL5W3DL0v=TusuB7Vg-4bWdAJh5d2Psc1N0Qe+KK3bZH3=Q@mail.gmail.com>

On Thu, Sep 05, 2013 at 10:00:39AM -0400, John Szakmeister wrote:

> I went to clone a repository from GitHub today and discovered
> something interesting:
> 
>     :: git clone https://github.com/liebke/incanter.git
>     Cloning into 'incanter'...
>     remote: Counting objects: 10457, done.
>     remote: Compressing objects: 100% (3018/3018), done.
>     error: object 4946e1ba09ba5655202a7a5d81ae106b08411061:contains zero-padded file modes
>     fatal: Error in object
>     fatal: index-pack failed

Yep. These were mostly caused by a bug in Grit that is long-fixed.  But
the objects remain in many histories. It would have been painful to rewrite
them back then, and it would be even more painful now.

> > This is going to screw up pack v4 (yes, someday I'll have the
> > time to make it real).
> 
> I don't know if this is still true, but given that patches are
> being sent out about it, I thought it relevant.

I haven't looked carefully at the pack v4 patches yet, but I suspect
that yes, it's still a problem. The premise of pack v4 is that we can do
better by not storing the raw git object bytes, but rather storing
specialized representations of the various components. For example, by
using an integer to store the mode rather than the ascii representation.
But that representation cannot express the "oops, I have a 0-padded
mode" quirk. And we have to be able to recover the original object, byte
for byte, from the v4 representation (to verify sha1, or to generate a
loose object or v2 pack).
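
To make the round-trip problem concrete, here is a minimal Python sketch.
It is not taken from the pack v4 patches; `roundtrip_via_int` is a
hypothetical stand-in for "store the mode as an integer", and the all-zero
entry hash is a placeholder. It only shows that the padded and canonical
spellings of a mode collapse to the same integer, so the regenerated tree
hashes differently.

```python
import hashlib


def tree_entry(mode: bytes, name: bytes, sha: bytes) -> bytes:
    # One tree entry as git stores it: "<octal mode> <name>\0<20-byte sha1>".
    return mode + b" " + name + b"\0" + sha


def object_id(kind: bytes, body: bytes) -> str:
    # Git hashes "<type> <size>\0" followed by the raw object bytes.
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def roundtrip_via_int(mode: bytes) -> bytes:
    # Hypothetical integer-only storage: parse the octal text, then
    # regenerate it. Any leading zero is lost at the int() step.
    return format(int(mode, 8), "o").encode()


placeholder_sha = bytes(20)  # assumption: dummy hash of the referenced subtree
original = tree_entry(b"040000", b"src", placeholder_sha)
rebuilt = tree_entry(roundtrip_via_int(b"040000"), b"src", placeholder_sha)

print(roundtrip_via_int(b"040000"))
print(object_id(b"tree", original) == object_id(b"tree", rebuilt))
```

The two object ids differ, so an integer-only mode cannot reproduce the
original object byte for byte.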

There are basically two solutions:

  1. Add a single-bit flag for "I am 0-padded in the real data". We
     could probably even squeeze it into the same integer.

  2. Have a "classic" section of the pack that stores the raw object
     bytes. For objects which do not match our expectations, store them
     raw instead of in v4 format. They will not get the benefit of v4
     optimizations, but if they are the minority of objects, that will
     only end up with a slight slow-down.
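
Option (1) could look roughly like the Python sketch below. The bit layout
(`PAD_FLAG` in the low bit) and the function names are assumptions made for
illustration, not anything defined by the pack v4 patches.

```python
PAD_FLAG = 1  # hypothetical: low bit means "written with one leading zero"


def encode_mode(text: bytes) -> int:
    # Pack the mode value and the padding flag into a single integer.
    value = int(text, 8)
    canonical = format(value, "o").encode()
    if text == canonical:
        return value << 1
    if text == b"0" + canonical:
        return (value << 1) | PAD_FLAG
    # Anything else (two leading zeros, a stray sign, ...) does not fit.
    raise ValueError("mode not representable with a single flag bit")


def decode_mode(packed: int) -> bytes:
    # Recover the exact original spelling of the mode.
    canonical = format(packed >> 1, "o").encode()
    return b"0" + canonical if packed & PAD_FLAG else canonical


print(decode_mode(encode_mode(b"040000")))
print(decode_mode(encode_mode(b"40000")))
```

A single flag bit covers exactly one quirk; a mode written as `0040000`
would still be rejected by this sketch.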

As I said, I have not looked carefully at the v4 patches, so maybe they
handle this case already. But of the two solutions, I prefer (2). Doing
(1) can solve _this_ problem, but it complicates the format, and does
nothing for any future compatibility issues. Whereas (2) is easy to
implement, since it is basically just pack v2 (and implementations would
need a pack v2 reader anyway).
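
Option (2) can be sketched as a round-trip check at write time: parse the
tree, regenerate it from the parsed form, and fall back to the raw bytes on
any mismatch. The Python below is only an illustration; the `("v4", ...)`
and `("raw", ...)` records and all function names are hypothetical and do
not reflect the actual pack v4 layout.

```python
def parse_tree(body: bytes):
    # Split a raw tree object into (mode as int, name, 20-byte sha1) tuples.
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = int(body[pos:space], 8)
        entries.append((mode, body[space + 1:nul], body[nul + 1:nul + 21]))
        pos = nul + 21
    return entries


def serialize_tree(entries) -> bytes:
    # Regenerate canonical tree bytes from the parsed entries.
    return b"".join(
        format(mode, "o").encode() + b" " + name + b"\0" + sha
        for mode, name, sha in entries
    )


def store(body: bytes):
    # Use the compact form only if it reproduces the object exactly.
    try:
        entries = parse_tree(body)
    except ValueError:
        return ("raw", body)
    if serialize_tree(entries) == body:
        return ("v4", entries)
    return ("raw", body)


def load(record) -> bytes:
    kind, payload = record
    return serialize_tree(payload) if kind == "v4" else payload


good = b"40000 src\0" + bytes(20)
quirky = b"040000 src\0" + bytes(20)
print(store(good)[0], store(quirky)[0])
```

Because the check compares whole objects, the same fallback would catch
quirks other than zero-padded modes without any change to the format.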

-Peff
