From: "Shawn O. Pearce" <spearce@spearce.org>
To: Scott Chacon <schacon@gmail.com>
Cc: Jamey Sharp <jamey@minilop.net>,
	Josh Triplett <josh@freedesktop.org>,
	git@vger.kernel.org
Subject: Re: [RFC] Plumbing-only support for storing object metadata
Date: Sat, 9 Aug 2008 20:51:01 -0700
Message-ID: <20080810035101.GA22664@spearce.org>
In-Reply-To: <d411cc4a0808091449n7e0c9b7et7980cf668106aead@mail.gmail.com>

Scott Chacon <schacon@gmail.com> wrote:
> > We began trying to implement this proposal, but we found this enum
> > definition in cache.h, which made us think there's only room for one
> > more kind of object:
> >
> >        enum object_type {
> >                OBJ_BAD = -1,
> >                OBJ_NONE = 0,
> >                OBJ_COMMIT = 1,
> >                OBJ_TREE = 2,
> >                OBJ_BLOB = 3,
> >                OBJ_TAG = 4,
> >                /* 5 for future expansion */
> >                OBJ_OFS_DELTA = 6,
> >                OBJ_REF_DELTA = 7,
> >                OBJ_ANY,
> >                OBJ_MAX,
> >        };
> >
> > Do these object_type values appear in any on-disk structure, or does any
> > other reason exist why this set of values cannot change? Can we add
> > additional object types for inodes and props? If not, what would you
> > recommend instead?
> 
> If I'm not mistaken, these are the values used to identify data in the
> header sections of packfile objects.  The first four bits are used to
> identify the object type, where the first bit is static and the next
> three are the object type of the data following the header.  Since the
> type is encoded using those three bits, 0-7 is the valid range.  I
> would assume that would be difficult to change, since all the
> packfiles depend on that range.

Correct.  There is only room in the pack file for 3 bits in the
type field, so 0-7 is the only valid range of types.

Only types 0 and 5 are available for use.
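
For readers who have not looked at the pack entry header, here is a
minimal sketch of how the type and inflated size are unpacked from
it.  It follows the layout git uses (top bit of each byte is a
"more size bytes follow" flag, the next three bits of the first byte
are the type, the remaining bits carry the size), but the function
name decode_pack_header and its return convention are made up for
this illustration and are not git's own API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode the type and inflated size at the start of a pack entry.
 * Returns the number of header bytes consumed, or 0 if the buffer is
 * truncated or the size would not fit in 64 bits.
 * (Illustrative helper; the name is an assumption, not git's API.)
 */
size_t decode_pack_header(const unsigned char *buf, size_t len,
                          unsigned *type, uint64_t *size)
{
    size_t used = 0;
    unsigned shift = 4;
    unsigned char c;

    assert(buf && type && size);
    if (len == 0)
        return 0;

    c = buf[used++];
    *type = (c >> 4) & 7;   /* three type bits: only 0-7 can appear */
    *size = c & 15;         /* low four bits of the inflated size */

    while (c & 0x80) {      /* top bit set: another size byte follows */
        if (used >= len || shift > 57)
            return 0;
        c = buf[used++];
        *size |= (uint64_t)(c & 0x7f) << shift;
        shift += 7;
    }
    return used;
}
```

Because the type is masked with 7, there is simply no bit pattern in
the first byte that could name a ninth object type.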

Nico and I have (at least in the past) agreed that type 0 is meant
as an escape indicator.  If the type is set to 0 then the real type
code appears in another byte of data which follows the object's
inflated length.
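
As a rough sketch of what such an escape could look like on the
reading side: decode the usual header, and if the 3-bit type comes
out as 0, take the real type from the single byte that follows the
size.  Nothing here is an agreed format.  The name
decode_escaped_header, the TYPE_ESCAPE constant, and the choice of
exactly one extra byte are assumptions made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TYPE_ESCAPE 0u  /* hypothetical: type 0 means "real type follows" */

/*
 * Decode a pack entry header, honoring a hypothetical type-0 escape.
 * Returns the number of bytes consumed, or 0 on truncated input.
 */
size_t decode_escaped_header(const unsigned char *buf, size_t len,
                             unsigned *type, uint64_t *size)
{
    size_t used = 0;
    unsigned shift = 4;
    unsigned char c;

    assert(buf && type && size);
    if (len == 0)
        return 0;

    c = buf[used++];
    *type = (c >> 4) & 7;   /* ordinary 3-bit type field */
    *size = c & 15;
    while (c & 0x80) {      /* variable-length inflated size */
        if (used >= len || shift > 57)
            return 0;
        c = buf[used++];
        *size |= (uint64_t)(c & 0x7f) << shift;
        shift += 7;
    }

    if (*type == TYPE_ESCAPE) {
        if (used >= len)
            return 0;
        *type = buf[used++];  /* extended type code, one full byte */
    }
    return used;
}
```

The cost is visible in the last branch: every escaped object pays at
least one extra header byte, which is why the remaining 3-bit code is
worth saving for something common.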

That leaves only type 5 available.  Note that because type 5 can be
encoded into a really small space (3 bits) compared to any other
type we may add, we really want to use it for something which will
appear _very_frequently_.  The OBJ_DICT_TREE encoding we were talking
about doing for pack v4 fits that bill, as nearly any project (even
huge ones like Mozilla or KDE) would probably be using OBJ_DICT_TREE
throughout their pack files, and there is a noticeable reduction in
disk usage (and increased performance due to fewer page faults)
as a result.

The proposed "inode" and "props" types sound like they are useful
only in less common cases, and would appear very infrequently
compared to a tree object.

So yea, there really aren't any new type bits available.

But tossing aside the type bit argument, I'm not sure I see the
value in adding limited arbitrary properties to names in a tree.
How does one edit these?  How do you inspect them before you get
a checkout, assuming they might actually have an impact on the
checkout process?  How the hell do you merge them?

I'm also very concerned about the limited range of values for both
keys and values in a "props" type.  Even if we did go down this
road of supporting such a concept at the plumbing layer (and in the
storage model), everywhere else we are 8-bit clean: commit messages,
tag messages, blob contents, even file names in tree objects.
(OK, file names cannot contain a NUL byte, but whatever, that is
their only limitation.)

The proper encoding for both keys and values should permit any data
to be stored.  Don't the extended attributes features in both Linux
and FreeBSD allow arbitrary data to be attached to an inode in the fs?

Please don't get me wrong.

I think this is a _BAD_ idea.

A bad idea that will only clutter up the core object model, and
the core processing code of that object model.  Extended attributes
aren't used that much on local filesystems, because they are hard
to work with and suck performance wise.  Performance in Git is
a _feature_.  It matters.  Our clean object model really helps to
make that possible.

-- 
Shawn.

