git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] Plumbing-only support for storing object metadata
@ 2008-08-09 21:07 Jamey Sharp, Josh Triplett
  2008-08-09 21:49 ` Scott Chacon
  2008-08-10 11:09 ` Jan Hudec
  0 siblings, 2 replies; 22+ messages in thread
From: Jamey Sharp, Josh Triplett @ 2008-08-09 21:07 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 4624 bytes --]

The attached test illustrates a proposal for minimal plumbing support
usable to store permissions, ownership, and other metadata in git
repositories. This proposal is fully compatible with existing
repositories when the new functionality is not in use. Similar to the
introduction of subprojects, we have not yet specified the porcelain. We
believe that the plumbing will provide sufficient functionality for many
uses, and these uses will help determine the appropriate porcelain.

We would have included an implementation along with the test, but we
need help with a detail of git internals. More on that at the end. We'd
also appreciate feedback on the proposal.

We propose representing objects with metadata using a new "inode"
object. An inode object contains the hash of the real object and the
hash of a "props" (properties) object. A props object contains a set of
name-value pairs. Tree objects can reference inode objects in addition
to the current possibilities of blobs, trees, and subproject commits; we
propose using the currently invalid type 110000 (S_IFREG | S_IFIFO) for
inode objects. We primarily see a use case for inodes referencing blobs
and trees, though as defined they support any object type.

By separating property objects from inodes, objects with the same
properties can share the same property object; we expect, for instance,
that repositories reflecting /etc will have many references to the
"root:root 644" and "root:root 755" properties.

Both object types have a unique representation: equivalent inodes and
props objects will have the same hash. The exact format of an inode
looks like:
	<object_type> SP <object_sha1> LF
	props SP <props_sha1> LF
A property object looks like a sorted list of one or more of:
	<key> SP <value> LF
The same key is allowed to appear more than once, in which case the
lines will be sorted by the bytes of the values. Allowing duplicate keys
will make it easier to retrieve a set of similar properties such as
acls.

This format implies certain constraints on property names and values. We
propose limiting both names and values to printable ASCII (\x20-\x7E),
and disallowing spaces in keys. If some use case requires property names
or values with binary data, that property could use a printable encoding
such as base64.

We believe this proposal provides a sensible approach to storing
metadata in Git repositories; however, we're happy with any reasonable
solution that provides equivalent functionality. Some alternatives we
considered:

  - We could allow UTF-8 property names or values, rather than strictly
    ASCII. Our proposal is conservative in this regard, allowing an
    extension to UTF-8 later while remaining compatible with existing
    repositories.

  - We could allow arbitrary property names or values, by changing the
    props format to store lengths rather than using delimiters. This
    would not be a compatible change, so it needs to be decided early.

  - Tree objects already store mode bits, but we believe that it would
    prove simpler to store complete modes in properties rather than
    adjusting Git internals to preserve arbitrary mode bits in trees.
    Even if new versions of Git preserved the full mode, existing
    versions of Git might silently give incorrect results. Furthermore,
    mode bits other than executability seem of limited value without
    ownership information.

  - inode objects could directly store properties, rather than
    referencing a separate props object. This would eliminate one
    indirection needed to access properties. However, it would also
    reduce sharing of data for objects with the same properties.
    Furthermore, we expect that the indirection will have negligible
    cost when accessing objects from packs, given appropriately sorted
    packs. Shared props objects also suggest caching at various layers.

  - We could have called them "meta" objects instead of "props", but
    then we couldn't make "mad props" jokes.

We began trying to implement this proposal, but we found this enum
definition in cache.h, which made us think there's only room for one
more kind of object:

	enum object_type {
		OBJ_BAD = -1,
		OBJ_NONE = 0,
		OBJ_COMMIT = 1,
		OBJ_TREE = 2,
		OBJ_BLOB = 3,
		OBJ_TAG = 4,
		/* 5 for future expansion */
		OBJ_OFS_DELTA = 6,
		OBJ_REF_DELTA = 7,
		OBJ_ANY,
		OBJ_MAX,
	};

Do these object_type values appear in any on-disk structure, or does any
other reason exist why this set of values cannot change? Can we add
additional object types for inodes and props? If not, what would you
recommend instead?

- Jamey Sharp and Josh Triplett

[-- Attachment #2: t1008-inodes.sh --]
[-- Type: text/plain, Size: 2838 bytes --]

#!/bin/sh
#
# Copyright (c) 2008 Josh Triplett and Jamey Sharp
#

test_description="Test inode plumbing"

. ./test-lib.sh

cat > shadow <<EOF
root:*:13943:0:99999:7:::
EOF
shadow_sha1=`git hash-object -t blob -w shadow`

cat > props <<EOF
group shadow
mode 640
owner root
EOF
props_sha1=FIXME

cat > inode <<EOF
blob $shadow_sha1
props $props_sha1
EOF
inode_sha1=FIXME

cat > tree <<EOF
110644 inode $inode_sha1	shadow
EOF
tree_sha1=FIXME

test_expect_success 'hash a props' '
	test $props_sha1 = "`git hash-object -t props -w props`"
'

test_expect_success 'cat-file a props' '
	git cat-file props $props_sha1 | cmp -s - props
'

test_expect_success 'hash an inode' '
	test $inode_sha1 = "`git hash-object -t inode -w inode`"
'

test_expect_success 'cat-file an inode' '
	git cat-file inode $inode_sha1 | cmp -s - inode
'

test_expect_success 'tree with inode' '
	test $tree_sha1 = "`git mktree < tree`"
'

test_expect_success 'ls-tree of tree with inode' '
	git ls-tree $tree_sha1 | cmp -s - tree
'

test_expect_success 'check type with cat-file' '
	test inode = "`git cat-file -t $tree_sha1:shadow`"
'

test_expect_success 'cat-file inode tree:inode' '
	git cat-file inode $tree_sha1:shadow | cmp -s - inode
'

test_expect_success 'cat-file blob tree:inode' '
	git cat-file blob $tree_sha1:shadow | cmp -s - shadow
'

test_expect_success 'cat-file props tree:inode' '
	git cat-file props $tree_sha1:shadow | cmp -s - props
'

test_expect_success 'read-tree' '
	git read-tree $tree_sha1
'

test_expect_success 'ls-files shows no modified files' '
	test -z "`git ls-files -m || echo fail`"
'

test_expect_success 'write-tree' '
	test $tree_sha1 = "`git write-tree`"
'

test_expect_success 'commit-tree' '
	COMMIT=`echo Commit with an inode | git commit-tree $tree_sha1` &&
	git update-ref HEAD $COMMIT
'

cat >shadow <<EOF
root:*:13943:0:99999:7:::
jamey:*:13943:0:99999:7:::
josh:*:13943:0:99999:7:::
EOF
shadow_sha1=FIXME

test_expect_success 'ls-files shows modified file' '
	test "shadow" = "`git ls-files -m`"
'

test_expect_success 'add modified file to index' '
	git add shadow
'

test_expect_success 'commit modification' '
	git commit -m "Modify shadow"
'

test_expect_success 'ls-files shows no modified files' '
	test -z "`git ls-files -m || echo fail`"
'

test_expect_success 'check type with cat-file, after modification' '
	test inode = "`git cat-file -t HEAD:shadow`"
'

cat > inode <<EOF
blob $shadow_sha1
props $props_sha1
EOF
inode_sha1=FIXME

test_expect_success 'cat-file inode HEAD:inode, after modification' '
	git cat-file inode HEAD:shadow | cmp -s - inode
'

test_expect_success 'cat-file blob HEAD:inode, after modification' '
	git cat-file blob HEAD:shadow | cmp -s - shadow
'

test_expect_success 'cat-file props HEAD:inode, after modification' '
	git cat-file props HEAD:shadow | cmp -s - props
'

test_done

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2008-08-18 23:29 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-09 21:07 [RFC] Plumbing-only support for storing object metadata Jamey Sharp, Josh Triplett
2008-08-09 21:49 ` Scott Chacon
2008-08-10  3:51   ` Shawn O. Pearce
2008-08-10 11:20     ` Stephen R. van den Berg
2008-08-10 12:16       ` david
2008-08-10 14:50         ` Jan Hudec
2008-08-10 17:57           ` Stephen R. van den Berg
2008-08-10 18:11             ` Jan Hudec
2008-08-10 20:16               ` Stephen R. van den Berg
2008-08-10 22:34                 ` Junio C Hamano
2008-08-10 23:10                   ` david
2008-08-11 10:11                     ` Stephen R. van den Berg
2008-08-16  6:21                 ` Josh Triplett, Jamey Sharp
2008-08-16  7:56                   ` david
2008-08-16  9:55                   ` Junio C Hamano
2008-08-16 15:07                     ` Jan Hudec
2008-08-18  6:12                   ` Shawn O. Pearce
2008-08-18 23:06                     ` Derek Fawcus
2008-08-18 23:18                       ` Shawn O. Pearce
2008-08-18 23:23                       ` Marcus Griep
2008-08-18 23:28                         ` Shawn O. Pearce
2008-08-10 11:09 ` Jan Hudec

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).