[RFC] Plumbing-only support for storing object metadata

git.vger.kernel.org archive mirror

* [RFC] Plumbing-only support for storing object metadata
@ 2008-08-09 21:07 Jamey Sharp, Josh Triplett
  2008-08-09 21:49 ` Scott Chacon
  2008-08-10 11:09 ` Jan Hudec
  0 siblings, 2 replies; 22+ messages in thread
From: Jamey Sharp, Josh Triplett @ 2008-08-09 21:07 UTC
  To: git

[-- Attachment #1: Type: text/plain, Size: 4624 bytes --]

The attached test illustrates a proposal for minimal plumbing support
usable to store permissions, ownership, and other metadata in git
repositories. This proposal is fully compatible with existing
repositories when the new functionality is not in use. Similar to the
introduction of subprojects, we have not yet specified the porcelain. We
believe that the plumbing will provide sufficient functionality for many
uses, and these uses will help determine the appropriate porcelain.

We would have included an implementation along with the test, but we
need help with a detail of git internals. More on that at the end. We'd
also appreciate feedback on the proposal.

We propose representing objects with metadata using a new "inode"
object. An inode object contains the hash of the real object and the
hash of a "props" (properties) object. A props object contains a set of
name-value pairs. Tree objects can reference inode objects in addition
to the current possibilities of blobs, trees, and subproject commits; we
propose using the currently invalid type 110000 (S_IFREG | S_IFIFO) for
inode objects. We primarily see a use case for inodes referencing blobs
and trees, though as defined they support any object type.

By separating property objects from inodes, objects with the same
properties can share the same property object; we expect, for instance,
that repositories reflecting /etc will have many references to the
"root:root 644" and "root:root 755" properties.

Both object types have a unique representation: equivalent inodes and
props objects will have the same hash. The exact format of an inode
looks like:
	<object_type> SP <object_sha1> LF
	props SP <props_sha1> LF
A property object looks like a sorted list of one or more of:
	<key> SP <value> LF
The same key is allowed to appear more than once, in which case the
lines will be sorted by the bytes of the values. Allowing duplicate keys
will make it easier to retrieve a set of similar properties such as
acls.

This format implies certain constraints on property names and values. We
propose limiting both names and values to printable ASCII (\x20-\x7E),
and disallowing spaces in keys. If some use case requires property names
or values with binary data, that property could use a printable encoding
such as base64.

We believe this proposal provides a sensible approach to storing
metadata in Git repositories; however, we're happy with any reasonable
solution that provides equivalent functionality. Some alternatives we
considered:

  - We could allow UTF-8 property names or values, rather than strictly
    ASCII. Our proposal is conservative in this regard, allowing an
    extension to UTF-8 later while remaining compatible with existing
    repositories.

  - We could allow arbitrary property names or values, by changing the
    props format to store lengths rather than using delimiters. This
    would not be a compatible change, so it needs to be decided early.

  - Tree objects already store mode bits, but we believe that it would
    prove simpler to store complete modes in properties rather than
    adjusting Git internals to preserve arbitrary mode bits in trees.
    Even if new versions of Git preserved the full mode, existing
    versions of Git might silently give incorrect results. Furthermore,
    mode bits other than executability seem of limited value without
    ownership information.

  - inode objects could directly store properties, rather than
    referencing a separate props object. This would eliminate one
    indirection needed to access properties. However, it would also
    reduce sharing of data for objects with the same properties.
    Furthermore, we expect that the indirection will have negligible
    cost when accessing objects from packs, given appropriately sorted
    packs. Shared props objects also suggest caching at various layers.

  - We could have called them "meta" objects instead of "props", but
    then we couldn't make "mad props" jokes.

We began trying to implement this proposal, but we found this enum
definition in cache.h, which made us think there's only room for one
more kind of object:

	enum object_type {
		OBJ_BAD = -1,
		OBJ_NONE = 0,
		OBJ_COMMIT = 1,
		OBJ_TREE = 2,
		OBJ_BLOB = 3,
		OBJ_TAG = 4,
		/* 5 for future expansion */
		OBJ_OFS_DELTA = 6,
		OBJ_REF_DELTA = 7,
		OBJ_ANY,
		OBJ_MAX,
	};

Do these object_type values appear in any on-disk structure, or does any
other reason exist why this set of values cannot change? Can we add
additional object types for inodes and props? If not, what would you
recommend instead?

- Jamey Sharp and Josh Triplett

[-- Attachment #2: t1008-inodes.sh --]
[-- Type: text/plain, Size: 2838 bytes --]

#!/bin/sh
#
# Copyright (c) 2008 Josh Triplett and Jamey Sharp
#

test_description="Test inode plumbing"

. ./test-lib.sh

cat > shadow <<EOF
root:*:13943:0:99999:7:::
EOF
shadow_sha1=`git hash-object -t blob -w shadow`

cat > props <<EOF
group shadow
mode 640
owner root
EOF
props_sha1=FIXME

cat > inode <<EOF
blob $shadow_sha1
props $props_sha1
EOF
inode_sha1=FIXME

cat > tree <<EOF
110644 inode $inode_sha1	shadow
EOF
tree_sha1=FIXME

test_expect_success 'hash a props' '
	test $props_sha1 = "`git hash-object -t props -w props`"
'

test_expect_success 'cat-file a props' '
	git cat-file props $props_sha1 | cmp -s - props
'

test_expect_success 'hash an inode' '
	test $inode_sha1 = "`git hash-object -t inode -w inode`"
'

test_expect_success 'cat-file an inode' '
	git cat-file inode $inode_sha1 | cmp -s - inode
'

test_expect_success 'tree with inode' '
	test $tree_sha1 = "`git mktree < tree`"
'

test_expect_success 'ls-tree of tree with inode' '
	git ls-tree $tree_sha1 | cmp -s - tree
'

test_expect_success 'check type with cat-file' '
	test inode = "`git cat-file -t $tree_sha1:shadow`"
'

test_expect_success 'cat-file inode tree:inode' '
	git cat-file inode $tree_sha1:shadow | cmp -s - inode
'

test_expect_success 'cat-file blob tree:inode' '
	git cat-file blob $tree_sha1:shadow | cmp -s - shadow
'

test_expect_success 'cat-file props tree:inode' '
	git cat-file props $tree_sha1:shadow | cmp -s - props
'

test_expect_success 'read-tree' '
	git read-tree $tree_sha1
'

test_expect_success 'ls-files shows no modified files' '
	test -z "`git ls-files -m || echo fail`"
'

test_expect_success 'write-tree' '
	test $tree_sha1 = "`git write-tree`"
'

test_expect_success 'commit-tree' '
	COMMIT=`echo Commit with an inode | git commit-tree $tree_sha1` &&
	git update-ref HEAD $COMMIT
'

cat >shadow <<EOF
root:*:13943:0:99999:7:::
jamey:*:13943:0:99999:7:::
josh:*:13943:0:99999:7:::
EOF
shadow_sha1=`git hash-object -t blob -w shadow`

test_expect_success 'ls-files shows modified file' '
	test "shadow" = "`git ls-files -m`"
'

test_expect_success 'add modified file to index' '
	git add shadow
'

test_expect_success 'commit modification' '
	git commit -m "Modify shadow"
'

test_expect_success 'ls-files shows no modified files' '
	test -z "`git ls-files -m || echo fail`"
'

test_expect_success 'check type with cat-file, after modification' '
	test inode = "`git cat-file -t HEAD:shadow`"
'

cat > inode <<EOF
blob $shadow_sha1
props $props_sha1
EOF
inode_sha1=FIXME

test_expect_success 'cat-file inode HEAD:inode, after modification' '
	git cat-file inode HEAD:shadow | cmp -s - inode
'

test_expect_success 'cat-file blob HEAD:inode, after modification' '
	git cat-file blob HEAD:shadow | cmp -s - shadow
'

test_expect_success 'cat-file props HEAD:inode, after modification' '
	git cat-file props HEAD:shadow | cmp -s - props
'

test_done

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-09 21:07 [RFC] Plumbing-only support for storing object metadata Jamey Sharp, Josh Triplett
@ 2008-08-09 21:49 ` Scott Chacon
  2008-08-10  3:51   ` Shawn O. Pearce
  2008-08-10 11:09 ` Jan Hudec
  1 sibling, 1 reply; 22+ messages in thread
From: Scott Chacon @ 2008-08-09 21:49 UTC
  To: Jamey Sharp, Josh Triplett, git

> We began trying to implement this proposal, but we found this enum
> definition in cache.h, which made us think there's only room for one
> more kind of object:
>
>        enum object_type {
>                OBJ_BAD = -1,
>                OBJ_NONE = 0,
>                OBJ_COMMIT = 1,
>                OBJ_TREE = 2,
>                OBJ_BLOB = 3,
>                OBJ_TAG = 4,
>                /* 5 for future expansion */
>                OBJ_OFS_DELTA = 6,
>                OBJ_REF_DELTA = 7,
>                OBJ_ANY,
>                OBJ_MAX,
>        };
>
> Do these object_type values appear in any on-disk structure, or does any
> other reason exist why this set of values cannot change? Can we add
> additional object types for inodes and props? If not, what would you
> recommend instead?

If I'm not mistaken, these are the values used to identify data in the
header sections of packfile objects.  The first four bits are used to
identify the object type, where the first bit is static and the next
three are the object type of the data following the header.  Since the
type is encoded using those three bits, 0-7 is the valid range.  I
would assume that would be difficult to change, since all the
packfiles depend on that range.

Scott

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-09 21:49 ` Scott Chacon
@ 2008-08-10  3:51   ` Shawn O. Pearce
  2008-08-10 11:20     ` Stephen R. van den Berg
  0 siblings, 1 reply; 22+ messages in thread
From: Shawn O. Pearce @ 2008-08-10  3:51 UTC
  To: Scott Chacon; +Cc: Jamey Sharp, Josh Triplett, git

Scott Chacon <schacon@gmail.com> wrote:
> > We began trying to implement this proposal, but we found this enum
> > definition in cache.h, which made us think there's only room for one
> > more kind of object:
> >
> >        enum object_type {
> >                OBJ_BAD = -1,
> >                OBJ_NONE = 0,
> >                OBJ_COMMIT = 1,
> >                OBJ_TREE = 2,
> >                OBJ_BLOB = 3,
> >                OBJ_TAG = 4,
> >                /* 5 for future expansion */
> >                OBJ_OFS_DELTA = 6,
> >                OBJ_REF_DELTA = 7,
> >                OBJ_ANY,
> >                OBJ_MAX,
> >        };
> >
> > Do these object_type values appear in any on-disk structure, or does any
> > other reason exist why this set of values cannot change? Can we add
> > additional object types for inodes and props? If not, what would you
> > recommend instead?
> 
> If I'm not mistaken, these are the values used to identify data in the
> header sections of packfile objects.  The first four bits are used to
> identify the object type, where the first bit is static and the next
> three are the object type of the data following the header.  Since the
> type is encoded using those three bits, 0-7 is the valid range.  I
> would assume that would be difficult to change, since all the
> packfiles depend on that range.

Correct.  There is only room in the pack file for 3 bits in the
type field, resulting in types 0-7 as being the only valid range.

Only types 0 and 5 are available for use.

Nico and I have (at least in the past) agreed that type 0 is meant
as an escape indicator.  If the type is set to 0 then the real type
code appears in another byte of data which follows the object's
inflated length.

That leaves only type 5 available.  Note that because type 5 can be
encoded into a really small space (3 bits) compared to any other
type we may add we really want to use it for something which will
appear _very_frequently_.  The OBJ_DICT_TREE encoding we were talking
about doing for pack v4 fits that bill, as nearly any project (even
huge ones like Mozilla or KDE) would probably be using OBJ_DICT_TREE
throughout their pack files, and there is a noticeable reduction in
disk usage (and increased performance due to lower page faults)
as a result.

The proposed "inode" and "props" types sound like they are useful
for only less common cases, and would appear very infrequently
compared to a tree object.

So yea, there really aren't any new type bits available.

But tossing aside the type bit argument, I'm not sure I see the
value in adding limited arbitrary properties to names in a tree.
How does one edit these?  How do you inspect them before you get
a checkout, assuming they might actually have an impact on the
checkout process?  How the hell do you merge them?

I'm also very concerned about the limited range of values for both
keys and values in a "props" type.  Even if we did go down this
road of supporting such a concept at the plumbing layer (and in the
storage model), everywhere else we are 8-bit clean.  Commit messages,
tag messages, blob contents, even file names in tree objects.
(OK, file names cannot contain a NUL byte, but whatever, that is
their only limitation.)

The proper encoding for both keys and values should permit any data
to be stored.  Doesn't the extended attributes feature in Linux and
FreeBSD both support any data to be attached to an inode in the fs?

Please don't get me wrong.

I think this is a _BAD_ idea.

A bad idea that will only clutter up the core object model, and
the core processing code of that object model.  Extended attributes
aren't used that much on local filesystems, because they are hard
to work with and suck performance wise.  Performance in Git is
a _feature_.  It matters.  Our clean object model really helps to
make that possible.

-- 
Shawn.

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-10  3:51   ` Shawn O. Pearce
@ 2008-08-10 11:20     ` Stephen R. van den Berg
  2008-08-10 12:16       ` david
  0 siblings, 1 reply; 22+ messages in thread
From: Stephen R. van den Berg @ 2008-08-10 11:20 UTC
  To: Shawn O. Pearce; +Cc: Scott Chacon, Jamey Sharp, Josh Triplett, git

Shawn O. Pearce wrote:
>The proper encoding for both keys and values should permit any data
>to be stored.  Doesn't the extended attributes feature in Linux and
>FreeBSD both support any data to be attached to an inode in the fs?

I'd think so yes, so any attempt to store the metadata should support it
as well.
That also would imply that any such metadata storage would have to allow
for arbitrary blobs to be stored under tag-names.
And *that* would imply that anything that implements a kludge like
specifying a flat-file format to encode name/value pairs doesn't scale.

>I think this is a _BAD_ idea.

>A bad idea that will only clutter up the core object model, and
>the core processing code of that object model.  Extended attributes
>aren't used that much on local filesystems, because they are hard
>to work with and suck performance wise.  Performance in Git is
>a _feature_.  It matters.  Our clean object model really helps to
>make that possible.

Quite right.

However, pondering the idea a bit more, I could envision something
similar to the following:

In the git tree the following layout would be used:

plainfile.txt
otherdir/otherplainfile.txt
projects/README
projects/README/_owner
projects/README/_acl
projects/README/_icon
projects/README/_mimetype
projects/something.mpeg
projects/something.mpeg/_icon
projects/something.mpeg/_mimetype
projects/asubdir/thirdplainfile.txt

That would imply that in the tree storage, the only extension would be
that for any given reference to a blob in a tree object, there could be
a reference to a tree object as well.  I.e. something like this in the
tree object:

100644 blob f7b7414159b8a7159538fac543b2b19ef531968e  README
000000 tree df6ee415f04d6ccea5dab0de562c2f155583a2c4  README
100644 blob 0a54f8ec13df03cf6bdb5b973acec6d8141c01cc  something.mpeg
000000 tree a421448d765abb7bb979dc1d56621d0fc9b41229  something.mpeg

The extra tree reference for README would actually refer to something like:

100644 blob be3365fdaae0f4ed8c22c4cf38a4b1f88f9069c3  _owner
100644 blob 739e9e8f3d095931084b54cbf7f90d8f64eb0ac6  _acl
100644 blob bc1a868bb50644712966a50150d21199c401d6d5  _icon
100644 blob 6076bde5b3b6b8bed4ec4968d09abdbf015b3b75  _mimetype

Which would contain the extra attributes.

And that would imply that during checkout you can do a rich checkout or a
flat checkout for any files under the projects directory.

A flat checkout results in the following files in the filesystem:

plainfile.txt
otherdir/otherplainfile.txt
projects/README
projects/README.attr/_owner
projects/README.attr/_acl
projects/README.attr/_icon
projects/README.attr/_mimetype
projects/something.mpeg
projects/something.mpeg.attr/_icon
projects/something.mpeg.attr/_mimetype
projects/asubdir/thirdplainfile.txt

A rich checkout results in the following files in the filesystem:

plainfile.txt
otherdir/otherplainfile.txt
projects/README
projects/something.mpeg
projects/asubdir/thirdplainfile.txt

The rich checkout also applies the extended attributes/metadata to the
filesystem (i.e. it would store all the metadata in the appropriate
places).

The nice thing about this setup is that:
a. There is *no* change whatsoever to existing repositories or
   repository format.
b. It's less filling (i.e. there are no special bits or object types to be
   used).
c. Speed for files without attributes is not affected.
d. It's fully 8-bit-transparent.
e. It scales, even if you have large or many attributes.
f. It uses the natural tree storage abstraction already supported in
   git repositories to store the additional data.
g. It allows reuse of attribute information at many levels.
h. It even allows for a hierarchy of attributes attached to a single
   file (no current filesystem supports that (yet)).
i. The only change in the fast-path of core-git is that it would have to
   know how to skip tree objects referenced in a tree object if a
   same-name blob object is already there.  This can even be optimised
   by requiring the attribute-tree to have a very specific (e.g. 0)
   mode to ease detection.
j. Editing and merging the meta-information could be made an almost
   natural operation in the flat-checkout mode (the extension to be used
   to name the attribute subdir should be made configurable).
-- 
Sincerely,
           Stephen R. van den Berg.

Real programmers don't produce results, they return exit codes.

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-10 11:20     ` Stephen R. van den Berg
@ 2008-08-10 12:16       ` david
  2008-08-10 14:50         ` Jan Hudec
  0 siblings, 1 reply; 22+ messages in thread
From: david @ 2008-08-10 12:16 UTC
  To: Stephen R. van den Berg
  Cc: Shawn O. Pearce, Scott Chacon, Jamey Sharp, Josh Triplett, git

On Sun, 10 Aug 2008, Stephen R. van den Berg wrote:

> Shawn O. Pearce wrote:
>> The proper encoding for both keys and values should permit any data
>> to be stored.  Doesn't the extended attributes feature in Linux and
>> FreeBSD both support any data to be attached to an inode in the fs?
>
> I'd think so yes, so any attempt to store the metadata should support it
> as well.
> That also would imply that any such metadata storage would have to allow
> for arbitrary blobs to be stored under tag-names.
> And *that* would imply that anything that implements a kludge like
> specifying a flat-file format to encode name/value pairs doesn't scale.
>
>> I think this is a _BAD_ idea.
>
>> A bad idea that will only clutter up the core object model, and
>> the core processing code of that object model.  Extended attributes
>> aren't used that much on local filesystems, because they are hard
>> to work with and suck performance wise.  Performance in Git is
>> a _feature_.  It matters.  Our clean object model really helps to
>> make that possible.
>
> Quite right.
>
> However, pondering the idea a bit more, I could envision something
> similar to the following:
>
> In the git tree the following layout would be used:
>
> plainfile.txt
> otherdir/otherplainfile.txt
> projects/README
> projects/README/_owner
> projects/README/_acl
> projects/README/_icon
> projects/README/_mimetype
> projects/something.mpeg
> projects/something.mpeg/_icon
> projects/something.mpeg/_mimetype
> projects/asubdir/thirdplainfile.txt
>
> That would imply that in the tree storage, the only extension would be
> that for any given reference to a blob in a tree object, there could be
> a reference to a tree object as well.  I.e. something like this in the
> tree object:
>
> 100644 blob f7b7414159b8a7159538fac543b2b19ef531968e  README
> 000000 tree df6ee415f04d6ccea5dab0de562c2f155583a2c4  README
> 100644 blob 0a54f8ec13df03cf6bdb5b973acec6d8141c01cc  something.mpeg
> 000000 tree a421448d765abb7bb979dc1d56621d0fc9b41229  something.mpeg
>
> The extra tree reference for README would actually refer to something like:
>
> 100644 blob be3365fdaae0f4ed8c22c4cf38a4b1f88f9069c3  _owner
> 100644 blob 739e9e8f3d095931084b54cbf7f90d8f64eb0ac6  _acl
> 100644 blob bc1a868bb50644712966a50150d21199c401d6d5  _icon
> 100644 blob 6076bde5b3b6b8bed4ec4968d09abdbf015b3b75  _mimetype
>
> Which would contain the extra attributes.
>
> And that would imply that during checkout you can do a rich checkout or a
> flat checkout for any files under the projects directory.
>
> A flat checkout results in the following files in the filesystem:
>
> plainfile.txt
> otherdir/otherplainfile.txt
> projects/README
> projects/README.attr/_owner
> projects/README.attr/_acl
> projects/README.attr/_icon
> projects/README.attr/_mimetype
> projects/something.mpeg
> projects/something.mpeg.attr/_icon
> projects/something.mpeg.attr/_mimetype
> projects/asubdir/thirdplainfile.txt
>
> A rich checkout results in the following files in the filesystem:
>
> plainfile.txt
> otherdir/otherplainfile.txt
> projects/README
> projects/something.mpeg
> projects/asubdir/thirdplainfile.txt
>
> The rich checkout also applies the extended attributes/metadata to the
> filesystem (i.e. it would store all the metadata in the appropriate
> places).
>
> The nice thing about this setup is that:
> a. There is *no* change whatsoever to existing repositories or
>   repository format.
> b. It's less filling (i.e. there are no special bits or object types to be
>   used).
> c. Speed for files without attributes is not affected.
> d. It's fully 8-bit-transparent.
> e. It scales, even if you have large or many attributes.
> f. It uses the natural tree storage abstraction already supported in
>   git repositories to store the additional data.
> g. It allows reuse of attribute information at many levels.
> h. It even allows for a hierarchy of attributes attached to a single
>   file (no current filesystem supports that (yet)).
> i. The only change in the fast-path of core-git is that it would have to
>   know how to skip tree objects referenced in a tree object if a
>   same-name blob object is already there.  This can even be optimised
>   by requiring the attribute-tree to have a very specific (e.g. 0)
>   mode to ease detection.
> j. Editing and merging the meta-information could be made an almost
>   natural operation in the flat-checkout mode (the extension to be used
>   to name the attribute subdir should be made configurable).

you also need to be able to add something to the attribute tree to 
indicate what type of metadata is being stored in it. you could  have *nix 
perms, windows perms, posix extended attributes, or other things.

I could see this as a great way to deal with editing exif data for images. 
when checking in a .jpg, extract the .exif data and store it separately, 
when doing a rich checkout combine it back into the .jpg file. now the 
large binary blob doesn't change so you don't have to try and find deltas 
for it.

all the special case things would be in the helper routines written to do 
the 'rich checkin/checkout' of each type. people who don't care about 
this don't enable these helpers in the configs and so don't suffer any 
overhead (other than item (i) above)

this has the potential to be horribly abused, but it also has the 
potential to open up some very interesting possibilities as well.

David Lang

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-10 12:16       ` david
@ 2008-08-10 14:50         ` Jan Hudec
  2008-08-10 17:57           ` Stephen R. van den Berg
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Hudec @ 2008-08-10 14:50 UTC
  To: david
  Cc: Stephen R. van den Berg, Shawn O. Pearce, Scott Chacon,
	Jamey Sharp, Josh Triplett, git

On Sun, Aug 10, 2008 at 05:16:47 -0700, david@lang.hm wrote:
> On Sun, 10 Aug 2008, Stephen R. van den Berg wrote:
>> However, pondering the idea a bit more, I could envision something
>> similar to the following:
>>
>> In the git tree the following layout would be used:
>>
>> plainfile.txt
>> otherdir/otherplainfile.txt
>> projects/README
>> projects/README/_owner
>> projects/README/_acl
>> projects/README/_icon
>> projects/README/_mimetype
>> projects/something.mpeg
>> projects/something.mpeg/_icon
>> projects/something.mpeg/_mimetype
>> projects/asubdir/thirdplainfile.txt
>>
>> That would imply that in the tree storage, the only extension would be
>> that for any given reference to a blob in a tree object, there could be
>> a reference to a tree object as well.  I.e. something like this in the
>> tree object:
>>
>> 100644 blob f7b7414159b8a7159538fac543b2b19ef531968e  README
>> 000000 tree df6ee415f04d6ccea5dab0de562c2f155583a2c4  README
>> 100644 blob 0a54f8ec13df03cf6bdb5b973acec6d8141c01cc  something.mpeg
>> 000000 tree a421448d765abb7bb979dc1d56621d0fc9b41229  something.mpeg
>>
>> The extra tree reference for README would actually refer to something like:
>>
>> 100644 blob be3365fdaae0f4ed8c22c4cf38a4b1f88f9069c3  _owner
>> 100644 blob 739e9e8f3d095931084b54cbf7f90d8f64eb0ac6  _acl
>> 100644 blob bc1a868bb50644712966a50150d21199c401d6d5  _icon
>> 100644 blob 6076bde5b3b6b8bed4ec4968d09abdbf015b3b75  _mimetype
>>
>> Which would contain the extra attributes.

... provided the two entries under the same name wouldn't drive the internal
logic completely mad, I quite like this. Note by the way, that you need to
allow for two trees too, because you may want to store attributes for
directories too. It's no problem to differentiate them by type 04755
vs. 00000 or 11000 or whatever, but it is a problem for index, because that
does not store directory entries, so metadata for a directory would conflict
with regular entries in it. Can be fixed by using different filetype for the
metadata.

>> And that would imply that during checkout you can do a rich checkout or a
>> flat checkout for any files under the projects directory.
>>
>> A flat checkout results in the following files in the filesystem:
>>
>> plainfile.txt
>> otherdir/otherplainfile.txt
>> projects/README
>> projects/README.attr/_owner
>> projects/README.attr/_acl
>> projects/README.attr/_icon
>> projects/README.attr/_mimetype
>> projects/something.mpeg
>> projects/something.mpeg.attr/_icon
>> projects/something.mpeg.attr/_mimetype
>> projects/asubdir/thirdplainfile.txt

Storing like this in index as well would make it even more compatible. Of
course you are reserving the .attr suffix. But it's probably OK to reserve
/something/ for this functionality (when the functionality is needed only).
Maybe it could use some special character (@, #, =, $ or something) to
separate the suffix instead of normal . to decrease the chance to conflict
with other use.

>> A rich checkout results in the following files in the filesystem:
>>
>> plainfile.txt
>> otherdir/otherplainfile.txt
>> projects/README
>> projects/something.mpeg
>> projects/asubdir/thirdplainfile.txt
>>
>> The rich checkout also applies the extended attributes/metadata to the
>> filesystem (i.e. it would store all the metadata in the appropriate
>> places).
>>
>> The nice thing about this setup is that:
>> a. There is *no* change whatsoever to existing repositories or
>>   repository format.

Well, there is a small change -- it needs to support multiple entries with
different type but same name in the tree object (but could be avoided by
using some special reserved suffix). Plus the index functionality needs to be
modified to put the metadata entries in the right places. Still of course
much less invasive than the proposal from OP.

>> b. It's less filling (i.e. there are no special bits or object types to be
>>   used).
>> c. Speed for files without attributes is not affected.
>> d. It's fully 8-bit-transparent.
>> e. It scales, even if you have large or many attributes.
>> f. It uses the natural tree storage abstraction already supported in
>>   git repositories to store the additional data.
>> g. It allows reuse of attribute information at many levels.
>> h. It even allows for a hierarchy of attributes attached to a single
>>   file (no current filesystem supports that (yet)).
>> i. The only change in the fast-path of core-git is that it would have to
>>   know how to skip tree objects referenced in a tree object if a
>>   same-name blob object is already there.  This can even be optimised
>>   by requiring the attribute-tree to have a very specific (e.g. 0)
>>   mode to ease detection.
>> j. Editing and merging the meta-information could be made an almost
>>   natural operation in the flat-checkout mode (the extension to be used
>>   to name the attribute subdir should be made configurable).
>
> you also need to be able to add something to the attribute tree to  
> indicate what type of metadata is being stored in it. you could  have 
> *nix perms, windows perms, posix extended attributes, or other things.

Well, not really. I think the best way to implement the 'rich' checkout is to
use a hook to read/write the metadata. Git-core should just support storing
attributes, but not actually store any of its own, since they are not needed
for its main purpose, which is source code control.

> I could see this as a great way to deal with editing exif data for 
> images. when checking in a .jpg, extract the .exif data and store it 
> separately, when doing a rich checkout combine it back into the .jpg 
> file. now the large binary blob doesn't change so you don't have to try 
> and find deltas for it.
>
> all the special case things would be in the helper routines written to do 
> the 'rich checkin/checkout' of each type. people who don't care about  
> this don't enable these helpers in the configs and so don't suffer any  
> overhead (other than item (i) above)
>
> this has the potential to be horribly abused, but it also has the  
> potential to open up some very interesting possibilities as well.

I would say your example above belongs in the category of abuses. The binary
differ can deal with exif just OK (it's not compressed IIRC), so all you need
is a custom diff driver for merging -- and that's already supported.
Compressed stuff can be already handled for the differ with clean & smudge.

-- 
						 Jan 'Bulb' Hudec <bulb@ucw.cz>

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-10 14:50         ` Jan Hudec
@ 2008-08-10 17:57           ` Stephen R. van den Berg
  2008-08-10 18:11             ` Jan Hudec
  0 siblings, 1 reply; 22+ messages in thread
From: Stephen R. van den Berg @ 2008-08-10 17:57 UTC (permalink / raw)
  To: Jan Hudec
  Cc: david, Shawn O. Pearce, Scott Chacon, Jamey Sharp, Josh Triplett,
	git

Jan Hudec wrote:
>On Sun, Aug 10, 2008 at 05:16:47 -0700, david@lang.hm wrote:
>> On Sun, 10 Aug 2008, Stephen R. van den Berg wrote:
>>> However, pondering the idea a bit more, I could envision something
>>> similar to the following:

>.... provided the two entries under the same name wouldn't drive the internal
>logic completely mad, I quite like this. Note by the way, that you need to
>allow for two trees too, because you may want to store attributes for

Well, in theory yes, but currently git doesn't store directories.
How about extending git-core to allow for storage of directories by
virtue of the following object in a tree:

040000 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391  .

I.e. the hash belongs to the empty blob.
Normally you don't (have to) store these directory blobs, but if you
insist on having them, git will create the empty directory on checkout
(i.e. you wouldn't need the dummy file trick anymore to force the
directory to be present).
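The hash in the entry above is not arbitrary: git names every object by the SHA-1 of a short header (the type, a space, the decimal size and a NUL byte) followed by the raw content, so an empty blob always gets the same name. A minimal Python sketch of that rule, equivalent to what `git hash-object` computes for SHA-1 repositories; the helper name `git_object_id` is ours, not git's:

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """SHA-1 object name for a loose object (our own helper, for illustration)."""
    # git hashes the header "<type> <decimal size>\0" followed by the raw payload
    header = f"{obj_type} {len(payload)}\0".encode("ascii")
    return hashlib.sha1(header + payload).hexdigest()


EMPTY_BLOB = git_object_id("blob", b"")
print(EMPTY_BLOB)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Note that in existing tree objects the 040000 mode marks a subtree, so existing code would expect a tree id rather than a blob id in such an entry.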
-- 
Sincerely,
           Stephen R. van den Berg.

Real programmers don't produce results, they return exit codes.

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-10 17:57           ` Stephen R. van den Berg
@ 2008-08-10 18:11             ` Jan Hudec
  2008-08-10 20:16               ` Stephen R. van den Berg
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Hudec @ 2008-08-10 18:11 UTC (permalink / raw)
  To: Stephen R. van den Berg
  Cc: david, Shawn O. Pearce, Scott Chacon, Jamey Sharp, Josh Triplett,
	git


On Sun, Aug 10, 2008 at 19:57:35 +0200, Stephen R. van den Berg wrote:
> Jan Hudec wrote:
> >On Sun, Aug 10, 2008 at 05:16:47 -0700, david@lang.hm wrote:
> >> On Sun, 10 Aug 2008, Stephen R. van den Berg wrote:
> >>> However, pondering the idea a bit more, I could envision something
> >>> similar to the following:
> 
> >.... provided the two entries under the same name wouldn't drive the internal
> >logic completely mad, I quite like this. Note by the way, that you need to
> >allow for two trees too, because you may want to store attributes for
> 
> Well, in theory yes, but currently git doesn't store directories.

It depends. It does store directories in the tree objects; it just does not
do that in the index. And we are talking about tree objects, where git does
store directories.

Besides, that is irrelevant to storing attributes for directories -- the
attribute objects are not themselves directories, so git would store them
just fine.

> How about extending git-core to allow for storage of directories by
> virtue of the following object in a tree:
> 
> 040000 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391  .
> 
> I.e. the hash belongs to the empty blob.

Sorry, but this is insane. If git were to store anything for empty
directories, it would be an empty tree, not a tree containing an empty blob
called '.'. There was even a prototype patch to do that sent to the list (I believe
it was from Linus and was part of an argument along the lines "you could do
it like this, so stop talking and finish it if you have good enough reason to
want it (which you obviously don't)").
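For comparison, an empty tree is itself a well-defined object: a tree's payload is just its concatenated entries (octal mode, a space, the name, a NUL byte, then the 20-byte binary object id), so a tree with no entries hashes a zero-length payload. The sketch below contrasts that with the '.'-entry tree suggested earlier; it assumes the SHA-1 object format, and the helper names are ours, for illustration only:

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    # SHA-1 over "<type> <size>\0" + payload (illustrative helper)
    header = f"{obj_type} {len(payload)}\0".encode("ascii")
    return hashlib.sha1(header + payload).hexdigest()


def tree_entry(mode: str, name: str, hex_id: str) -> bytes:
    """One raw tree entry: octal mode, space, name, NUL, 20-byte binary id."""
    return f"{mode} {name}\0".encode("utf-8") + bytes.fromhex(hex_id)


EMPTY_BLOB = git_object_id("blob", b"")
# A tree with no entries: what an "empty directory" entry would point at.
EMPTY_TREE = git_object_id("tree", b"")
# The alternative from the thread: a tree whose only entry is "." -> empty blob.
# (Raw trees write the directory mode as "40000", without the leading zero.)
DOT_TREE = git_object_id("tree", tree_entry("40000", ".", EMPTY_BLOB))

print(EMPTY_TREE)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

A parent directory would then reference an empty subdirectory with an ordinary `40000 <name>` entry pointing at the empty tree; no special entry inside the subdirectory is needed.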

> Normally you don't (have to) store these directory blobs, but if you
> insist on having them, git will create the empty directory on checkout
> (i.e. you wouldn't need the dummy file trick anymore to force the
> directory to be present).

No, I don't give a damn about directories themselves. I want to store their
attributes, which is a completely different thing.

-- 
						 Jan 'Bulb' Hudec <bulb@ucw.cz>

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-10 18:11             ` Jan Hudec
@ 2008-08-10 20:16               ` Stephen R. van den Berg
  2008-08-10 22:34                 ` Junio C Hamano
  2008-08-16  6:21                 ` Josh Triplett, Jamey Sharp
  0 siblings, 2 replies; 22+ messages in thread
From: Stephen R. van den Berg @ 2008-08-10 20:16 UTC (permalink / raw)
  To: Jan Hudec
  Cc: david, Shawn O. Pearce, Scott Chacon, Jamey Sharp, Josh Triplett,
	git

Jan Hudec wrote:
>On Sun, Aug 10, 2008 at 19:57:35 +0200, Stephen R. van den Berg wrote:
>> Jan Hudec wrote:
>If git were to store anything for empty
>directories, it would be an empty tree, not a tree containing an empty blob
>called '.'. There was even a prototype patch to do that sent to the list (I believe

Ok, sounds reasonable.

With respect to the storage inside the tree, using a duplicate name with
mode 0 or a name with some kind of rare extension...
It should probably be investigated how much of the existing core needs
to be touched/changed to support the duplicate name.
I agree that using a custom rare extension would allow for almost no
change to git-core.
-- 
Sincerely,
           Stephen R. van den Berg.

Real programmers don't produce results, they return exit codes.

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-10 20:16               ` Stephen R. van den Berg
@ 2008-08-10 22:34                 ` Junio C Hamano
  2008-08-10 23:10                   ` david
  2008-08-16  6:21                 ` Josh Triplett, Jamey Sharp
  1 sibling, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2008-08-10 22:34 UTC (permalink / raw)
  To: Stephen R. van den Berg
  Cc: Jan Hudec, david, Shawn O. Pearce, Scott Chacon, Jamey Sharp,
	Josh Triplett, git

"Stephen R. van den Berg" <srb@cuci.nl> writes:

> I agree that using a custom rare extension would allow for almost no
> change to git-core.

And at that point there is no "plumbing" side change necessary.  You just
have to teach your Porcelain to notice the associated "metainfo" files and
deal with them.

For merging such "metainfo", you would need to do your "flattish/unrich"
checkout anyway, so it might be that an easier approach for such a
Porcelain might be:

 * Define a specific leading path, say ".attrs", as the hierarchy to store the
   attributes information.  Attributes of the files README and t/Makefile
   will be stored in .attrs/README and .attrs/t/Makefile.  They are
   probably just plain text files so you can do your merges and parsing easily,
   but with this counterproposal the only requirement is that they are simple
   plain blobs.  The plumbing layer does not care what payload they carry.

 * When you want to "git setattr $path", the Porcelain mucks with
   ".attrs/$path".  Probably checkout codepath would give you a hook that
   lets you reflect what ".attrs/$path" records to "$path", and checkin
   (i.e. not commit but update-index) codepath would have another hook to
   let you grab attributes for "$path" and update ".attrs/$path".

 * Merging and handling updates to ".attrs/" hierarchy are done the usual
   way we handle blobs.  Your Porcelain would then take the result and do
   whatever changes to ACL or xattrs to the corresponding path, perhaps
   from a hook after merge.
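The path convention in the first bullet is simple enough to state as code. A minimal sketch, assuming the `.attrs` prefix from the example above and POSIX-style paths as git stores them; `attrs_path`, `content_path` and `paths_needing_reapply` are hypothetical helper names a Porcelain (or one of its hooks) might use:

```python
from pathlib import PurePosixPath

ATTRS_ROOT = ".attrs"  # leading path taken from the example above


def is_attrs_path(path: str) -> bool:
    parts = PurePosixPath(path).parts
    return len(parts) >= 2 and parts[0] == ATTRS_ROOT


def attrs_path(path: str) -> str:
    """Hypothetical helper: blob path that records the attributes of `path`."""
    p = PurePosixPath(path)
    if not p.parts or p.is_absolute() or p.parts[0] == ATTRS_ROOT:
        raise ValueError(f"not a content path: {path!r}")
    return str(PurePosixPath(ATTRS_ROOT, p))


def content_path(attr_path: str) -> str:
    """Hypothetical inverse of attrs_path(): .attrs/t/Makefile -> t/Makefile."""
    if not is_attrs_path(attr_path):
        raise ValueError(f"not an attribute path: {attr_path!r}")
    return str(PurePosixPath(*PurePosixPath(attr_path).parts[1:]))


def paths_needing_reapply(changed_paths):
    """After a merge, list content paths whose attribute blobs changed."""
    return sorted(content_path(p) for p in changed_paths if is_attrs_path(p))
```

A post-merge hook could, for example, feed it the output of `git diff --name-only ORIG_HEAD HEAD` and re-apply ACLs or xattrs only for the paths it returns.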

So it will most likely boil down to a "Porcelain only" convention that
different Porcelains would agree on.

My reaction to the initial proposal was very similar to the one given by
Shawn.  I do not see much point in having plumbing side support (yet).

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-10 22:34                 ` Junio C Hamano
@ 2008-08-10 23:10                   ` david
  2008-08-11 10:11                     ` Stephen R. van den Berg
  0 siblings, 1 reply; 22+ messages in thread
From: david @ 2008-08-10 23:10 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stephen R. van den Berg, Jan Hudec, Shawn O. Pearce, Scott Chacon,
	Jamey Sharp, Josh Triplett, git

On Sun, 10 Aug 2008, Junio C Hamano wrote:

>> I agree that using a custom rare extension would allow for almost no
>> change to git-core.
>
> And at that point there is no "plumbing" side change necessary.  You just
> have to teach your Porcelain to notice the associated "metainfo" files and
> deal with them.
>
> For merging such "metainfo", you would need to do your "flattish/unrich"
> checkout anyway, so it might be that an easier approach for such a
> Porcelain might be:
>
> * Define a specific leading path, say ".attrs", as the hierarchy to store the
>   attributes information.  Attributes of the files README and t/Makefile
>   will be stored in .attrs/README and .attrs/t/Makefile.  They are
>   probably just plain text files so you can do your merges and parsing easily,
>   but with this counterproposal the only requirement is that they are simple
>   plain blobs.  The plumbing layer does not care what payload they carry.
>
> * When you want to "git setattr $path", the Porcelain mucks with
>   ".attrs/$path".  Probably checkout codepath would give you a hook that
>   lets you reflect what ".attrs/$path" records to "$path", and checkin
>   (i.e. not commit but update-index) codepath would have another hook to
>   let you grab attributes for "$path" and update ".attrs/$path".
>
> * Merging and handling updates to ".attrs/" hierarchy are done the usual
>   way we handle blobs.  Your Porcelain would then take the result and do
>   whatever changes to ACL or xattrs to the corresponding path, perhaps
>   from a hook after merge.
>
> So it will most likely boil down to a "Porcelain only" convention that
> different Porcelains would agree on.
>
> My reaction to the initial proposal was very similar to the one given by
> Shawn.  I do not see much point in having plumbing side support (yet).

a few items

convenience

1. tying the attributes to the file more directly will make it much
easier to deal with them along with the file in the non-rich checkout
(it's much easier to say README* than README .attrs/README*)

consistency

2. putting hooks into the plumbing that can call external programs for the
rich checkin/checkout will let all porcelains make use of the features
without having to modify all of them independently.

safety

3. when doing checkins/checkouts of individual files you need to be sure
that you deal with the correct attributes at the same time (or else that
the person is explicitly requesting only a piece of it). with the attributes
closely associated with the file this is much easier to do (this is
another aspect of the convenience in #1 above)

4. if the configuration of what helper to use changes from one revision to
another, the plumbing (which is already looking at the tree object for both
revisions) is in a better position to detect and alert than the porcelains

David Lang

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-10 23:10                   ` david
@ 2008-08-11 10:11                     ` Stephen R. van den Berg
  0 siblings, 0 replies; 22+ messages in thread
From: Stephen R. van den Berg @ 2008-08-11 10:11 UTC (permalink / raw)
  To: david
  Cc: Junio C Hamano, Jan Hudec, Shawn O. Pearce, Scott Chacon,
	Jamey Sharp, Josh Triplett, git

david@lang.hm wrote:
>On Sun, 10 Aug 2008, Junio C Hamano wrote:
>>* Define a specific leading path, say ".attrs", as the hierarchy to store the
>>  attributes information.  Attributes of the files README and t/Makefile

>1. tying the attributes to the file more directly will make it much
>easier to deal with them along with the file in the non-rich checkout
>(it's much easier to say README* than README .attrs/README*)

I have to agree that from a practical standpoint for the user, having
the file and the attribute tree right next to each other in the tree is
a lot easier to manage.

So even though setting up a shadow attribute tree is cleaner because it
doesn't need some kind of magic extension, it tends to clutter the
management in the flat-file checkout case.
-- 
Sincerely,
           Stephen R. van den Berg.

"Beware: In C++, your friends can see your privates!"

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-10 20:16               ` Stephen R. van den Berg
  2008-08-10 22:34                 ` Junio C Hamano
@ 2008-08-16  6:21                 ` Josh Triplett, Jamey Sharp
  2008-08-16  7:56                   ` david
                                     ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Josh Triplett, Jamey Sharp @ 2008-08-16  6:21 UTC (permalink / raw)
  To: Jan Hudec, Shawn O. Pearce, Stephen R. van den Berg,
	Junio C Hamano, david, Scott Chacon
  Cc: git

We want to reply to a few of the common points raised in this thread
first, and then we have a few point-by-point replies later in this mail.
In particular, we see two common questions: whether Git should include
support for metadata such as permissions and ownership at all, and
how Git should store this metadata if so.

We agree entirely with Jan Hudec's first point:

On Sun, Aug 10, 2008 at 01:09:25PM +0200, Jan Hudec wrote:
> I am glad you came up with this, as I think this is the only reasonable way
> to support things like etckeeper. The metastore and similar solutions are
> a kludge and fall apart in so many cases.

Metastore, etckeeper, and other existing "hook-based" solutions, which
attempt to handle permissions separately, have several fundamental
problems.  They do not integrate well with the normal Git workflow, they
often have race conditions that can lead to security problems, and they
store working-copy permissions separately from the filesystem
permissions where they can potentially become out of sync.

We want to emphasize that we really don't have a preference amongst the
various reasonable proposals for storing object metadata or for
presenting that metadata in porcelain.  We're happy that our proposal
stimulated discussion on the topic and that we now understand relevant
Git internals much better.  We made our proposal and test case to
demonstrate that we're willing to design and implement a solution, not
just complain that Git does not support permissions.

Among the proposals mentioned in this thread, we see some common
requirements:

- All of the proposals suggest referencing the properties from the tree
  containing the object they apply to, rather than creating an
  extra object to store both hashes together.  We originally thought
  that having a single object reference in the tree would make it easier
  to iterate over the tree, construct each object, and apply its
  permissions.  However, several of the proposals address that in other
  ways.

- Several proposals suggest storing the metadata as a tree object,
  rather than a custom "props" object.  This makes a lot of sense.  It
  allows Git to use existing logic for parsing, reachability
  checking, merging, and checkouts.  On the other hand, we want to
  optimize for the common cases such as POSIX permissions and ownership
  rather than the unusual cases like extended attributes, so it might
  make sense to store all the metadata for a particular object as a
  single blob.

- Several responses expressed concerns about merges and conflicts.  We
  propose implementing support for this in plumbing the same way Git
  does for everything else: put entries into the index with stages
  marked.  This works whether metadata storage uses a tree or a blob.
  Porcelains can choose how to resolve these merges and present
  conflicts to the user for resolution.

- Several proposals suggest using a magic suffix or special mode to
  distinguish object file entries from their metadata entries.  Either
  of these approaches seems fine.  In the case of a suffix, we think it
  makes the most sense to use '/' or "//" in this suffix; any other
  suffix would potentially conflict with legitimate filenames.  "//" has
  the advantage of working unambiguously in the index as well.  Either
  way, any porcelain on top of this could choose a different naming
  scheme for non-"rich" checkouts, or check out the properties as a
  separate top-level directory as Junio proposed.

- Several people complained about our initial proposal of printable
  ASCII for property names and values.  We used this approach solely
  because it seemed like a reasonable starting place.  Length-prefixed
  binary would work fine and provide 8-bit cleanness, as would the
  proposals that store properties as trees of blobs.
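To make the last two points concrete, here is a sketch of one possible 8-bit-clean encoding for a props payload: each name and value is prefixed by its length as a 4-byte big-endian integer, and pairs are sorted by name so that equivalent property sets serialize (and therefore hash) identically. The byte layout, the "//" naming helpers and all function names are assumptions for illustration; nothing here is an agreed format.

```python
import struct


def encode_props(props: dict) -> bytes:
    """Hypothetical format: serialize {name: value} (bytes) as length-prefixed pairs."""
    out = bytearray()
    for name in sorted(props):  # canonical order -> unique representation
        value = props[name]
        out += struct.pack(">I", len(name)) + name
        out += struct.pack(">I", len(value)) + value
    return bytes(out)


def decode_props(data: bytes) -> dict:
    """Inverse of encode_props(); rejects truncated input."""
    props, pos = {}, 0

    def take() -> bytes:
        nonlocal pos
        if pos + 4 > len(data):
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if pos + size > len(data):
            raise ValueError("truncated field")
        field = data[pos:pos + size]
        pos += size
        return field

    while pos < len(data):
        name = take()
        props[name] = take()
    return props


META_SUFFIX = "//"  # a normalized path never has an empty component


def metadata_entry_name(path: str) -> str:
    """Hypothetical name of the entry holding metadata for `path`."""
    return path + META_SUFFIX


def split_entry_name(name: str):
    """Return (path, is_metadata) for an index or tree entry name."""
    if name.endswith(META_SUFFIX):
        return name[: -len(META_SUFFIX)], True
    return name, False
```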

On Sun, Aug 10, 2008 at 01:09:25PM +0200, Jan Hudec wrote:
> Advantages (+), disadvantages (-) and possible (*) extensions of 1:
>  
>  + It should be possible to get to something useful with very few changes
>    to git. Basically all it needs to be useful for things like etckeeper is
>    to:
>     . Make sure both clean and smudge filter always get filehandle to the
>       disk file in question (I am /not/ suggesting path as the file may be
>       written in a staging area and moved into place later).
>     . Pass the blob id currently in index to the clean filter, so it can
>       maintain the data if they are not representable in this particular
>       checkout (eg. when checking out such repo on windows). Note, that this
>       would also be useful for ignoring insignificant changes, eg. when
>       a in some config file order is not important and the tool using it
>       randomly changes that order when changing that file.

It might prove possible to implement a reasonable and secure interface
for permissions on top of Git without standardizing the plumbing and
storage formats, true.  With enough specialized hooks, some of the
existing problems with solutions like etckeeper and metastore go away.
However, we feel that most of these solutions will have to deal with the
same problems, such as storage and merging, and the solutions will end
up re-solving problems already handled by Git plumbing.  Those who do
not understand Git's solutions are doomed to re-invent them poorly. :)

On Sat, Aug 09, 2008 at 08:51:01PM -0700, Shawn O. Pearce wrote:
> Nico and I have (at least in the past) agreed that type 0 is meant
> as an escape indicator.  If the type is set to 0 then the real type
> code appears in another byte of data which follows the object's
> inflated length.
>
> That leaves only type 5 available.
[...]
> So yea, there really aren't any new type bits available.

If consensus opinion was that new object types were a reasonable way to
solve this problem, then it sounds as if there's plenty of room to
create new types using this escape mechanism.  As a result we found your
subsequent comments a bit confusing since they seem to say only one more
new object type can exist.
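For reference, the constraint under discussion lives in the pack format: each packed object starts with a header whose first byte holds a continuation bit, a 3-bit type code and the low 4 bits of the inflated size, with further size bits in following bytes (7 per byte). Codes 1-4 are commit, tree, blob and tag, and 6 and 7 are the two delta encodings, which leaves 0 and 5. A decoding sketch in Python (function names are ours; the type-0 escape Shawn mentions was an agreement in principle, so the sketch only reports the raw code and does not decode any extended type byte):

```python
# Pack-entry type codes (3-bit field). 0 and 5 have no object type assigned.
PACK_TYPES = {1: "commit", 2: "tree", 3: "blob", 4: "tag",
              6: "ofs_delta", 7: "ref_delta"}


def parse_pack_entry_header(buf: bytes):
    """Return (type_code, inflated_size, header_length) for one pack entry."""
    c = buf[0]
    type_code = (c >> 4) & 0x7   # bits 4-6 of the first byte
    size = c & 0x0F              # low 4 bits of the size
    shift, i = 4, 1
    while c & 0x80:              # MSB set: another size byte follows
        c = buf[i]
        size |= (c & 0x7F) << shift
        shift += 7
        i += 1
    return type_code, size, i


def encode_pack_entry_header(type_code: int, size: int) -> bytes:
    """Inverse of parse_pack_entry_header()."""
    c = (type_code << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(c | 0x80)
        c = size & 0x7F
        size >>= 7
    out.append(c)
    return bytes(out)


def type_name(type_code: int) -> str:
    # Code 0 would be the proposed escape; this sketch does not decode it.
    return PACK_TYPES.get(type_code, "unassigned")
```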

> But tossing aside the type bit argument, I'm not sure I see the
> value in adding limited arbitrary properties to names in a tree.
> How does one edit these?  How do you inspect them before you get
> a checkout, assuming they might actually have an impact on the
> checkout process?  How the hell do you merge them?

Several of those questions depend on the porcelain.  The plumbing
would provide support for adding these properties to the index,
committing them, viewing them, and doing merges in the index.  The
porcelain would handle friendly editing, application to the working
tree, and friendly merges.

> A bad idea that will only clutter up the core object model, and
> the core processing code of that object model.  Extended attributes
> aren't used that much on local filesystems, because they are hard
> to work with and suck performance wise.  Performance in Git is
> a _feature_.  It matters.  Our clean object model really helps to
> make that possible.

If you mean that our proposal seems too general, like extended
attributes, then we can't argue with that. :)  We would have no problem
with a solution that only supported the standard POSIX info found in
"stat" (permissions, ownership, times).  We just felt that such a
specific proposal would not go over well; if consensus points toward a
more specialized solution, that works fine for us too.

We actually proposed the simple name/value storage for props objects
because we primarily cared about the case of small values like
permissions, not large values like arbitrary xattrs.

On Sun, Aug 10, 2008 at 03:34:37PM -0700, Junio C Hamano wrote:
> For merging such "metainfo", you would need to do your "flattish/unrich"
> checkout anyway,

Why not just put entries into the index for each stage as merging
currently does?  You could then compare the metadata in the index with
the filesystem metadata in the "rich" checkout, and resolve the conflict
by adding the desired metadata to the index as stage 0 as usual.  You
would just need some sort of interface like "git add --metadata file" to
add the metadata for file to the index.  Alternatively, you could have
some simple wrappers to directly edit the metadata in the index, much
like the existing "git update-index --chmod" does for the execute bit.
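Today's index already exposes conflicts this way: `git ls-files --stage` prints one `<mode> <object> <stage>` line per entry, followed by a tab and the path, with stages 1, 2 and 3 (base, ours, theirs) present for unresolved paths and a single stage-0 entry once resolved. A small sketch that finds unresolved paths from that output; under the proposal, metadata entries (for instance a hypothetical `path//` name) would presumably show up in the same listing. Paths with unusual characters are quoted in this output unless `-z` is used, which the sketch ignores.

```python
from collections import defaultdict


def parse_ls_files_stage(output: str):
    """Parse `git ls-files --stage` output into (path, stage, mode, object) tuples."""
    entries = []
    for line in output.splitlines():
        if not line:
            continue
        info, path = line.split("\t", 1)
        mode, obj, stage = info.split(" ")
        entries.append((path, int(stage), mode, obj))
    return entries


def unresolved_paths(entries):
    """Paths that still have stage 1-3 entries, i.e. need a stage-0 resolution."""
    stages = defaultdict(set)
    for path, stage, _mode, _obj in entries:
        stages[path].add(stage)
    return sorted(path for path, seen in stages.items() if seen - {0})
```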

>  * Define a specific leading path, say ".attrs", as the hierarchy to store the
>    attributes information.  Attributes of the files README and t/Makefile
>    will be stored in .attrs/README and .attrs/t/Makefile.  They are
>    probably just plain text files so you can do your merges and parsing easily,
>    but with this counterproposal the only requirement is that they are simple
>    plain blobs.  The plumbing layer does not care what payload they carry.

Using a top-level tree to store all of the permissions makes sub-trees
not stand alone; the tree sha1 of a subdirectory doesn't give you enough
information to recreate the metadata for that subdirectory.

>  * When you want to "git setattr $path", the Porcelain mucks with
>    ".attrs/$path".  Probably checkout codepath would give you a hook that
>    lets you reflect what ".attrs/$path" records to "$path", and checkin
>    (i.e. not commit but update-index) codepath would have another hook to
>    let you grab attributes for "$path" and update ".attrs/$path".

This hook would need to provide a way to process these updates before
the blob or tree contents get put into place.  For example, if you check
out /etc/shadow, you need to apply the non-world-readable permissions
*before* you write out the contents.
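One way to honor "permissions before contents" for a single file, sketched in Python for POSIX systems (the helper name `write_restricted` is ours): create the file with a restrictive mode, and only then write data. The mode given to `os.open` can only be narrowed by the umask, so the file is never more permissive than requested, and `os.fchmod` then sets it exactly. Ownership changes, which need chown(2) after creation, are not covered here.

```python
import os


def write_restricted(path: str, data: bytes, mode: int = 0o600) -> None:
    """Hypothetical helper: create `path` with `mode` in place, then write `data`."""
    # O_EXCL: never reuse an existing file whose permissions we did not choose.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    try:
        # The umask may have removed bits from `mode`; set it exactly,
        # still before any content is written.
        os.fchmod(fd, mode)
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)
```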

> So it will most likely boil down to a "Porcelain only" convention that
> different Porcelains would agree on.
> 
> My reaction to the initial proposal was very similar to the one given by
> Shawn.  I do not see much point in having plumbing side support (yet).

We agree in principle that a sufficiently rich set of hooks might make
it possible to implement metadata outside of the Git plumbing.  However,
in practice the set of hooks necessary for complete integration seems
quite large.  Furthermore, implementing these hooks efficiently seems
difficult.  We also don't want to force people to use a non-Git
porcelain just to get support for permissions.  Finally, we think that
along with a common storage format, these porcelains will all have a
common set of problems to solve, and it seems better to solve them once
correctly in Git using code that mostly already exists.

- Josh Triplett and Jamey Sharp

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-16  6:21                 ` Josh Triplett, Jamey Sharp
@ 2008-08-16  7:56                   ` david
  2008-08-16  9:55                   ` Junio C Hamano
  2008-08-18  6:12                   ` Shawn O. Pearce
  2 siblings, 0 replies; 22+ messages in thread
From: david @ 2008-08-16  7:56 UTC (permalink / raw)
  To: Josh Triplett, Jamey Sharp
  Cc: Jan Hudec, Shawn O. Pearce, Stephen R. van den Berg,
	Junio C Hamano, Scott Chacon, git

On Fri, 15 Aug 2008, Josh Triplett wrote:

>
> - Several proposals suggest storing the metadata as a tree object,
>  rather than a custom "props" object.  This makes a lot of sense.  It
>  allows Git to use existing logic for parsing, reachability
>  checking, merging, and checkouts.  On the other hand, we want to
>  optimize for the common cases such as POSIX permissions and ownership
>  rather than the unusual cases like extended attributes, so it might
>  make sense to store all the metadata for a particular object as a
>  single blob.

ahh, but if the 'tree object' that you are storing is named file.attr and
contains just the posix permissions and ownership, there is a very small
number of different permutations that you will see on any one system (let
alone in any one repository); as such the duplicates will all hash to the
same value and be combined in storage. your rich checkout porcelains can
cache these in a lookup table and gain performance basically equivalent to
defining a custom object.

in fact, I'd be willing to bet that even when extended attributes are in 
use (say SELinux tags) the number of different tree objects that would be 
used would still be pretty small.
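The deduplication point can be checked with a few lines of Python. The sketch assumes a hypothetical plain-text layout for the per-file attribute blob (`mode`, `owner` and `group` lines) and hashes each one the way git hashes a blob; identical attribute sets collapse to a single object id, which a rich-checkout Porcelain could also use as a cache key so each distinct blob is parsed only once.

```python
import hashlib


def git_blob_id(payload: bytes) -> str:
    # Same rule git uses for blobs: SHA-1 over "blob <size>\0" + payload.
    return hashlib.sha1(b"blob %d\0" % len(payload) + payload).hexdigest()


def attr_blob(mode: int, owner: str, group: str) -> bytes:
    """Hypothetical text layout for a per-file attribute blob."""
    return f"mode {mode:o}\nowner {owner}\ngroup {group}\n".encode("ascii")


def parse_attr_blob(payload: bytes) -> dict:
    return dict(line.split(" ", 1) for line in payload.decode("ascii").splitlines())


listing = {
    "etc/passwd": (0o644, "root", "root"),
    "etc/group": (0o644, "root", "root"),
    "etc/hosts": (0o644, "root", "root"),
    "etc/shadow": (0o640, "root", "shadow"),
    "etc/gshadow": (0o640, "root", "shadow"),
}

attr_ids = {path: git_blob_id(attr_blob(*a)) for path, a in listing.items()}
distinct = set(attr_ids.values())
print(len(listing), "files,", len(distinct), "attribute objects")  # 5 files, 2 attribute objects

parse_cache = {}  # object id -> parsed attributes


def attrs_for(payload: bytes) -> dict:
    blob_id = git_blob_id(payload)
    if blob_id not in parse_cache:
        parse_cache[blob_id] = parse_attr_blob(payload)
    return parse_cache[blob_id]


for a in listing.values():
    attrs_for(attr_blob(*a))
```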

> On Sun, Aug 10, 2008 at 03:34:37PM -0700, Junio C Hamano wrote:
>> For merging such "metainfo", you would need to do your "flattish/unrich"
>> checkout anyway,
>
> Why not just put entries into the index for each stage as merging
> currently does?  You could then compare the metadata in the index with
> the filesystem metadata in the "rich" checkout, and resolve the conflict
> by adding the desired metadata to the index as stage 0 as usual.  You
> would just need some sort of interface like "git add --metadata file" to
> add the metadata for file to the index.  Alternatively, you could have
> some simple wrappers to directly edit the metadata in the index, much
> like the existing "git update-index --chmod" does for the execute bit.

because the tools to work directly on the index are very limited. yes they
can be left in the index, but then the index-manipulation tools need to
understand every type of metadata. if it's able to be presented in the
"flattish/unrich" mode it will work anywhere, even on operating systems
that can't run your 'rich' tools

David Lang

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-16  6:21                 ` Josh Triplett, Jamey Sharp
  2008-08-16  7:56                   ` david
@ 2008-08-16  9:55                   ` Junio C Hamano
  2008-08-16 15:07                     ` Jan Hudec
  2008-08-18  6:12                   ` Shawn O. Pearce
  2 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2008-08-16  9:55 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Jamey Sharp, Jan Hudec, Shawn O. Pearce, Stephen R. van den Berg,
	david, Scott Chacon, git

Josh Triplett <josh@freedesktop.org>, Jamey Sharp <jamey@minilop.net>
writes:

> This hook would need to provide a way to process these updates before
> the blob or tree contents get put into place.  For example, if you check
> out /etc/shadow, you need to apply the non-world-readable permissions
> *before* you write out the contents.

I think such atomicity or "checkout race problem" is irrelevant.

I'd like to make a comment on this point, even though at the moment
(especially before the real release), I am not very interested in where
this "proposal" is going.

You mention that you would resolve attribute conflicts just the same way
you would resolve contents conflicts, which in turn means that you would
check out a half-merged state with conflict markers to the working tree,
fix up the filesystem entity (both contents and presumably its attributes
like perm bits, ownership, xa and whatnot), and mark the path resolved.
Even without talking about attributes conflicts, what's your position on
the time-window during which the contents of /etc/shadow and /etc/passwd
have conflict markers in them?

Luckily, the markers do not have sufficient number of colons, and that
would protect your system from attempts to break into it with a phoney
username '=======' with an empty password ;-), but I think you get the
idea.  Anything that has to be in some consistent state that cannot see
conflicted state in the middle should not be merged in-place [*1*], [*2*].

So please simplify your requirements and at least drop atomicity argument.

I am _not_ fundamentally opposed to somebody who wants to use git or any
other SCM as a cooler representation of snapshots than a sequence of
tarballs.  I however would be unhappy if your design and implementation
becomes more complicated than otherwise only because you try to deal with
the atomicity issue.  IOW, if your solution would become much simpler once
you pare down the atomicity requirement, then I'd reject the more complex
variant with atomicity in a second, even though I might still find the
simpler variant that does not care about atomicity worth considering.

[Footnotes]

*1* That is why people often frown upon "using SCM to track changes of a
live system in-place", and suggest tracking source material in SCM, and
build material to deploy from the source and install into the final
destination (not limited to /etc but more often so for e.g. web server
assets) as a better practice.

*2* Also you should realize your "/etc/shadow must be non-world-readable
from the beginning" is a very application specific wish.  What if the
attribute you are trying to enforce is "this path must always be
world-readable"?  Are you going to limit this "attribute enhancements" to
what you can specify at creat(2) time only?  How would you handle "this
path must be owned by user 'www-data' (assuming root drives git)", which
would be done by creat(2) followed by chown(2)?

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-16  9:55                   ` Junio C Hamano
@ 2008-08-16 15:07                     ` Jan Hudec
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Hudec @ 2008-08-16 15:07 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Josh Triplett, Jamey Sharp, Shawn O. Pearce,
	Stephen R. van den Berg, david, Scott Chacon, git

On Sat, Aug 16, 2008 at 02:55:51 -0700, Junio C Hamano wrote:
> Josh Triplett <josh@freedesktop.org>, Jamey Sharp <jamey@minilop.net>
> writes:
> 
> > This hook would need to provide a way to process these updates before
> > the blob or tree contents get put into place.  For example, if you check
> > out /etc/shadow, you need to apply the non-world-readable permissions
> > *before* you write out the contents.
> 
> I think such atomicity or "checkout race problem" is irrelevant.
> 
> I'd like to make a comment on this point, even though at the moment
> (especially before the real release), I am not very interested in where
> this "proposal" is going.
> 
> You mention that you would resolve attribute conflicts just the same way
> you would resolve contents conflicts, which in turn means that you would
> check out a half-merged state with conflict markers to the working tree,
> fix up the filesystem entity (both contents and presumably its attributes
> like perm bits, ownership, xa and whatnot), and mark the path resolved.
> Even without talking about attributes conflicts, what's your position on
> the time-window during which the contents of /etc/shadow and /etc/passwd
> have conflict markers in them?

Well, there are situations where conflicts can happen and situations where
they can't. So I think the solution is "don't merge in the live directory"
(applicable to other uses of version control in other kind of live copies
too).
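A deployment step that follows this advice might merge in a separate checkout and refuse to copy anything into the live directory while conflict markers remain. The sketch below is a heuristic only: it looks for git's default 7-character markers at the start of a line (`<<<<<<<`, `=======`, `>>>>>>>`, and `|||||||` for the diff3 style), so a file that legitimately contains such lines would be flagged too. The function names are hypothetical.

```python
def _is_marker(line: str) -> bool:
    head, rest = line[:7], line[7:]
    if head in ("<<<<<<<", ">>>>>>>", "|||||||"):
        # git normally follows these markers with a space and a label
        return rest == "" or rest.startswith(" ")
    return line == "======="


def conflict_marker_lines(text: str):
    """1-based line numbers that look like git conflict markers."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if _is_marker(line)]


def blocked_paths(staged_files: dict):
    """Hypothetical deploy gate: paths in the staging checkout not safe to copy."""
    return sorted(p for p, text in staged_files.items() if conflict_marker_lines(text))
```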

> Luckily, the markers do not have sufficient number of colons, and that
> would protect your system from attempts to break into it with a phoney
> username '=======' with an empty password ;-), but I think you get the
> idea.  Anything that has to be in some consistent state that cannot see
> conflicted state in the middle should not be merged in-place [*1*], [*2*].
> 
> So please simplify your requirements and at least drop atomicity argument.

The atomicity requirement is real for some applications, like etckeeper.

It should be restated in terms of moving the content into the work tree rather
than writing it out -- the content can be written out to a staging area, the
attributes applied, and then moved into the tree. IIUC git already uses
a staging area during checkout, no?

> I am _not_ fundamentally opposed to somebody who wants to use git or any
> other SCM as a cooler representation of snapshots than a sequence of
> tarballs.  I however would be unhappy if your design and implementation
> becomes more complicated than otherwise only because you try to deal with
> the atomicity issue.  IOW, if your solution would become much simpler once
> you pare down the atomicity requirement, then I'd reject the more complex
> variant with atomicity in any second, even though I might still find the
> simpler variant that does not care about atomicity worth considering.

I don't think the atomicity requirement should make anything more
complicated. It is only a matter of running the hook applying the attributes
-- I think git should not define the meaning of the attributes -- at the right
point during the checkout process.

> [Footnotes]
> 
> *1* That is why people often frown upon "using SCM to track changes of a
> live system in-place", and suggest tracking source material in SCM, and
> build material to deploy from the source and install into the final
> destination (not limited to /etc but more often so for e.g. web server
> assets) as a better practice.

Yes, unless you need to track the changes done in the live directory by other
software, which is the case for /etc. It is also the case for ikiwiki-based
web sites.

You still need to avoid merging in the live tree to avoid breaking it, but
git always allows you to create a separate staging tree for such tasks.

> *2* Also you should realize your "/etc/shadow must be non-world-readable
> from the beginning" is a very application specific wish.  What if the
> attribute you are trying to enforce is "this path must always be
> world-readable"?  Are you going to limit this "attribute enhancements" to
> what you can specify at creat(2) time only?  How would you handle "this
> path must be owned by user 'www-data' (assuming root drives git)", which
> would be done by creat(2) followed by chown(2)?

Yes, as stated that does not make sense. But if you restate the requirement as
"the attributes must be in place by the time the file becomes accessible in the
work tree", then it makes sense and is easily doable by writing the file to a
temporary location -- which is sufficiently protected if it is inside .git --
and moving it into the tree as the last step. (The data is already available
in .git/objects, loose or packed, so it is only as well protected as the .git
directory itself; a temporary file inside .git adds no new exposure.)
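A minimal sketch of the write-then-rename approach described above, in Python for illustration only; it is not how git's checkout code is structured. The function name `checkout_file_atomically` and its parameters are hypothetical, and the sketch assumes a POSIX system where `os.fchmod`/`os.fchown` are available and `.git` sits on the same filesystem as the work tree (otherwise the final rename is not atomic).

```python
import os
import tempfile


def checkout_file_atomically(git_dir, dest_path, content, mode=0o600, uid=None, gid=None):
    """Hypothetical helper: write content to a temporary file inside git_dir,
    apply the metadata through the open file handle, then rename it into place
    so dest_path never exists with the wrong attributes."""
    # mkstemp creates the file with mode 0600, so only its owner can read it.
    fd, tmp_path = tempfile.mkstemp(dir=git_dir, prefix="checkout-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(content)
            # Apply attributes before the file becomes visible in the work tree.
            os.fchmod(f.fileno(), mode)
            if uid is not None or gid is not None:
                os.fchown(f.fileno(),
                          -1 if uid is None else uid,
                          -1 if gid is None else gid)
            f.flush()
            os.fsync(f.fileno())
        # rename(2) is atomic only within a single filesystem.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

The order of operations is the point: the file is only ever reachable under `dest_path` after its mode (and, if requested, ownership) has been set, which also covers the creat(2)-then-chown(2) case raised in footnote 2.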

Best regards,

Jan

-- 
						 Jan 'Bulb' Hudec <bulb@ucw.cz>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-16  6:21                 ` Josh Triplett, Jamey Sharp
  2008-08-16  7:56                   ` david
  2008-08-16  9:55                   ` Junio C Hamano
@ 2008-08-18  6:12                   ` Shawn O. Pearce
  2008-08-18 23:06                     ` Derek Fawcus
  2 siblings, 1 reply; 22+ messages in thread
From: Shawn O. Pearce @ 2008-08-18  6:12 UTC (permalink / raw)
  To: Josh Triplett, Jamey Sharp
  Cc: Jan Hudec, Stephen R. van den Berg, Junio C Hamano, david,
	Scott Chacon, git

Josh Triplett <josh@freedesktop.org>, Jamey Sharp <jamey@minilop.net> wrote:
> On Sat, Aug 09, 2008 at 08:51:01PM -0700, Shawn O. Pearce wrote:
> > Nico and I have (at least in the past) agreed that type 0 is meant
> > as an escape indicator.  If the type is set to 0 then the real type
> > code appears in another byte of data which follows the object's
> > inflated length.
> >
> > That leaves only type 5 available.
> [...]
> > So yea, there really aren't any new type bits available.
> 
> If consensus opinion was that new object types were a reasonable way to
> solve this problem, then it sounds as if there's plenty of room to
> create new types using this escape mechanism.

Yes, but we'd hate to see the majority of the encodings within a
pack using the escape mechanism.

So a lot of my argument here was just trying to point out that
type bits aren't free, and we need to make sure the limited ones
available are applied to the majority of the pack contents.
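To make the "type bits aren't free" point concrete, here is a sketch (Python, for illustration) of the existing pack entry header: a 3-bit type field followed by a variable-length size. The type codes are git's (1 commit, 2 tree, 3 blob, 4 tag, 6 ofs-delta, 7 ref-delta); 0 is invalid, which is why it was suggested as an escape, and 5 is unused. The function names are made up for this example.

```python
# Type codes that fit in the 3-bit type field of a pack entry header.
OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG = 1, 2, 3, 4
OBJ_OFS_DELTA, OBJ_REF_DELTA = 6, 7
# 0 is invalid (the proposed escape value); 5 is the only unused code.


def encode_pack_header(obj_type, size):
    """Encode a pack entry header: first byte is MSB continuation flag,
    3 type bits, low 4 size bits; later bytes carry 7 size bits each."""
    if not 0 <= obj_type <= 7:
        raise ValueError("type must fit in 3 bits")
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # more size bytes follow
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def decode_pack_header(data):
    """Return (type, size, header length in bytes)."""
    byte = data[0]
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift, i = 4, 1
    while byte & 0x80:
        byte = data[i]
        size |= (byte & 0x7F) << shift
        shift += 7
        i += 1
    return obj_type, size, i
```

Every new object type either takes code 5 or pays for an extra escape byte on each entry, which is why spending it on something that would not be the bulk of a pack is unattractive.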

Adding a new type bit is a lot more than just adding it to the pack
data field.  Look at the amount of code that needed to be changed to
support gitlink in trees, and that was "reusing" the OBJ_COMMIT type.
Anytime you start poking at the core object enumeration code with
new cases there's a lot of corners that are affected.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-18  6:12                   ` Shawn O. Pearce
@ 2008-08-18 23:06                     ` Derek Fawcus
  2008-08-18 23:18                       ` Shawn O. Pearce
  2008-08-18 23:23                       ` Marcus Griep
  0 siblings, 2 replies; 22+ messages in thread
From: Derek Fawcus @ 2008-08-18 23:06 UTC (permalink / raw)
  To: git

On Sun, Aug 17, 2008 at 11:12:36PM -0700, Shawn O. Pearce wrote:
> Adding a new type bit is a lot more than just adding it to the pack
> data field.  Look at the amount of code that needed to be changed to
> support gitlink in trees, and that was "reusing" the OBJ_COMMIT type.
> Anytime you start poking at the core object enumeration code with
> new cases there's a lot of corners that are affected.

Actually,  I'd been thinking of how to attach metadata - but more from
the perspective of attaching it to commits,  rather than individual
blobs or trees.

At the moment,  my workaround is simply to add well known lines to
the end of the commit comments,  the downside being that it makes
the comments a bit ugly,  and one needs to know the protocol for
parsing them.

My other hacky thought was that the tag object could be overloaded for
this purpose.  It is already sort of an indirect object,  but seems
to be limited to appearing at the edge of the graph.

If we could say have:

  commit -> tag -> tree

then arbitrary data could be stored in the tag,  similarly this
could be extended for when a tree or blob object is expected
(I'm not sure about the blob case).

I guess there'd have to be some rule - like only one indirect
object allowed to be inserted (otherwise it's awkward to check
for loops),  and there would need to be some custom merge rules.

DF

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-18 23:06                     ` Derek Fawcus
@ 2008-08-18 23:18                       ` Shawn O. Pearce
  2008-08-18 23:23                       ` Marcus Griep
  1 sibling, 0 replies; 22+ messages in thread
From: Shawn O. Pearce @ 2008-08-18 23:18 UTC (permalink / raw)
  To: Derek Fawcus; +Cc: git

Derek Fawcus <dfawcus@cisco.com> wrote:
> On Sun, Aug 17, 2008 at 11:12:36PM -0700, Shawn O. Pearce wrote:
> > Adding a new type bit is a lot more than just adding it to the pack
> > data field.  Look at the amount of code that needed to be changed to
> > support gitlink in trees, and that was "reusing" the OBJ_COMMIT type.
> > Anytime you start poking at the core object enumeration code with
> > new cases there's a lot of corners that are affected.
> 
> Actually,  I'd been thinking of how to attach metadata - but more from
> the perspective of attaching it to commits,  rather than individual
> blobs or trees.
>
> At the moment,  my workaround is simply to add well known lines to
> the end of the commit comments,

We've talked about adding additional header lines to the commit after
the "committer" or "encoding" line but before the first blank line
that ends the headers and starts the message.  Most of the code will
skip over an unknown header at this position, as we went through
that pain when we added the "encoding" header to the commit format.

However, once you start putting headers into there one has to
actually understand what they mean.  And it gets really ugly if
your tool thinks "fixed XXX\n" means something different from what
my tool thinks "fixed YYY\n" means and I use my tool against a
clone of your repository.  In other words there is no concept of
"header namespaces".

Thus far I don't think anyone has really tried adding more headers
here because nobody has come up with a concrete example of how it
is useful.
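A sketch of what "skipping an unknown header" amounts to: split the raw commit object at the first blank line and set aside headers the reader does not recognize instead of failing. The header name `x-build-status` in the test data is hypothetical, and the sketch assumes one header per line. Nothing in it says who owns a given key, which is exactly the missing "header namespace" problem.

```python
# Headers this (illustrative) reader understands in a commit object.
KNOWN_COMMIT_HEADERS = {"tree", "parent", "author", "committer", "encoding"}


def split_commit(raw):
    """Split a raw commit object into (known headers, unknown headers, message).

    Assumes one header per line (no continuation lines)."""
    header_block, _, message = raw.partition("\n\n")
    known, unknown = [], []
    for line in header_block.split("\n"):
        key, _, value = line.partition(" ")
        # Unknown keys are kept aside rather than treated as errors,
        # mirroring the "skip what you don't understand" behaviour.
        if key in KNOWN_COMMIT_HEADERS:
            known.append((key, value))
        else:
            unknown.append((key, value))
    return known, unknown, message
```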

> I guess there'd have to be some rule - like only one indirect
> object allowed to be inserted (otherwise it's awkward to check
> for loops),  and there would need to be some custom merge rules.

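Loops aren't possible.  An object's name is the SHA-1 of its contents,
including the names of any objects it references, so an object can only
refer to objects whose names were already known when it was created.
If you can create a loop you have a very real and very valid attack
against SHA-1.  You will probably be able to use that in some way that
profits you better than a loop within some random Git repository.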

You may also want to look into the "notes" idea floated on the list
in the past.  It allowed attaching trees (IIRC) to any commit, and
finding that later on in O(1) time during say git-log.  This can
be useful to attach a build report or a test report to a commit
hours after it was created.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-18 23:06                     ` Derek Fawcus
  2008-08-18 23:18                       ` Shawn O. Pearce
@ 2008-08-18 23:23                       ` Marcus Griep
  2008-08-18 23:28                         ` Shawn O. Pearce
  1 sibling, 1 reply; 22+ messages in thread
From: Marcus Griep @ 2008-08-18 23:23 UTC (permalink / raw)
  To: Derek Fawcus; +Cc: Git Mailing List

Derek Fawcus wrote:
> My other hacky thought was that the tag object could be overloaded for
> this purpose.  It is already sort of an indirect object,  but seems
> to be limited to appearing at the edge of the graph.
> 
> If we could say have:
> 
>   commit -> tag -> tree
> 
> then arbitrary data could be stored in the tag,  similarly this
> could be extended for when a tree or blob object is expected
> (I'm not sure about the blob case).

I was under the impression that tags were references to commit objects,
and they to tree objects:

tag -> commit -> tree

Also, wouldn't this require large numbers of tags, or the ability to
multi-target tags?

-- 
Marcus Griep
GPG Key ID: 0x5E968152
——
http://www.boohaunt.net
את.ψο´

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-18 23:23                       ` Marcus Griep
@ 2008-08-18 23:28                         ` Shawn O. Pearce
  0 siblings, 0 replies; 22+ messages in thread
From: Shawn O. Pearce @ 2008-08-18 23:28 UTC (permalink / raw)
  To: Marcus Griep; +Cc: Derek Fawcus, Git Mailing List

Marcus Griep <neoeinstein@gmail.com> wrote:
> I was under the impression that tags were references to commit objects,
> and they to tree objects:
> 
> tag -> commit -> tree

No.  A tag can reference any object.  See for example the
junio-gpg-pub tag in git.git, it references a blob, not a commit.
The linux-2.6.git tree has a tag which references a tree.

Tags may also reference other tags.
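An annotated tag object records the type of what it points to in its own `type` header, which is why it is not restricted to commits. A small parsing sketch in Python; the helper name is made up, and the object id and tag name in the test data are placeholders.

```python
def tag_target(raw):
    """Return (object id, object type) from the headers of a raw tag object."""
    header_block = raw.split("\n\n", 1)[0]
    headers = {}
    for line in header_block.split("\n"):
        key, _, value = line.partition(" ")
        headers.setdefault(key, value)
    # "type" may be commit, tree, blob or tag.
    return headers["object"], headers["type"]
```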

> Also, wouldn't this require large numbers of tags, or the ability to
> multi-target tags?

Tag objects don't have to have names in the repository's ref space,
but it helps if they do, e.g. when you are running git-lost-found.
Having a tag in the database which deliberately has no ref name under
refs/tags is more than a bit funny.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC] Plumbing-only support for storing object metadata
  2008-08-09 21:07 [RFC] Plumbing-only support for storing object metadata Jamey Sharp, Josh Triplett
  2008-08-09 21:49 ` Scott Chacon
@ 2008-08-10 11:09 ` Jan Hudec
  1 sibling, 0 replies; 22+ messages in thread
From: Jan Hudec @ 2008-08-10 11:09 UTC (permalink / raw)
  To: Jamey Sharp, Josh Triplett, git

Hello All,

I am glad you came up with this, as I think this is the only reasonable way
to support things like etckeeper. The metastore and similar solutions are
a kludge and fall apart in so many cases.

I am not sure your approach is the right one, though. I tend to agree with
Shawn that it's not. So here are a couple of alternative proposals (sorry, it's
a bit long, as I have several variants with different drawbacks I would like
to discuss).

On Sat, Aug 09, 2008 at 14:07:33 -0700, Jamey Sharp wrote:
> The attached test illustrates a proposal for minimal plumbing support
> usable to store permissions, ownership, and other metadata in git
> repositories. This proposal is fully compatible with existing
> repositories when the new functionality is not in use. Similar to the
> introduction of subprojects, we have not yet specified the porcelain. We
> believe that the plumbing will provide sufficient functionality for many
> uses, and these uses will help determine the appropriate porcelain.

I think the main way to use it would be a hook that would read and write the
attributes from and to the work tree. That will do the right thing for storing
permissions, owners and other things represented in the work tree. Metadata
that are neither part of the tree nor directly related to git's
functionality are out of our scope.

> [...]
> We propose representing objects with metadata using a new "inode"
> object. An inode object contains the hash of the real object and the
> hash of a "props" (properties) object. A props object contains a set of
> name-value pairs. Tree objects can reference inode objects in addition
> to the current possibilities of blobs, trees, and subproject commits; we
> propose using the currently invalid type 110000 (S_IFREG | S_IFIFO) for
> inode objects. We primarily see a use case for inodes referencing blobs
> and trees, though as defined they support any object type.

I think this is the overly complex, and also needlessly incompatible,
part. By the way, I don't think you need a separate type for props; it can be
a blob too.

I would suggest investigating the following options:

 1. It would be possible to use clean/smudge filters to encode the attributes
    in the blob itself.
 2. Store the metadata in separate objects, but link them in the parent tree
    directly. In this case, each attribute could probably get its own blob,
    so e.g. for a file foo the tree containing it would have entries:
      foo
      foo<sep>owner
      foo<sep>permissions
      ...
    where <sep> would be some separator (more on that below).
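Under option 2, a reader of a tree has to regroup entries by their base name. A Python sketch, assuming `@` as the separator (any of the choices discussed below would do) and tree entries given as (name, object id) pairs; `group_entries` is a hypothetical helper, not git code.

```python
def group_entries(entries, sep="@"):
    """Group tree entries into {base name: {"content": id, "attrs": {...}}}.

    The separator is an assumption; the proposal leaves the choice open."""
    groups = {}
    for name, oid in entries:
        base, found, attr = name.partition(sep)
        slot = groups.setdefault(base, {"content": None, "attrs": {}})
        if found:
            # e.g. "foo@owner" -> attribute "owner" of "foo"
            slot["attrs"][attr] = oid
        else:
            slot["content"] = oid
    return groups
```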

Advantages (+), disadvantages (-) and possible (*) extensions of 1:
 
 + It should be possible to get to something useful with very few changes
   to git. Basically all it needs to be useful for things like etckeeper is
   to:
    . Make sure both the clean and smudge filters always get a file handle
      to the disk file in question (I am /not/ suggesting a path, as the file
      may be written in a staging area and moved into place later).
    . Pass the blob id currently in the index to the clean filter, so it can
      preserve the metadata that cannot be represented in this particular
      checkout (e.g. when checking out such a repo on Windows). Note that
      this would also be useful for ignoring insignificant changes, e.g. when
      the order of entries in some config file does not matter and the tool
      using it changes that order arbitrarily whenever it edits the file.

 - It does not support metadata for directories, but could be combined with
   approach 2 to fix that. Git could special-case an entry '.' for storing the
   "content" of a directory, which would be created entirely by running the
   clean filter on the directory (I am not sure directory handles are
   portable, but running the filter with that directory as the current
   directory should be). This would not have the problem approach 2 has with
   the entry names for the metadata.

 * Default processing could be added to strip the metadata in smudge and
   re-add it from the index on clean. This would require adding some marker
   to know which blobs need this treatment. I see two ways:
    . Using a different file type for them. There are already two types
      pointing to blobs (S_IFREG and S_IFLNK) and they are treated
      differently on read (clean) from / write (smudge) to the tree, so a
      third type should be workable.
    . Using an additional header format. Currently a blob is encoded as
        "blob" <SP> <size> <NUL> <content>
      so maybe an extended blob could be encoded as
        "blob extended" <SP> <size> <NUL> <content>
      without needing a special type for it. But I don't know git internals
      well enough to know how easy, hard or dirty this would be.
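For reference, the header git hashes in front of a loose object's content is the type name, a space, the decimal content length and a NUL byte, and the object name is the SHA-1 of header plus content. A Python sketch (the helper name is made up). It also shows that any change to the header, such as a hypothetical "blob extended" type, gives the same content a different object name.

```python
import hashlib


def object_id(obj_type, content):
    """Compute the SHA-1 object name git assigns to content of obj_type."""
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()
```

For example, `object_id("blob", b"")` is the well-known empty-blob id `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`.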

Advantages (+), disadvantages (-) and possible (*) extensions of 2:

 + It would work the same way for directories and files, or mostly so.
 + Different metadata would be handled independently, so it would be easier
   to combine support for multiple attributes (not that I can imagine any
   sensible use beyond access lists (owner, permission, POSIX ACL)).
 + Checking out without the hooks could easily create special metadata files,
   providing an easy way to work with the attributes where they are not
   supported by the underlying filesystem.
 - It would require reserving some names for the metadata entries. I see
   basically three ways to name the attributes:
    . Reserving some character for the separator, e.g. @ or # or something
      like that. So with file foo, there would be entries:
        foo
	foo@owner
	foo@permissions
      This has following pros and cons:
       + Minimal changes to the index <-> tree logic (remember, index is
         flat and has no directory entries, so the tree writer must decide
	 to which tree each entry goes).
       + Trivially supports checking the metadata entries out as special
         files on filesystem without metadata support.
       - The character is reserved in trees that need the feature (the trees
	 that don't need it don't need to care).
      Note that the metadata entries could have mode either S_IFREG or
      a new one. I am inclined to say S_IFREG -- we have the special name to
      distinguish them.
    . Using something that does not exist in a normalized path, i.e. either
      "//" or "/./". So with file foo, there would be entries:
        foo
        foo//owner
        foo//permissions
      This has following pros and cons:
       + Does not reserve any characters. Every filename is permitted even
         when the feature is used.
       - Harder on the index <-> tree logic, as it would have to recognize
         such strings as not being directory separators.
       - Such files could not be checked out, though they could still be
         manipulated using cat-file and update-index.
      The metadata entries could again have mode either S_IFREG or a new one.
      A new mode would be sensible if it made things easier for the index <->
      tree logic (it's easier to check 3 bits than to search a string for
      a substring).
    . Leave the suffix for metadata entries to the hooks. This would be a
      middle road between the above two:
       + Reserves as little as possible, while not complicating the index <->
         tree logic.
       + Remains easy to check out as special files where you can't run the
         hooks, though this would require some special-casing similar to
         symlinks on Windows.
       - Would require a new mode for these entries, so we know they are
         created and consumed by the hooks rather than directly read from or
         written to the tree.
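To illustrate why "//" is harder on the index <-> tree logic: the index is a flat list of paths, and a naive split on "/" turns `foo//owner` into an empty path component, so the tree writer would have to peel off the metadata suffix before walking directories. A Python sketch, assuming "//" as the marker and attribute names that contain no "/"; `split_tree_path` is hypothetical.

```python
def split_tree_path(path):
    """Split an index path into (directory components, tree entry name),
    keeping an assumed "//attr" metadata suffix as part of the entry name
    instead of treating it as a directory separator."""
    suffix = ""
    if "//" in path:
        path, attr = path.split("//", 1)
        suffix = "//" + attr
    *dirs, leaf = path.split("/")
    return dirs, leaf + suffix
```

For comparison, `"etc/ssh/sshd_config//owner".split("/")` gives `['etc', 'ssh', 'sshd_config', '', 'owner']`, which a tree writer unaware of the scheme would misread as an empty directory name.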

Best regards,
Jan

-- 
						 Jan 'Bulb' Hudec <bulb@ucw.cz>

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2008-08-18 23:29 UTC | newest]

Thread overview: 22+ messages
2008-08-09 21:07 [RFC] Plumbing-only support for storing object metadata Jamey Sharp, Josh Triplett
2008-08-09 21:49 ` Scott Chacon
2008-08-10  3:51   ` Shawn O. Pearce
2008-08-10 11:20     ` Stephen R. van den Berg
2008-08-10 12:16       ` david
2008-08-10 14:50         ` Jan Hudec
2008-08-10 17:57           ` Stephen R. van den Berg
2008-08-10 18:11             ` Jan Hudec
2008-08-10 20:16               ` Stephen R. van den Berg
2008-08-10 22:34                 ` Junio C Hamano
2008-08-10 23:10                   ` david
2008-08-11 10:11                     ` Stephen R. van den Berg
2008-08-16  6:21                 ` Josh Triplett, Jamey Sharp
2008-08-16  7:56                   ` david
2008-08-16  9:55                   ` Junio C Hamano
2008-08-16 15:07                     ` Jan Hudec
2008-08-18  6:12                   ` Shawn O. Pearce
2008-08-18 23:06                     ` Derek Fawcus
2008-08-18 23:18                       ` Shawn O. Pearce
2008-08-18 23:23                       ` Marcus Griep
2008-08-18 23:28                         ` Shawn O. Pearce
2008-08-10 11:09 ` Jan Hudec
