File Systems and a Theory of Edits

* File Systems and a Theory of Edits
@ 2011-07-30 14:29 Michael Nahas
  2011-07-30 19:06 ` Ævar Arnfjörð Bjarmason
  2011-07-30 19:40 ` John M. Dlugosz
  0 siblings, 2 replies; 15+ messages in thread
From: Michael Nahas @ 2011-07-30 14:29 UTC
  To: git

I've spent a month thinking about the design of git.  This is the
result of that work.

My understanding of git is that, at the lower level, it stores and
communicates snapshots of a file system.  At the upper level, git
manipulates "edit"s: the change between two snapshots.

I'm proposing:
    1) Creating git commands that mimic Unix's file system commands
       and operate on those snapshots.
    2) A language for describing git's manipulations of edits
    3) Creating aliases that will allow the working tree and state
       stored in the index to be treated like file systems

I want to thank Jeff King ("Peff") for being my sounding board for the
theory of edits.

1) Snapshots
2) A Theory of Edits
3) File Systems
3.1) Merge Conflict
4) Conclusion

1) Snapshots

Commits consist of:
    1) A snapshot of the file system
    2) Some meta-information
    3) link(s) to past commit(s)

I'm only concerned about #1 here.

The way to make something both easy to learn and easy to remember is
to imitate something the user is already familiar with.  Thus my
proposal is:

PROPOSAL 1: git should imitate the Unix file system commands for
accessing the snapshot of a commit.

For these commands to work, the git command will have to include an
argument that specifies which commit it operates on.  So some basic
ones might be:
    "git ls <commit> -- <path>"
    "git cat <commit> -- <path>"
(There exists "git ls-files", "git ls-tree", and "git cat-file" but
they are not quite the same.)
    "git find <commit> -- ..."
    "git grep <commit> -- <path>" (Exists)
The Unix command "diff" compares two files/directories.  So, the "git"
version requires two commits to be specified.
    "git diff <commit> <commit> -- <path>"   (Exists)
I'd love to see something to apply a command to every file in a commit
or every file found by "git find".
    "git xargs <commit> ..."  (Is this possible?)
Since snapshots are a read-only version of a file system, git can't
implement the commands "rm", "mv", or "cp".
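
To make the read-only file system idea concrete, here is a minimal
sketch in Python.  It is only a toy model, not how git stores trees:
a snapshot is assumed to be a read-only mapping from path to file
contents, and the helper names snapshot(), ls() and cat() are
hypothetical.

```python
from types import MappingProxyType

def snapshot(files):
    """Freeze a path -> contents mapping into a read-only snapshot."""
    return MappingProxyType(dict(files))

def ls(snap, path=""):
    """List the immediate entries under a directory, like Unix ls."""
    prefix = path.rstrip("/") + "/" if path else ""
    names = set()
    for p in snap:
        if p.startswith(prefix):
            # Keep only the first path component below the prefix.
            names.add(p[len(prefix):].split("/", 1)[0])
    return sorted(names)

def cat(snap, path):
    """Return the contents of one file in the snapshot, like Unix cat."""
    return snap[path]

commit = snapshot({
    "Makefile": "all:\n",
    "src/main.c": "int main(void) { return 0; }\n",
})
print(ls(commit))
print(ls(commit, "src"))
print(cat(commit, "Makefile"), end="")
```

Because the mapping is read-only, an attempt to assign to it fails,
which mirrors why "rm", "mv", and "cp" have no place on a commit's
snapshot.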

2) A Theory of Edits

Snapshots in the object store are easy to understand.  But to go
further, we need to be able to look at the index file and the working
tree.  During a merge conflict, the index file and working tree go
into an unusual state.  To understand that state, I spent a month
coming up with a mathematical theory to describe what git does.

Git, at the upper level, manipulates edits.  An "edit" here is a very
specific term.  It is the changes between two specific snapshots.  I'm
going to introduce some notation: if two snapshots are A and B, then
the edit is written A:B.

==============
WARNING: Great boredom ahead.  Mathematicians and theoreticians:
enjoy!  Everyone else: read as much as you can and when you start to
fall asleep, skip to the next subsection.
==============

Edits have mathematical properties.  It's easy to see that B:A is the
"inverse edit" of A:B.  The edit A:A is the "empty edit" for snapshot
A.  An edit A:B can be "split" using a snapshot C to make edits
A:C and C:B or, written another way, A:C:B.  Likewise, edits A:B and
B:C can be "joined" to form A:C.

    * The inverse edit is generated by "git revert".
    * The empty edit can be written by "git commit --allow-empty".
    * Splitting an edit will be demonstrated later using "git add".
    * Joining edits is done by "git commit --amend".
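
The four properties can be sketched in a few lines of Python.  This is
only an illustration of the notation: snapshots are represented by
their names, and the class Edit and the functions inverse(), empty(),
split() and join() are hypothetical names, not git APIs.

```python
from typing import NamedTuple

class Edit(NamedTuple):
    start: str  # name of the starting snapshot
    end: str    # name of the ending snapshot

def inverse(e):
    """B:A is the inverse edit of A:B."""
    return Edit(e.end, e.start)

def empty(a):
    """A:A is the empty edit for snapshot A."""
    return Edit(a, a)

def split(e, via):
    """Split A:B with snapshot C into A:C and C:B."""
    return Edit(e.start, via), Edit(via, e.end)

def join(e1, e2):
    """Join A:B and B:C into A:C; the middle snapshot must match."""
    if e1.end != e2.start:
        raise ValueError("edits do not share a middle snapshot")
    return Edit(e1.start, e2.end)

ab = Edit("A", "B")
ac, cb = split(ab, "C")
print(join(ac, cb) == ab)
print(join(ab, inverse(ab)) == empty("A"))
```

Joining an edit with its inverse gives the empty edit, and joining the
two halves of a split gives back the original edit.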

An edit has a specific start and end snapshot.  If you want to do a
similar change with a different starting snapshot, you need to
"patch"; patch() is a function that takes an edit and a new starting
state and returns the ending snapshot of a new edit.  So, patch(A:B,
C) may return D, where C:D is a new edit containing a change similar
to A:B.  I say "may return" because a patch starting at snapshot C
might not exist.  For example, if the edit A:B moves file "foo.txt" to
"bar.txt" and snapshot C does not have a file "foo.txt" nor "bar.txt",
then the patch cannot exist.  [Note, there can be many definitions of
a patch() function. I'm not picking one; I'm just saying one exists.]

    * patch() is most easily seen in "git cherry-pick"
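
As a sketch of one possible patch() (again, many definitions exist and
this is not what git implements), assume a snapshot is a dict mapping
path to contents and an edit is a pair of snapshots.  This version
works on whole files and only applies a change where C matches A.

```python
def patch(edit, c):
    """Apply edit A:B to snapshot C; return D, or None if no patch exists."""
    a, b = edit
    d = dict(c)
    for path in a.keys() | b.keys():
        before, after = a.get(path), b.get(path)
        if before == after:
            continue  # the edit does not touch this path
        if c.get(path) != before:
            return None  # C differs from A exactly where the edit applies
        if after is None:
            del d[path]  # the edit removes the file
        else:
            d[path] = after  # the edit creates or rewrites the file
    return d

# The edit A:B moves foo.txt to bar.txt.
a = {"foo.txt": "hello\n"}
b = {"bar.txt": "hello\n"}

c_with_foo = {"foo.txt": "hello\n", "README": "docs\n"}
c_without_foo = {"README": "docs\n"}

print(patch((a, b), c_with_foo))
print(patch((a, b), c_without_foo))
```

The second call returns None, which matches the foo.txt/bar.txt example
above: the patch does not exist for that starting snapshot.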

The last definition concerns reordering edits A:B and B:C.  The edits
are reorderable if a patch of B:C can be put in front of a patch for A:B
and the resulting edit still ends up at the same final snapshot
C. Formally, A:B:C is "reorderable" if there exists A:D:C such that
patch(B:C, A) = D and patch(A:B, D) = C.

    * "git rebase --interactive" can reorder (and do anything else!)
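
The formal definition can be checked mechanically once a patch()
function is chosen.  The sketch below assumes a simple whole-file
patch() over dict snapshots (one possible definition, not git's) and a
hypothetical helper reorderable().

```python
def patch(edit, c):
    """Whole-file patch: apply edit A:B to C, or return None."""
    a, b = edit
    d = dict(c)
    for path in a.keys() | b.keys():
        before, after = a.get(path), b.get(path)
        if before == after:
            continue
        if c.get(path) != before:
            return None
        if after is None:
            del d[path]
        else:
            d[path] = after
    return d

def reorderable(a, b, c):
    """A:B:C is reorderable if patch(B:C, A) = D and patch(A:B, D) = C."""
    d = patch((b, c), a)
    return d is not None and patch((a, b), d) == c

# Two edits touching different files can be swapped.
print(reorderable({}, {"x": "1"}, {"x": "1", "y": "2"}))

# Two edits to the same file cannot: B:C needs the contents A:B created.
print(reorderable({}, {"x": "1"}, {"x": "2"}))
```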

PROPOSAL 2: adopt a term like edit and rigorous terms
like split, join, and reorder to describe the operations of git
commands.
We should also use exacting vocabulary to describe git commands.  It's
not unusual to use the word "commit" when referring to:
    * a snapshot  (stored in the commit's tree object)
    * an edit   (the difference between this commit's snapshot and its
                   parent's (if it has only one parent...))
    * a complete history of edits going back to the initial snapshot
    * the commit object itself (e.g., when tagged)
While often the appropriate definition can be picked up from context,
we should be precise if possible.
It would be good to define a term like "snapshot tree" that refers to
a tree object that is the root of a snapshot, to differentiate it from
other tree objects that store subdirectories.

3) File Systems : There exist snapshots outside the object store!

This statement may surprise you: The current state of the working tree
is a snapshot.  The working tree is a file system, so its state at any
one point is a snapshot of a file system.  For brevity, I'm going to
call that snapshot WTREE.  We can talk about the edit between any
other snapshot (like ones stored in a commit) and WTREE.  Usually,
we'll talk about the edit from HEAD to WTREE.

If the edit from HEAD to WTREE contains more than one feature and we
want to package each feature into its own edit, we need to split the
HEAD to WTREE edit.  And, to split an edit, we need another
snapshot...

Another surprising statement: A snapshot can be computed from the
index file and HEAD (when not in merge-conflict state).  My validation
for this statement is that, at any point, we can type "git commit
--allow-empty" and a new commit will be written to the current branch.
That commit contains a snapshot generated from the index file and
HEAD.  Since the snapshot computed from the index file and HEAD will
become the next commit written, I'll refer to it as NEXT.

I want to be clear that NEXT is not the index file.  NEXT is a
snapshot.  The index file is a file.  The index file (with HEAD) is
just one way to store a snapshot.  Since we can modify the files that
will go in the next commit, NEXT is actually a file system like WTREE.
Although the man page for "git add" says it "add[s] file contents to
the index", I think a better way to say it is that it copies the files
into the NEXT file system.

To recap: a common operation in git is to split the HEAD to WTREE edit
by "git add"ing files to the NEXT file system and then using "git
commit" to write a snapshot of NEXT into the object store, making the
edit permanent.  (You may want to reread that sentence a few times
until it becomes clear.)
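
The recap can be sketched with a toy model in Python.  The assumptions
are mine: snapshots are dicts mapping path to contents, add() stands in
for "git add <path>" on a file that exists in the working tree, and
changed() is a hypothetical helper that lists what an edit touches.

```python
def add(next_snap, wtree, path):
    """Model of "git add <path>": copy one file from WTREE into NEXT."""
    next_snap[path] = wtree[path]

def changed(start, end):
    """Paths that differ between two snapshots, i.e. what the edit touches."""
    return sorted(p for p in start.keys() | end.keys()
                  if start.get(p) != end.get(p))

head = {"a.c": "v1", "b.c": "v1"}
wtree = {"a.c": "v2", "b.c": "v2"}  # two features edited in one sitting
next_snap = dict(head)              # NEXT starts out equal to HEAD

add(next_snap, wtree, "a.c")

# HEAD:WTREE has been split into HEAD:NEXT and NEXT:WTREE.
print(changed(head, next_snap))
print(changed(next_snap, wtree))

# Model of "git commit": the snapshot NEXT becomes the new HEAD.
head = dict(next_snap)
print(changed(head, wtree))
```

After the commit, the remaining HEAD:WTREE edit contains only the
second feature, ready to be added and committed on its own.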

Now, the concepts of WTREE and NEXT work most of the time.  However,
when there is a merge conflict, the index file takes on a special
state.  This is why I developed a theory around edits: I need it to
describe what happens then and why.

3.1) Merge Conflict

I'm going to use "git cherry-pick" for my example.  It involves
merging a single edit, so it's the easiest case.

A cherry-pick is almost a direct application of patch().  We have an
edit A:B and we want to move it onto snapshot C.  But we said earlier,
the result of a patch() function may or may not exist.

If patch(A:B, C) exists and equals D, then git just writes the
snapshot D as the new commit.

But what if patch(A:B, C) does not exist?  Git does something amazing:
it splits A:B!  So, we'll introduce a new state S to get A:S:B.  Now,
the first edit, A:S, contains all the parts of A:B that can be
patch()ed onto state C, and the second edit, S:B, contains all the
parts of A:B that cannot be patch()ed onto state C.  Obviously,
patch(A:S, C) exists and the resulting snapshot is copied into NEXT.

But what happens to the unpatchable part in edit S:B?  We don't want
this change thrown away - it could be important.  We want it presented
to the user and let the user fix or dismiss it.  If we had a GUI, the
window's border might turn red and tabs for each affected file might
open, but a command-line interface doesn't have that.  So, git writes
something reflecting the unpatchable part into files in the working
tree and marks the files as "needs review" in the index file.  (It
also caches some file SHAs in the index, but we can ignore that.)
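
The split into A:S:B can be sketched as follows.  This is a deliberate
simplification: it works on whole files, whereas git resolves conflicts
at a finer granularity, and the names split_for(), changed_paths() and
needs_review are hypothetical.  Snapshots are dicts mapping path to
contents.

```python
def changed_paths(a, b):
    """Paths whose contents differ between snapshots a and b."""
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))

def patch(edit, c):
    """Whole-file patch: apply edit A:B to C, or return None."""
    a, b = edit
    d = dict(c)
    for path in changed_paths(a, b):
        if c.get(path) != a.get(path):
            return None
        if path in b:
            d[path] = b[path]
        else:
            del d[path]
    return d

def split_for(a, b, c):
    """Split A:B into A:S:B so that A:S holds only what applies to C."""
    s = dict(a)
    needs_review = []
    for path in changed_paths(a, b):
        if c.get(path) == a.get(path):
            if path in b:
                s[path] = b[path]
            else:
                del s[path]
        else:
            needs_review.append(path)  # this part stays in S:B
    return s, needs_review

a = {"f.c": "1", "g.c": "1"}
b = {"f.c": "2", "g.c": "2"}
c = {"f.c": "1", "g.c": "9"}   # g.c has diverged on the target branch

print(patch((a, b), c))        # no patch exists for the whole edit
s, needs_review = split_for(a, b, c)
next_snap = patch((a, s), c)   # the patchable part goes into NEXT
print(next_snap, needs_review)
```

Here patch(A:B, C) does not exist, patch(A:S, C) does, and the paths
left in S:B are the ones the user has to review.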

Now that we know what happens during a conflicted merge, the question
is: do there exist any snapshots here?  We defined NEXT using the
snapshot that would go in the next commit.  But if you run "git commit
--allow-empty" during a merge conflict, you get an error!  We said the
current working tree state was a snapshot, but git just wrote
"<<<<", "====", and ">>>>" into the files - if they did obey any syntax before
that, they certainly don't now!

The answer is unclear.  My opinion is that there's little harm in
viewing the result of patch(A:S,C) as NEXT.  (This would be "the
snapshot generated using stage 0 of the index file" to some.)  I also
think that, during a merge conflict, the working tree is in an
unsyntactic, unnatural state.  While I think there is value to always
treating the working tree as a file system, I can sympathize with
those who might argue that git should treat it differently during a
merge conflict.

PROPOSAL 3: Add aliases NEXT and WTREE that work in place of a
snapshot in any commands.
     e.g., "git diff HEAD NEXT"
     e.g., "git ls NEXT etc/"
During a conflicted merge state, we _may_ want commands to treat WTREE
differently.

ADDITION TO PROPOSAL 1: Since NEXT and WTREE are writeable file
systems, the Unix filesystem commands that write should be implemented
as part of git to work with them.
    "git cp <snapshot> <writeable_filesys> -- <src_path> <dest_file>"
    "git mv <snapshot> <writeable_filesys> -- <src_path> <dest_file>"
    "git rm <writeable_filesys> -- <file>"
I believe "git cp" would be similar to the proposed "git put".  The current
"git mv" and "git rm" operate on both NEXT and WTREE by default.
(Which I think is a sensible default in those cases.)
We may want to consider "mkdir", "rmdir", "chmod".

4) Conclusion

I've proposed that git give snapshots the same interface that a Unix
command line would provide to a read-only file system.

I've presented a mathematical language for defining edits and
proposed using it and clearer words for describing git commands'
operations.

I've proposed creating the aliases WTREE and NEXT and allowing them to
be used anywhere a snapshot is used and in Unix commands that operate
on a writeable filesystem.

Mike Nahas

Appendix: Features and Edits

I said above "If the edit from HEAD to WTREE contains more than one
feature and we want to package each feature into its own edit, ...".
I believe this concept of one-feature=one-edit is at the heart of good
git usage.  We want to manipulate features - add a feature to a
branch, remove a feature, etc. - but we can only manipulate edits.
So, as long as each feature is in its own edit, we can easily manipulate
it.

Unfortunately, features and edits are not the same.  We can merge two
edits, but that doesn't mean the result has both features.  For
example, consider a project that branches and one branch's feature is
a new Makefile target and the edit explicitly lists the source files.
The other branch's feature is support for a new protocol and that edit
adds a new source file.  The merge of these two branches may succeed
and contain both edits, but it doesn't have both features.
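
The Makefile example can be played out with a toy three-way merge.
The sketch assumes snapshots are dicts mapping path to contents and
merges whole files; merge() and the file contents are hypothetical and
much simpler than git's real merge.

```python
def merge(base, ours, theirs):
    """File-level three-way merge; returns None on a conflict."""
    merged = {}
    for path in base.keys() | ours.keys() | theirs.keys():
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            pick = o      # both sides agree
        elif o == b:
            pick = t      # only theirs changed this path
        elif t == b:
            pick = o      # only ours changed this path
        else:
            return None   # both changed it differently
        if pick is not None:
            merged[path] = pick
    return merged

base = {"Makefile": "all: main.o\n", "main.c": "/* main */\n"}

# Branch 1: a new Makefile target that lists the source files explicitly.
ours = dict(base)
ours["Makefile"] = "all: main.o\nlint:\n\tlint main.c\n"

# Branch 2: support for a new protocol, added as a new source file.
theirs = dict(base)
theirs["proto.c"] = "/* new protocol */\n"

merged = merge(base, ours, theirs)
print(merged is not None)                # the merge succeeds
print("proto.c" in merged)               # both edits are present
print("proto.c" in merged["Makefile"])   # but the new target misses the file
```

Both edits are in the merged snapshot, yet the new target does not
cover the new source file, so the result does not have both features.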

The git glossary calls a merge "evil" if the commit contains a change
that is not present in either parent.  I say that's a bad
definition.  In the example above, I think it's a good thing to edit
the merge so that the commit contains both features.

This is why I think this theory of edits has value: we want to
manipulate features and if we have one-feature=one-edit and we know
how to manipulate edits, then we can manipulate features.  I don't
think it is a new concept; I think it has been implied any number of
places; I just hope that, with clear terms to describe manipulation of
edits like split/join/inverse/patch/reorder, we will have a clearer
description of what we do.


* Re: File Systems and a Theory of Edits
  2011-07-30 14:29 File Systems and a Theory of Edits Michael Nahas
@ 2011-07-30 19:06 ` Ævar Arnfjörð Bjarmason
  2011-07-31  8:15   ` René Scharfe
  2011-07-30 19:40 ` John M. Dlugosz
  1 sibling, 1 reply; 15+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2011-07-30 19:06 UTC
  To: mike; +Cc: git

On Sat, Jul 30, 2011 at 16:29, Michael Nahas <mike.nahas@gmail.com> wrote:
>     "git xargs <commit> ..."  (Is this possible?)

I don't have comments on the rest of your proposal, but I've often
wanted a git-find(1) similar to git-grep(1). Which would give you this
functionality.

Then you could simply:

    git find <commit> <path> -type f | xargs <whatever>

Or something like that.


* Re: File Systems and a Theory of Edits
  2011-07-30 19:06 ` Ævar Arnfjörð Bjarmason
@ 2011-07-31  8:15   ` René Scharfe
       [not found]     ` <CADo4Y9gU_Z73gCPCESvVZhLOJUJg+mTqHkeqpNv2L8xLJvKxEQ@mail.gmail.com>
  2011-08-01  1:04     ` Junio C Hamano
  0 siblings, 2 replies; 15+ messages in thread
From: René Scharfe @ 2011-07-31  8:15 UTC
  To: Ævar Arnfjörð Bjarmason; +Cc: mike, git

On 30.07.2011 21:06, Ævar Arnfjörð Bjarmason wrote:
> On Sat, Jul 30, 2011 at 16:29, Michael Nahas <mike.nahas@gmail.com> wrote:
>>     "git xargs <commit> ..."  (Is this possible?)
> 
> I don't have comments on the rest of your proposal, but I've often
> wanted a git-find(1) similar to git-grep(1). Which would give you this
> functionality.
> 
> Then you could simply:
> 
>     git find <commit> <path> -type f | xargs <whatever>
> 
> Or something like that.

How about this, which should match your example:

	git ls-tree -r --name-only <commit> <path> | xargs <whatever>

René


* RE: File Systems and a Theory of Edits
       [not found]     ` <CADo4Y9gU_Z73gCPCESvVZhLOJUJg+mTqHkeqpNv2L8xLJvKxEQ@mail.gmail.com>
@ 2011-07-31 14:15       ` Michael Nahas
  2011-07-31 17:21         ` Michael Witten
  2011-07-31 16:16       ` René Scharfe
  1 sibling, 1 reply; 15+ messages in thread
From: Michael Nahas @ 2011-07-31 14:15 UTC
  To: git

René,
I don't doubt that there exist current commands in git that can
perform operations like cat, ls, etc.  My point is that git can make
it easier for new users to learn commands and existing users to
remember commands if git copies the names and semantics (as much as
possible) of cat, ls, etc.

Ævar,
The issue is what goes inside the xargs command.  If it is unix's cat
command, the files listed by find will be from the commit's snapshot,
but the files read by cat will be from the working tree.

I believe the solution for xargs may be John D.'s solution - to
"mount" the snapshot as a file system.  And the "mount" command in git
is "git checkout".  (Now, I almost want to rename "git checkout" to
"git remount"!)


Mike Nahas



* Re: File Systems and a Theory of Edits
  2011-07-31 14:15       ` Michael Nahas
@ 2011-07-31 17:21         ` Michael Witten
  2011-07-31 21:13           ` Michael Nahas
  0 siblings, 1 reply; 15+ messages in thread
From: Michael Witten @ 2011-07-31 17:21 UTC
  To: mike; +Cc: git

On Sun, Jul 31, 2011 at 14:15, Michael Nahas <mike.nahas@gmail.com> wrote:
> I believe the solution for xargs may be John D.'s solution - to
> "mount" the snapshot as a file system.  And the "mount" command in git
> is "git checkout".  (Now, I almost want to rename "git checkout" to
> "git remount"!)

Why not just `git mount', though? We could have different mount points
too, so that it's easy to work with multiple `snapshots' at once (in
the spirit of bazaar and mercurial, as well).

Perhaps `git umount' could be used to make the repository bare.

In any case, I always find myself wishing that the standard interfaces
would make it easier to base an operation on a snapshot that is not
yet mounted as the working tree. It can be quite cumbersome to switch
the contents of the working tree.


* Re: File Systems and a Theory of Edits
  2011-07-31 17:21         ` Michael Witten
@ 2011-07-31 21:13           ` Michael Nahas
  2011-07-31 22:20             ` Andreas Schwab
  2011-08-01 12:01             ` Michael Nahas
  0 siblings, 2 replies; 15+ messages in thread
From: Michael Nahas @ 2011-07-31 21:13 UTC
  To: Michael Witten; +Cc: git

Why not "git mount" indeed!

At work, I have 3 very active branches and a slow build system.  Right
now, when I switch to a new branch, I have to rebuild everything.
Being able to "git mount" 3 snapshots in 3 directories with three
different build outputs would make switching branches faster.

3 working trees would be even better.  I've been wondering if I can
make another working tree by creating a .git/ directory and
symlinking to the .git/objects and .git/refs of my current
repository.  (I could use the environment variables GIT_INDEX_FILE and
GIT_WORK_TREE, but that would require setting and resetting them.
Or using a different shell.)

So a true "git mount" that allowed mounting editable branches would be
very useful to me.  (Although, if it weren't for that crappy build
system, I'd prefer a single working tree.)

Mike Nahas


* Re: File Systems and a Theory of Edits
  2011-07-31 21:13           ` Michael Nahas
@ 2011-07-31 22:20             ` Andreas Schwab
  2011-07-31 22:39               ` Michael Nahas
  2011-08-01 12:01             ` Michael Nahas
  1 sibling, 1 reply; 15+ messages in thread
From: Andreas Schwab @ 2011-07-31 22:20 UTC
  To: mike; +Cc: Michael Witten, git

Michael Nahas <mike.nahas@gmail.com> writes:

> 3 working trees would be even better.  I've been wondering if I can
> make another working tree by creating a .git/ directory and
> symlinking to the .git/objects and .git/refs of my current
> repository.

Have you looked at git-new-workdir?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: File Systems and a Theory of Edits
  2011-07-31 22:20             ` Andreas Schwab
@ 2011-07-31 22:39               ` Michael Nahas
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Nahas @ 2011-07-31 22:39 UTC
  To: Andreas Schwab; +Cc: git

I shall check it out.  Thanks!  -Mike


* Re: File Systems and a Theory of Edits
  2011-07-31 21:13           ` Michael Nahas
  2011-07-31 22:20             ` Andreas Schwab
@ 2011-08-01 12:01             ` Michael Nahas
  1 sibling, 0 replies; 15+ messages in thread
From: Michael Nahas @ 2011-08-01 12:01 UTC
  To: Michael Witten; +Cc: git

Michael,

I've been thinking about it and "git mount" is the right idea.  I like
it a lot.  In fact, the most common usage of "git checkout" can be
totally replaced by "git mount".

The other usage, "git checkout -- file", can be replaced by "git cp".

Mike



* Re: File Systems and a Theory of Edits
       [not found]     ` <CADo4Y9gU_Z73gCPCESvVZhLOJUJg+mTqHkeqpNv2L8xLJvKxEQ@mail.gmail.com>
  2011-07-31 14:15       ` Michael Nahas
@ 2011-07-31 16:16       ` René Scharfe
  1 sibling, 0 replies; 15+ messages in thread
From: René Scharfe @ 2011-07-31 16:16 UTC
  To: mike; +Cc: Michael Nahas, Ævar Arnfjörð Bjarmason, git

On 31.07.2011 16:13, Michael Nahas wrote:
> I don't doubt that there exist current commands in git that can perform
> operations like cat, ls, etc.  My point is that git can make it easier
> for new users to learn commands and existing users to remember commands
> if git copies the names and semantics (as much as possible) of cat, ls, etc.

Possibly.  My point was that for this example a look-alike was easy to
implement as an alias or shell script using an existing plumbing
command.  You can probably get quite far to your goal that way.

René


* Re: File Systems and a Theory of Edits
  2011-07-31  8:15   ` René Scharfe
       [not found]     ` <CADo4Y9gU_Z73gCPCESvVZhLOJUJg+mTqHkeqpNv2L8xLJvKxEQ@mail.gmail.com>
@ 2011-08-01  1:04     ` Junio C Hamano
  2011-08-01 11:14       ` Michael Nahas
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2011-08-01  1:04 UTC
  To: René Scharfe; +Cc: Ævar Arnfjörð Bjarmason, mike, git

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> On 30.07.2011 21:06, Ævar Arnfjörð Bjarmason wrote:
>> On Sat, Jul 30, 2011 at 16:29, Michael Nahas <mike.nahas@gmail.com> wrote:
>>>     "git xargs <commit> ..."  (Is this possible?)
>> 
>> I don't have comments on the rest of your proposal, but I've often
>> wanted a git-find(1) similar to git-grep(1). Which would give you this
>> functionality.
>> 
>> Then you could simply:
>> 
>>     git find <commit> <path> -type f | xargs <whatever>
>> 
>> Or something like that.
>
> How about this, which should match your example:
>
> 	git ls-tree -r --name-only <commit> <path> | xargs <whatever>

I don't get what this thread wants to achieve quite yet.

The devil is in <whatever> part. What would it do, given only the sequence
of pathnames and object names but not data?  Invoke low-level git commands
on them?


* Re: File Systems and a Theory of Edits
  2011-08-01  1:04     ` Junio C Hamano
@ 2011-08-01 11:14       ` Michael Nahas
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Nahas @ 2011-08-01 11:14 UTC
  To: Junio C Hamano
  Cc: René Scharfe, Ævar Arnfjörð, git,
	Jakub Narebski, Holger Hellmuth, Jonathan Nieder,
	Michael J Gruber, Scott Chacon

Hi Junio,

I started this thread with a very long post that rejoins the NEXT /
WTREE debate with more ammunition.  (A theory to back it up and
explain what happens during merge conflict.)

I also proposed that git treat snapshots, the working tree, and the next
commit like Unix file systems and support commands for them such
as: cat, ls, find, etc.  These follow-up emails were focused on the
small problem of whether or not xargs could work.  (I'm now convinced
it can't - you want to just do a checkout and then run it.)

I excerpted the proposals below.

Mike



PROPOSAL 3: Add aliases NEXT and WTREE that work in place of a
snapshot in any commands.
     e.g., "git diff HEAD NEXT"
     e.g., "git ls NEXT etc/"
During a conflicted merge state, we _may_ want commands to treat WTREE
differently.


PROPOSAL 1: git should imitate the Unix file system commands for
accessing the snapshot of a commit.

For these commands to work, the git command will have to include an
argument that specifies which commit it operates on.  So some basic
ones might be:
    "git ls <commit> -- <path>"
    "git cat <commit> -- <path>"
(There exists "git ls-files", "git ls-tree", and "git cat-file" but
they are not quite the same.)
    "git find <commit> -- ..."
    "git grep <commit> -- <path>" (Exists)
The Unix command "diff" compares two files/directories.  So, the "git"
version requires two commits to be specified.
    "git diff <commit> <commit> -- <path>"   (Exists)
I'd love to see something to apply a command to every file in a commit
or every file found by "git find".
    "git xargs <commit> ..."  (Is this possible?)
Since snapshots are a read-only version of a file system, git can't
implement the commands "rm", "mv", or "cp" for them.
Since NEXT and WTREE are writeable file systems, the Unix filesystem
commands that write should be implemented as part of git to work with
them.
    "git cp <snapshot> <writeable_filesys> -- <src_path> <dest_file>"
    "git mv <snapshot> <writeable_filesys> -- <src_path> <dest_file>"
    "git rm <writeable_filesys> -- <file>"
I believe "git cp" would be similar to the proposed "git put".  The current
"git mv" and "git rm" operate on both NEXT and WTREE by default.
(Which I think is a sensible default in those cases.)
We may want to consider "mkdir", "rmdir", "chmod".


PROPOSAL 2: adopt a term like edit and rigorous terms
like split, join, and reorder to describe the operations of git
commands.
We should also use exacting vocabulary to describe git commands.  It's
not unusual to use the word "commit" when referring to:
    * a snapshot  (stored in the commit's tree object)
    * an edit   (the difference between this commit's snapshot and its
                   parent's (if it has only one parent...))
    * a complete history of edits going back to the initial snapshot
    * the commit object itself (e.g., when tagged)
While often the appropriate definition can be picked up from context,
we should be precise if possible.
It would be good to define a term like "snapshot tree" that refers to
a tree object that is the root of a snapshot, to differentiate it from
other tree objects that store subdirectories.


* Re: File Systems and a Theory of Edits
  2011-07-30 14:29 File Systems and a Theory of Edits Michael Nahas
  2011-07-30 19:06 ` Ævar Arnfjörð Bjarmason
@ 2011-07-30 19:40 ` John M. Dlugosz
  2011-07-31 11:56   ` Michael Witten
  2011-08-01  1:22   ` Jeff King
  1 sibling, 2 replies; 15+ messages in thread
From: John M. Dlugosz @ 2011-07-30 19:40 UTC
  To: git

On 7/30/2011 9:29 AM, Michael Nahas wrote:
> For these commands to work, the git command will have to include an
> argument that specifies which commit it operates on.  So some basic
> ones might be:
>      "git ls <commit> -- <path>"
>      "git cat <commit> -- <path>"
> (There exists "git ls-files", "git ls-tree", and "git cat-file" but

If you could "mount" a repository, then you would not need these commands at all.  It 
would be in fact a read-only file system.  Once mounted, the individual commits could be 
directories, and under that you explore in the usual way.


* Re: File Systems and a Theory of Edits
  2011-07-30 19:40 ` John M. Dlugosz
@ 2011-07-31 11:56   ` Michael Witten
  2011-08-01  1:22   ` Jeff King
  1 sibling, 0 replies; 15+ messages in thread
From: Michael Witten @ 2011-07-31 11:56 UTC
  To: John M. Dlugosz; +Cc: git

On Sat, 30 Jul 2011 14:40:57 -0500, John M. Dlugosz wrote:

> On 7/30/2011 9:29 AM, Michael Nahas wrote:
> 
>> For these commands to work, the git command will have to include an
>> argument that specifies which commit it operates on.  So some basic
>> ones might be:
>>      "git ls <commit> -- <path>"
>>      "git cat <commit> -- <path>"
>> (There exists "git ls-files", "git ls-tree", and "git cat-file" but
> 
> If you could "mount" a repository, then you would not need these commands at all.  It 
> would be in fact a read-only file system.  Once mounted, the individual commits could be 
> directories, and under that you explore in the usual way.

You can do this kind of thing with Avery Pennarun's
most awesome `bup' tool, which is based on git, and
it is indeed very useful.

See the whole thread here:

 Subject: Request: Auto-joining large files by 'bup fuse'
 Message-ID: <5f12a43ec3dc4250a7672725f5c172fc-mfwitten@gmail.com>
 http://groups.google.com/group/bup-list/browse_thread/thread/f80f56981853698b


* Re: File Systems and a Theory of Edits
  2011-07-30 19:40 ` John M. Dlugosz
  2011-07-31 11:56   ` Michael Witten
@ 2011-08-01  1:22   ` Jeff King
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff King @ 2011-08-01  1:22 UTC
  To: John M. Dlugosz; +Cc: git

On Sat, Jul 30, 2011 at 02:40:57PM -0500, John M. Dlugosz wrote:

> On 7/30/2011 9:29 AM, Michael Nahas wrote:
> >For these commands to work, the git command will have to include an
> >argument that specifies which commit it operates on.  So some basic
> >ones might be:
> >     "git ls <commit> -- <path>"
> >     "git cat <commit> -- <path>"
> >(There exists "git ls-files", "git ls-tree", and "git cat-file" but
> 
> If you could "mount" a repository, then you would not need these
> commands at all.  It would be in fact a read-only file system.  Once
> mounted, the individual commits could be directories, and under that
> you explore in the usual way.

There are several (mostly fuse-based) tools listed on the wiki:

  https://git.wiki.kernel.org/index.php/InterfacesFrontendsAndTools#Filesystem_interfaces

I've never used any of them, though. No idea how mature or usable they
are.

Googling around also came up with this newer attempt:

  https://github.com/mfontani/git-fuse-perl

Again, no idea on the quality.

-Peff
