From: Rutger Nijlunsing <rutger@nospam.com>
To: Junio C Hamano <junkio@cox.net>, Linus Torvalds <torvalds@osdl.org>
Cc: "Randy.Dunlap" <rddunlap@osdl.org>,
Ross Vandegrift <ross@jose.lug.udel.edu>,
Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Proposal for shell-patch-format [was: Re: more git updates..]
Date: Sun, 10 Apr 2005 13:21:39 +0200
Message-ID: <20050410112139.GA21496@nospam.com>
In-Reply-To: <7vhdifcbmo.fsf@assigned-by-dhcp.cox.net>
On Sun, Apr 10, 2005 at 12:51:59AM -0700, Junio C Hamano wrote:
> Listing the file paths and their sigs included in a tree to make
> a snapshot of a tree state sounds fine, and diffing two trees by
> looking at the sigs between two such files sounds fine as well.
>
> But I am wondering what your plans are to handle renames---or
> does git already represent them?
git doesn't represent transitions (or deltas), but only state. So it's
not (much) more than a .tar file from a version-management perspective;
the only difference being that a git-tree has a comment field and a
predecessor-reference, which are currently not used in determining the
'patch' between two trees.
Deltas are derived by comparing different versions and determining
the difference by reverse-engineering the differences which got us
from version A to version B.
Deltas are currently described as patch(1)es. Patches don't have the
concept of 'renaming', so even after determining that file X has been
renamed to Y, we have no container for this fact. A patch(1) only
contains local-file-edits: substitute lines by other lines.
Deltas are not needed to follow a tree; deltas are useful for merging
branches of versions, and for reviewing purposes. This is comparable
to using tar for version-management: it is very common to tar up the
current version of your project every week as a poor man's version
management for a one-person, one-project setup.
So what is needed is a way to represent deltas which can contain more
than only traditional patches. I would propose a simple format:
a shell script in a fixed format.
Shell-patch format in EBNF:
<shellpatch> ::= ( <comment>? <command>* )*
<comment> ::= <commentline>+
The comment contains the text describing the purpose of the
patch following it.
<commentline> ::= "# " <text>
<command> ::=
"mv " <pathname> " " <pathname> "\n" |
"cp " <filename> " " <filename> "\n" |
"chmod " <mode> " " <pathname> "\n" |
"patch <<__UNIQUE_STRING__\n" <patch> "__UNIQUE_STRING__\n"
(where UNIQUE_STRING must not be contained in patch)
<filename> ::= <pathname>
(but pointing to a file)
<pathname> ::= a pathname relative to '.';
escaping special characters the shell-way;
may not contain '..'.
Example:
# Rename file b to a1, and change a line.
mv b a1
patch <<__END__
*** a1 Sun Apr 10 11:43:37 2005
--- a2 Sun Apr 10 11:43:41 2005
***************
*** 1,4 ****
  1
  2
! from
  3
--- 1,4 ----
  1
  2
! to
  3
__END__
Advantages:
- ASCII!
- a shell-patch is executable without extra tooling
- a shell-patch is readable and therefore reviewable
- a shell-patch is forward-compatible: a shell-patch acts
like a patch (since patch(1) ignores garbage around patch :),
but not backwards-compatible.
- extensible
- the heavy-lifting is done by 'patch'
Disadvantages:
- no deltas for binary files
Open issues:
- <comment> could be made more structured; maybe containing fields
like Subject:, Author:, Signed-By:, certificates, ...
(BitKeeper seems to be using "# " <field> ":" <value> "\n" lines)
- patch(1) doesn't know any directories. Should shell-patch
know directories? This implies commands working on directories too
(like directory renaming, mode changing, ...). Otherwise directories
are implicit (a file in a directory implies the existence of that
directory). Also implies mkdir and rmdir as shell-patch commands.
- extra commands might be useful to conserve more state(changes):
ln -s -- for symbolic links;
ln -- for hard links;
chown -- for ownership;
chattr -- for storing extended attributes;
touch -- for setting timestamps (probably creation time only,
since mtime is something git relies on)
...and for the really adventurous:
sed 's,<fromstring>,<tostring>,' -- for substitutions
(this is something darcs supports, but which I think is too
bothersome to use since it is difficult to reverse engineer
from two random trees)
Why a fixed format at all?
- This way, the executable shell-patch can be proven to be
harmless to the machine: 'rm -rf /' is a valid shell-script,
but not a valid shell-patch (since 'rm' is not a valid command,
random flags like '-rf' are not supported, and '/' is an absolute
pathname).
- A fixed format enables tooling to support such a patch format;
for example creating the reverse-patch, merging patches (yeah,
'cat' also merges patches...).
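
To make the "harmless" claim a bit more concrete, here is a rough sketch
of a checker for the grammar above, written in Python. It is not part of
the proposal: the names (validate, is_safe_path, ARG_COUNT) are made up
for illustration, <mode> is assumed to be octal, and instead of real
shell escaping it only accepts a conservative set of characters in
pathnames.

```python
import re

# Commands the grammar allows, with the number of arguments each takes.
ARG_COUNT = {"mv": 2, "cp": 2, "chmod": 2}
HEREDOC = re.compile(r"patch <<(\w+)")
# Assumption: no shell escaping, only a conservative character set.
# The first character may be neither '/' (absolute) nor '-' (a flag).
SAFE_WORD = re.compile(r"[A-Za-z0-9._+][A-Za-z0-9._/+-]*")
# Assumption: <mode> is an octal mode such as 644 or 0755.
OCTAL_MODE = re.compile(r"[0-7]{3,4}")


def is_safe_path(word):
    """True for a relative pathname that does not contain '..'."""
    return SAFE_WORD.fullmatch(word) is not None and ".." not in word


def validate(script):
    """Return a list of problems; an empty list means the script passed."""
    errors = []
    lines = script.splitlines()
    i = 0
    while i < len(lines):
        number, line = i + 1, lines[i]
        i += 1
        if line.startswith("# "):
            continue  # comment line
        heredoc = HEREDOC.fullmatch(line)
        if heredoc:
            # Skip the embedded patch up to its terminator line.
            while i < len(lines) and lines[i] != heredoc.group(1):
                i += 1
            if i == len(lines):
                errors.append(f"line {number}: unterminated patch")
            i += 1
            continue
        command, *args = line.split(" ")
        if command not in ARG_COUNT:
            errors.append(f"line {number}: command not allowed")
            continue
        if len(args) != ARG_COUNT[command]:
            errors.append(f"line {number}: wrong number of arguments")
            continue
        paths = args
        if command == "chmod":
            paths = args[1:]
            if OCTAL_MODE.fullmatch(args[0]) is None:
                errors.append(f"line {number}: bad mode")
        for path in paths:
            if not is_safe_path(path):
                errors.append(f"line {number}: unsafe pathname {path!r}")
    return errors
```

With this, validate("rm -rf /") reports a problem, while the 'mv b a1'
example above passes. The contents of the embedded patch are not
inspected here; that part is left to patch(1).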
...what has this to do with git? Not much and everything, depending
on how you look at it. 'git' is 'tar', and 'shell-patch' is 'patch';
both orthogonal concepts but very usable in combination. We'll look at
getting from two git trees to a shell-patch.
Diffing the trees would not only look at each file and its hash, but
also the other way around: which hash values are used more than once.
For files with the same hash value, compare the contents (and the rest
of the attributes); this is needed since different contents could in
principle map to the same sha1. When the contents are the same, the
shell-patch-command to generate is obviously a 'cp'.
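
The "other way around" lookup could be sketched like this in Python.
This is only an illustration under assumptions: a tree is modelled as a
dict from pathname to file contents, a plain SHA-1 of the contents
stands in for git's object hash, and duplicate_groups is a made-up name.

```python
import hashlib
from collections import defaultdict


def duplicate_groups(tree):
    """Return sorted groups of pathnames whose contents are identical.

    `tree` maps pathname -> contents (bytes).
    """
    # Invert the mapping: hash value -> pathnames using it.
    by_hash = defaultdict(list)
    for path, data in tree.items():
        by_hash[hashlib.sha1(data).hexdigest()].append(path)

    groups = []
    for paths in by_hash.values():
        if len(paths) < 2:
            continue
        # Same hash: confirm by comparing the actual contents.
        by_content = defaultdict(list)
        for path in paths:
            by_content[tree[path]].append(path)
        groups.extend(sorted(g) for g in by_content.values() if len(g) > 1)
    return sorted(groups)
```

Every group with more than one member is a candidate for a 'cp' (or a
'mv', when the source pathname disappears in the newer tree).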
For example, we have got two trees in git (pathname -> hash value):
tree1/file1 -> 1234
tree1/file2 -> 4567
and
tree2/file1 -> 3456
tree2/file3 -> 4567
tree2/file4 -> 4567
...this could generate the shell-patch:
# Comments-go-here
mv tree2/file2 tree2/file3
cp tree2/file3 tree2/file4
patch tree1/file1 <<__FILE_PATCH__
(patch-goes-here)
__FILE_PATCH__
...by an algorithm which starts by determining all renames, then all
copies, and finally all patches.
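
A rough Python sketch of that ordering, assuming each tree is given as
a dict from pathname (relative to the tree root) to hash value. The
function name plan and the tuple output are made up for illustration.
Files that are deleted or newly created with new contents are left out,
since the grammar above has no command for them.

```python
def plan(old, new):
    """Return commands turning tree `old` into tree `new`.

    Both arguments map pathname -> hash value. The result is a list of
    ("mv", src, dst), ("cp", src, dst) and ("patch", path) tuples.
    """
    commands = []
    current = dict(old)  # state of the tree as commands are emitted

    # 1. Renames: a pathname that disappeared, whose hash value shows up
    #    at a pathname that did not exist before.
    for src in sorted(old):
        if src in new:
            continue
        for dst in sorted(new):
            if dst not in current and new[dst] == old[src]:
                commands.append(("mv", src, dst))
                current[dst] = current.pop(src)
                break

    # 2. Copies: a still-missing pathname whose hash value is already
    #    present somewhere in the current state.
    for dst in sorted(new):
        if dst in current:
            continue
        for src in sorted(current):
            if current[src] == new[dst]:
                commands.append(("cp", src, dst))
                current[dst] = new[dst]
                break

    # 3. Patches: pathnames present on both sides with different hashes.
    for path in sorted(new):
        if path in current and current[path] != new[path]:
            commands.append(("patch", path))
    return commands
```

For the two trees above (without the tree1/ and tree2/ prefixes) this
yields a 'mv' of file2 to file3, a 'cp' of file3 to file4 and a patch
for file1. Copies come before patches so that a 'cp' still sees the
unpatched contents of its source.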
Comments?
--
Rutger Nijlunsing ---------------------- linux-kernel at tux.tmfweb.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------