All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rutger Nijlunsing <rutger@nospam.com>
To: Junio C Hamano <junkio@cox.net>, Linus Torvalds <torvalds@osdl.org>
Cc: "Randy.Dunlap" <rddunlap@osdl.org>,
	Ross Vandegrift <ross@jose.lug.udel.edu>,
	Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Proposal for shell-patch-format [was: Re: more git updates..]
Date: Sun, 10 Apr 2005 13:21:39 +0200	[thread overview]
Message-ID: <20050410112139.GA21496@nospam.com> (raw)
In-Reply-To: <7vhdifcbmo.fsf@assigned-by-dhcp.cox.net>

On Sun, Apr 10, 2005 at 12:51:59AM -0700, Junio C Hamano wrote:
> Listing the file paths and their sigs included in a tree to make
> a snapshot of a tree state sounds fine, and diffing two trees by
> looking at the sigs between two such files sounds fine as well.
> 
> But I am wondering what your plans are to handle renames---or
> does git already represent them?

git doesn't represent transitions (or deltas), but only state. So it's
not (much) more then a .tar file from version-management perspective;
the only difference being that a git-tree has a comment field and a
predecessor-reference, which are currently not used in determining the
'patch' between two trees.

Deltas are derived by comparing different versions and determining
the difference by reverse-engineering the differences which got us
from version A to version B.

Deltas are currently described as patch(1)es. Patches don't have the
concept of 'renaming', so even after determining that file X has been
renamed to Y, we have no container for this fact. A patch(1) only
contains local-file-edits: substitute lines by other lines.

Deltas are not needed to follow a tree; deltas are useful for merging
branches of versions, and for reviewing purposes. This is comparable
to using tar for version-management: it is very common to weekly tar
your current version of your project as a poor-mans-version management
for one-person one-project.

So what is needed is a way to represent deltas which can contain more
than only traditional patches. I would propose a simple format: 
the shell-script in a fixed-format.

Shell-patch format in EBNF:
  <shellpatch> ::= ( <comment>? <command>* )*
  <comment> ::= <commentline>+
    The comments contains the text describing the function of the
    patch following it.
  <commentline> ::= "# " <text>
  <command> ::=
    "mv " <pathname> " " <pathname> "\n" |
    "cp " <filename> " " <filename> "\n" |
    "chmod " <mode> <pathname> "\n" |
    "patch <<__UNIQUE_STRING__\n" <patch> "__UNIQUE_STRING__\n"
      (where UNIQUE_STRING must not be contained in patch)
  <filename> ::= <pathname>
    (but pointing to a file)
  <pathname> ::= a pathname relative to '.';
    escaping special characters the shell-way;
    may not contain '..'.

Example:
  # Rename file b to a1, and change a line.
  mv b a1
  patch <<__END__
  *** a1  Sun Apr 10 11:43:37 2005
  --- a2  Sun Apr 10 11:43:41 2005
  ***************
  *** 1,4 ****
    1
    2
  ! from
    3
  --- 1,4 ----
    1
    2
  ! to
    3
  __END__

Advantages:
  - ASCII!
  - a shell-patch is executable without extra tooling
  - a shell-patch is readable and therefore reviewable
  - a shell-patch is forward-compatible: a shell-patch acts
    like a patch (since patch(1) ignores garbage around patch :),
    but not backwards-compatible.
  - extensible
  - the heavy-lifting is done by 'patch'
Disadvantages:
  - no deltas for binary files

Open issues:
  - <comment> could be made more structured; maybe containing fields
    like Sujbect:, Author:, Signed-By:, certificates, ...
    (BitKeeper seems to be using "# " <field> ":" <value> "\n" lines)
  - patch(1) doesn't know any directories. Should shell-patch
    know directories? This implies commands working on directories to
    (like directory renaming, mode changing, ...). Otherwise directories
    are implicit (a file in a directories implies the existance of that
    directory). Also implies mkdir and rmdir as shell-patch commands.
  - extra commands might be useful to conserve more state(changes):
    ln -s  -- for symbolic links;
    ln     -- for hard links;
    chown  -- for permissions;
    chattr -- for storing extended attributes
    touch  -- for setting timestamps (probably creation time only,
              since mtime is something git relies on)
    ...and for the really adventurous:
    sed 's,<fromstring>,<tostring>,' -- for substitutions
      (this is something darcs supports, but which I think is too
       bothersome to use since it is difficult to reverse engineere
       from two random trees)
Why a fixed format at all?
  - This way, the executable shell-patch can be proven to be
    harmless to the machine: 'rm -rf /' is a valid shell-script,
    but not a valid shell-patch (since 'rm' is not valid command,
    random flags like '-rf' are not supported, and '/' is an absolute
    pathname.
  - A fixed format enables tooling to support such a patch format;
    for example creating the reverse-patch, merging patches (yeah,
    'cat' also merges patches...).

...what has this to do with git?  Not much and everything, depending
on how you look onto it. 'git' is 'tar', and 'shell-patch' is 'patch';
both orthogonal concepts but very usable in combination. We'll look at
getting from two git trees to a shell-patch.

Diffing the trees would not only look at the file and per file at the
hashes, but also the other way around: which hash values are used more
than once. For files with the same hash value, compare the contents
(and rest of attributes); this is needed since the mapping from file
contents to sha1 is one-way. When the contents is the same, the
shell-patch-command to generate is obviously a 'cp'.

For example, we have got two trees in git (pathname -> hash value):
  tree1/file1 -> 1234
  tree1/file2 -> 4567
and
  tree2/file1 -> 3456
  tree2/file3 -> 4567
  tree2/file4 -> 4567

..this could generate shell-patch:

  # Comments-go-here
  mv tree2/file2 tree2/file3
  cp tree2/file3 tree2/file4
  patch tree1/file1 <<__FILE_PATCH__
  (patch-goes-here)
  __FILE_PATCH__

...by an algorithm which starts by determining all renames, then all
copies, and finally all patches.

Comments?


-- 
Rutger Nijlunsing ---------------------- linux-kernel at tux.tmfweb.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------

  parent reply	other threads:[~2005-04-10 11:21 UTC|newest]

Thread overview: 194+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-09 19:45 more git updates Linus Torvalds
2005-04-09 19:56 ` Linus Torvalds
2005-04-09 20:07 ` Petr Baudis
2005-04-09 21:00   ` Linus Torvalds
2005-04-09 21:00     ` tony.luck
2005-04-10 16:01       ` Linus Torvalds
2005-04-12 17:34         ` Helge Hafting
2005-04-10 18:19       ` Paul Jackson
2005-04-10 23:04         ` Bernd Eckenfels
2005-04-11  9:27           ` Anton Altaparmakov
2005-04-09 21:08     ` Linus Torvalds
2005-04-09 23:31       ` Linus Torvalds
2005-04-10  2:41         ` Petr Baudis
2005-04-10 16:27           ` [ANNOUNCE] git-pasky-0.1 Petr Baudis
2005-04-10 16:55             ` Linus Torvalds
2005-04-10 19:49               ` Sean
2005-04-10 17:33             ` Ingo Molnar
2005-04-10 17:42               ` Willy Tarreau
2005-04-10 17:45                 ` Ingo Molnar
2005-04-10 18:45                   ` Petr Baudis
2005-04-10 19:13                     ` Willy Tarreau
2005-04-10 21:27                       ` Petr Baudis
2005-04-10 20:38                     ` Linus Torvalds
2005-04-10 21:39                       ` Linus Torvalds
2005-04-10 23:49                         ` Petr Baudis
2005-04-10 22:27                       ` Petr Baudis
2005-04-10 23:10                         ` Linus Torvalds
2005-04-10 23:26                           ` Petr Baudis
2005-04-10 23:46                             ` Linus Torvalds
2005-04-10 23:56                               ` Petr Baudis
2005-04-11  0:20                                 ` GIT license (Re: Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.1) Linus Torvalds
2005-04-11  0:27                                   ` Petr Baudis
2005-04-11  7:45                                   ` Ingo Molnar
2005-04-11  8:40                                     ` Florian Weimer
2005-04-11 10:52                                       ` Petr Baudis
2005-04-11 16:05                                         ` Florian Weimer
2005-04-10 23:23                         ` [ANNOUNCE] git-pasky-0.1 Paul Jackson
2005-04-11  0:15                           ` Randy.Dunlap
2005-04-11  0:30                       ` Re: " Petr Baudis
2005-04-11  1:11                         ` Linus Torvalds
2005-04-10 20:41                     ` Paul Jackson
2005-04-11  1:58             ` [ANNOUNCE] git-pasky-0.2 Petr Baudis
2005-04-11  2:46               ` Daniel Barkalow
2005-04-11 10:17                 ` Petr Baudis
2005-04-11  8:50               ` Ingo Molnar
2005-04-11 10:16                 ` Petr Baudis
2005-04-11 13:57               ` [ANNOUNCE] git-pasky-0.3 Petr Baudis
2005-04-12 12:47                 ` Martin Schlemmer
2005-04-12 13:02                   ` Petr Baudis
2005-04-12 13:13                     ` Martin Schlemmer
2005-04-12 13:23                       ` Petr Baudis
     [not found]                         ` <1113375277.23299.25.camel@nosferatu.lan>
     [not found]                           ` <20050413075441.GD16489@pasky.ji.cz>
     [not found]                             ` <1113381672.23299.47.camel@nosferatu.lan>
     [not found]                               ` <20050413092656.GO16489@pasky.ji.cz>
     [not found]                                 ` <1113394537.23299.51.camel@nosferatu.lan>
2005-04-13 22:19                                   ` Re: Re: Remove need to untrack before tracking new branch Petr Baudis
2005-04-14  6:55                                     ` Martin Schlemmer
2005-04-14  8:28                                       ` Martin Schlemmer
2005-04-14  8:38                                         ` Martin Schlemmer
2005-04-14  9:11                                           ` Petr Baudis
2005-04-14  9:40                                             ` Martin Schlemmer
2005-04-14  9:55                                               ` Martin Schlemmer
2005-04-14 22:35                                                 ` Alex Riesen
2005-04-15  5:45                                                   ` Martin Schlemmer
2005-04-15  6:42                                                     ` Paul Jackson
2005-04-15 23:49                                                     ` Re: Re: " Alex Riesen
2005-04-14 22:42                                               ` Petr Baudis
2005-04-14 23:01                                                 ` Martin Schlemmer
2005-04-14 23:00                                                   ` Petr Baudis
2005-04-14 23:09                                                     ` Martin Schlemmer
2005-04-14 23:25                                                       ` Martin Schlemmer
2005-04-12 13:07                 ` [ANNOUNCE] git-pasky-0.3 David Woodhouse
2005-04-13  8:47                   ` Russell King
2005-04-13  8:59                     ` Petr Baudis
2005-04-13  9:06                       ` H. Peter Anvin
2005-04-13  9:09                         ` David Woodhouse
2005-04-13  9:25                       ` David Woodhouse
2005-04-13  9:42                         ` Petr Baudis
2005-04-13 10:24                           ` David Woodhouse
2005-04-13 17:01                           ` Daniel Barkalow
2005-04-13 18:07                             ` Petr Baudis
2005-04-13 18:22                               ` git mailing list (Re: Re: Re: Re: [ANNOUNCE] git-pasky-0.3) Linus Torvalds
2005-04-13 18:38                               ` Re: Re: Re: [ANNOUNCE] git-pasky-0.3 Daniel Barkalow
2005-04-13 12:43                         ` Xavier Bestel
2005-04-13 16:48                           ` H. Peter Anvin
2005-04-13 18:15                             ` Xavier Bestel
2005-04-13 23:05                           ` bd
2005-04-13 14:38                         ` Linus Torvalds
2005-04-13 14:47                           ` David Woodhouse
2005-04-13 14:59                             ` Linus Torvalds
2005-04-13  9:35                 ` Russell King
2005-04-13  9:38                   ` Russell King
2005-04-13  9:49                     ` Petr Baudis
2005-04-13 11:02                       ` Ingo Molnar
2005-04-13 14:50                         ` Linus Torvalds
2005-04-13  9:46                   ` Petr Baudis
2005-04-13 10:28                     ` Russell King
2005-04-13 19:03                   ` Russell King
2005-04-13 19:13                     ` Petr Baudis
2005-04-13 19:21                       ` Russell King
2005-04-13 19:23                         ` H. Peter Anvin
2005-04-10  6:53         ` more git updates Christopher Li
2005-04-10 11:48           ` Ralph Corderoy
2005-04-10 19:23           ` Paul Jackson
2005-04-10 18:42             ` Christopher Li
2005-04-10 22:30               ` Petr Baudis
2005-04-11 13:58           ` H. Peter Anvin
2005-04-20 20:29             ` Kai Henningsen
2005-04-24  0:42               ` Paul Jackson
2005-04-24  1:29                 ` Bernd Eckenfels
2005-04-24  4:13                   ` Paul Jackson
2005-04-24  4:38                     ` Bernd Eckenfels
2005-04-24  4:53                       ` Paul Jackson
2005-04-25 11:57                       ` Theodore Ts'o
2005-04-25 16:40                         ` David Wagner
2005-04-25 20:35                         ` Bernd Eckenfels
2005-04-24 16:52                   ` Horst von Brand
2005-04-24  8:00                 ` Kai Henningsen
     [not found]               ` <6f6293f10504210220744af114@mail.gmail.com>
2005-04-24  8:01                 ` Kai Henningsen
2005-04-11 11:35         ` [rfc] git: combo-blobs Ingo Molnar
2005-04-11 14:45           ` Paul Jackson
2005-04-11 15:12             ` Ingo Molnar
2005-04-11 15:32               ` Linus Torvalds
2005-04-11 15:39                 ` Ingo Molnar
2005-04-11 15:57                   ` Ingo Molnar
2005-04-11 16:01                   ` Linus Torvalds
2005-04-11 16:33                     ` Ingo Molnar
2005-04-12  5:42                       ` Barry K. Nathan
2005-04-11 18:13                     ` Chris Wedgwood
2005-04-11 18:30                       ` Linus Torvalds
2005-04-11 20:18                         ` Linus Torvalds
2005-04-11 18:40                       ` Petr Baudis
2005-04-11 17:50               ` Paul Jackson
2005-04-11 15:28             ` Ingo Molnar
2005-04-11 15:31               ` Ingo Molnar
2005-04-12  4:05         ` more git updates David Eger
2005-04-12  8:16           ` Petr Baudis
2005-04-12 20:44             ` David Eger
2005-04-12 21:21               ` Linus Torvalds
2005-04-12 22:29                 ` Krzysztof Halasa
2005-04-12 22:49                   ` Linus Torvalds
2005-04-13  4:32                     ` Matthias Urlichs
2005-04-12 22:36                 ` David Eger
2005-04-12 23:48                   ` Panagiotis Issaris
2005-04-12 23:40                 ` Andrea Arcangeli
2005-04-12 23:45                   ` Linus Torvalds
2005-04-13  0:14                     ` Andrea Arcangeli
2005-04-13  1:10                       ` Linus Torvalds
2005-04-13 10:59                         ` Andrea Arcangeli
2005-04-13 20:44                         ` Matt Mackall
2005-04-13 23:42                           ` Krzysztof Halasa
2005-04-14  0:13                             ` Matt Mackall
2005-04-13  9:30                     ` Russell King
2005-04-13 10:20                       ` Andrea Arcangeli
2005-04-13 14:43                       ` Linus Torvalds
2005-04-10  2:07     ` Paul Jackson
2005-04-10  2:20       ` Paul Jackson
2005-04-10  2:09     ` Paul Jackson
2005-04-10  7:51     ` Junio C Hamano
2005-04-10  5:53       ` Christopher Li
2005-04-10  9:28         ` Junio C Hamano
2005-04-10  7:06           ` Christopher Li
2005-04-10 11:38             ` tony.luck
2005-04-10  9:48           ` Petr Baudis
2005-04-10  9:40         ` Wichert Akkerman
2005-04-10  9:41         ` Petr Baudis
2005-04-10  7:09           ` Christopher Li
2005-04-10 11:21       ` Rutger Nijlunsing [this message]
2005-04-10 15:44       ` Linus Torvalds
2005-04-10 17:00         ` Rutger Nijlunsing
2005-04-10 18:50         ` Paul Jackson
2005-04-10 20:57           ` Linus Torvalds
2005-04-10 19:03             ` Christopher Li
2005-04-10 22:38               ` Linus Torvalds
2005-04-10 19:53                 ` Christopher Li
2005-04-10 23:21                   ` Linus Torvalds
2005-04-10 21:28                     ` Christopher Li
2005-04-12  5:14                       ` David Lang
2005-04-12  6:00                         ` Paul Jackson
2005-04-12  7:05                         ` Barry K. Nathan
2005-04-11  6:57                 ` bert hubert
2005-04-11  7:20                   ` Christer Weinigel
2005-04-10 23:14             ` Paul Jackson
2005-04-10 23:38               ` Linus Torvalds
2005-04-11  0:19                 ` Paul Jackson
2005-04-11 15:49                 ` Randy.Dunlap
2005-04-11 18:30                   ` Petr Baudis
2005-04-11  0:10               ` Petr Baudis
2005-04-09 22:00 ` Paul Jackson
2005-04-09 23:21 ` Ralph Corderoy
2005-04-10  0:39   ` Paul Jackson
2005-04-10  1:14     ` Bernd Eckenfels
2005-04-10  1:33       ` Paul Jackson
2005-04-10 10:22     ` Ralph Corderoy
2005-04-10 17:30       ` Paul Jackson
2005-04-10 17:31 ` Rik van Riel
2005-04-10 17:35   ` Ingo Molnar
2005-04-11 16:46 ` ross

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050410112139.GA21496@nospam.com \
    --to=rutger@nospam.com \
    --cc=junkio@cox.net \
    --cc=linux-kernel@tux.tmfweb.nl \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rddunlap@osdl.org \
    --cc=ross@jose.lug.udel.edu \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.