Git development
* Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
From: Martin Uecker @ 2005-04-20 13:24 UTC (permalink / raw)
  To: Git Mailing List
In-Reply-To: <2cfc403205042005116484231c@mail.gmail.com>

On Wed, Apr 20, 2005 at 10:11:10PM +1000, Jon Seymour wrote:
> On 4/20/05, Linus Torvalds <torvalds@osdl.org> wrote:
> > 
> > 
> > I converted my git archives (kernel and git itself) to do the SHA1 hash
> > _before_ the compression phase.
> > 
> 
> Linus,
>  
>  Am I correct to understand that with this change, all the objects in
> the database are still being compressed (so no net performance benefit
> now), but by doing the SHA1 calculations before compression you are
> keeping open the possibility that at some point in the future you may
> use a different compression technique (including none at all) for some
> or all of the objects?

The main point is not about trying different compression
techniques, but that you don't need to compress at all just
to calculate the hash of some data (to know whether it is
unchanged, for example).

There are still some other design decisions I am worried
about:

One is the storage of the database as a collection of
files in the underlying file system. Because of the
random nature of the hashes, this leads to a horrible
amount of seeking for all operations which walk the
logical structure of some tree stored in the database.

Why not store all objects linearized in one or more
flat files?


The other thing I don't like is the use of a single sha1
for a complete file. Switching to some kind of hash
tree would allow chunks to be introduced later. This has
two advantages:

It would allow git to scale to repositories of large
binary files, and it would allow a very cool content
transport algorithm to be built for those repositories.
This algorithm could combine all the advantages of
bittorrent and rsync (without the cpu load).


And it would allow trivial merging of patches which
apply to different chunks of a file in exactly the same
way as merging changesets which apply to different
files in a tree.
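
A minimal sketch of the hash-tree idea, in Python. The fixed chunk size, the
function names and the way the root id is formed are assumptions made for
illustration only; nothing here is what git actually does.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical, tiny on purpose so the example is easy to follow


def chunk_ids(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Return the SHA-1 of every fixed-size chunk of data."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def root_id(data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """Name the whole file by hashing the list of its chunk ids."""
    joined = "\n".join(chunk_ids(data, chunk_size)).encode("ascii")
    return hashlib.sha1(joined).hexdigest()


old = b"aaaabbbbcccc"
new = b"aaaaBBBBcccc"  # only the middle chunk differs

# Chunks whose ids match could be shared or skipped during transfer.
shared = [a == b for a, b in zip(chunk_ids(old), chunk_ids(new))]
print(shared)
# The file as a whole still gets a new name.
print(root_id(old) != root_id(new))
```

Fixed-size chunks only line up when bytes are changed in place; an insertion
shifts every later chunk, so a real design would probably need
content-defined chunk boundaries.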


Martin

-- 
One night, when little Giana from Milano was fast asleep,
she had a strange dream.



* simplify Makefile
From: Andre Noll @ 2005-04-20 12:19 UTC (permalink / raw)
  To: git

Use a generic rule for executables that depend only on the corresponding
.o and on $(LIB_FILE).

Signed-Off-By: Andre Noll <maan@systemlinux.org>
---

Makefile |   49 ++-----------------------------------------------
 1 files changed, 2 insertions(+), 47 deletions(-)

Makefile: cd299f850679b2456e360d3aa6a2d529855ba7a5
--- a/Makefile
+++ b/Makefile
@@ -34,62 +34,17 @@ LIBS= $(LIB_FILE) -lssl -lz
 
 init-db: init-db.o
 
-update-cache: update-cache.o $(LIB_FILE)
-	$(CC) $(CFLAGS) -o update-cache update-cache.o $(LIBS)
-
-show-diff: show-diff.o $(LIB_FILE)
-	$(CC) $(CFLAGS) -o show-diff show-diff.o $(LIBS)
-
-write-tree: write-tree.o $(LIB_FILE)
-	$(CC) $(CFLAGS) -o write-tree write-tree.o $(LIBS)
-
-read-tree: read-tree.o $(LIB_FILE)
-	$(CC) $(CFLAGS) -o read-tree read-tree.o $(LIBS)
-
-commit-tree: commit-tree.o $(LIB_FILE)
-	$(CC) $(CFLAGS) -o commit-tree commit-tree.o $(LIBS)
-
-cat-file: cat-file.o $(LIB_FILE)
-	$(CC) $(CFLAGS) -o cat-file cat-file.o $(LIBS)
-
 fsck-cache: fsck-cache.o $(LIB_FILE) object.o commit.o tree.o blob.o
 	$(CC) $(CFLAGS) -o fsck-cache fsck-cache.o $(LIBS)
 
-checkout-cache: checkout-cache.o $(LIB_FILE)
-	$(CC) $(CFLAGS) -o checkout-cache checkout-cache.o $(LIBS)
-
-diff-tree: diff-tree.o $(LIB_FILE)
-	$(CC) $(CFLAGS) -o diff-tree diff-tree.o $(LIBS)
-
 rev-tree: rev-tree.o $(LIB_FILE) object.o commit.o tree.o blob.o
 	$(CC) $(CFLAGS) -o rev-tree rev-tree.o $(LIBS)
 
-show-files: show-files.o $(LIB_FILE)
-	$(CC) $(CFLAGS) -o show-files show-files.o $(LIBS)
-
-check-files: check-files.o $(LIB_FILE)
-	$(CC) $(CFLAGS) -o check-files check-files.o $(LIBS)
-
-ls-tree: ls-tree.o $(LIB_FILE)
-	$(CC) $(CFLAGS) -o ls-tree ls-tree.o $(LIBS)
-
 merge-base: merge-base.o $(LIB_FILE) object.o commit.o tree.o blob.o
 	$(CC) $(CFLAGS) -o merge-base merge-base.o $(LIBS)
 
-merge-cache: merge-cache.o $(LIB_FILE)
-	$(CC) $(CFLAGS) -o merge-cache merge-cache.o $(LIBS)
-
-unpack-file: unpack-file.o $(LIB_FILE)
-	$(CC) $(CFLAGS) -o unpack-file unpack-file.o $(LIBS)
-
-git-export: git-export.o $(LIB_FILE)
-	$(CC) $(CFLAGS) -o git-export git-export.o $(LIBS)
-
-diff-cache: diff-cache.o $(LIB_FILE)
-	$(CC) $(CFLAGS) -o diff-cache diff-cache.o $(LIBS)
-
-convert-cache: convert-cache.o $(LIB_FILE)
-	$(CC) $(CFLAGS) -o convert-cache convert-cache.o $(LIBS)
+%: %.o $(LIB_FILE)
+	$(CC) $(CFLAGS) -o $@ $< $(LIBS)
 
 blob.o: $(LIB_H)
 cat-file.o: $(LIB_H)


* Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
From: Jon Seymour @ 2005-04-20 12:11 UTC (permalink / raw)
  To: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504200144260.6467@ppc970.osdl.org>

On 4/20/05, Linus Torvalds <torvalds@osdl.org> wrote:
> 
> 
> I converted my git archives (kernel and git itself) to do the SHA1 hash
> _before_ the compression phase.
> 

Linus,
 
 Am I correct to understand that with this change, all the objects in
the database are still being compressed (so no net performance benefit
now), but by doing the SHA1 calculations before compression you are
keeping open the possibility that at some point in the future you may
use a different compression technique (including none at all) for some
or all of the objects?

jon.

[ reposted to list, because list post was bounced because of rich text
formatting ]


* Re: [darcs-devel] Darcs and git: plan of action
From: David Roundy @ 2005-04-20 11:55 UTC (permalink / raw)
  To: Ray Lee; +Cc: Tupshin Harper, Kevin Smith, git, darcs-devel
In-Reply-To: <1113959503.29444.91.camel@orca.madrabbit.org>

On Tue, Apr 19, 2005 at 06:11:43PM -0700, Ray Lee wrote:
> > second patch:
> > replace ./hello.c [A-Za-z_0-9] world universe
> 
> Aha! Okay, I now see at least part of issue: we're using different
> definitions of 'token.' Yours is quite sensible, in that it matches the
> darcs syntax. However, I'm claiming a token is defined by the file's
> language, and that a replace patch on anything but a token as per those
> language standards is a silly thing.

The trouble is that a token based on language standards is also wrong,
unless your file at all times is syntactically correct.  It also means (for
C in particular) that the result of the token replace isn't uniquely
determined by the combination of the token replace patch and the file it
applies to, since you need to parse any header files in order to tokenize the
C file.  In the case of header files, it may not be possible to tokenize
them uniquely, since they may tokenize differently depending on what other
header files are included before them.  And of course, none of this may be
possible if you haven't run autoconf and configure, since you may not
actually *have* the header files in the first place...

In a (reasonably) general-purpose tool like darcs, I think it's better to
stick with a simpler definition of token that doesn't require a complete
integrated development environment.

It's also true that often you want to modify headers and string contents
simultaneously with the change of the code itself.  When I replace
get_pseudowavefunction with get_atomic_orbital, I also want to modify

// We call get_pseudowavefunction to get the atomic orbital...

and

printf("Error in get_pseudowavefunction!\n");
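
A small sketch of this simpler notion of token, in Python. It assumes a
token is a maximal run of the characters in the [A-Za-z_0-9] class quoted
above; the function name replace_token is made up for illustration and is
not darcs code.

```python
import re

# Assumption: a token is a maximal run of these characters.
TOKEN = re.compile(r"[A-Za-z_0-9]+")


def replace_token(text: str, old: str, new: str) -> str:
    """Replace every whole token equal to old, leaving longer tokens alone."""
    return TOKEN.sub(lambda m: new if m.group(0) == old else m.group(0), text)


source = (
    "// We call get_pseudowavefunction to get the atomic orbital...\n"
    'printf("Error in get_pseudowavefunction!\\n");\n'
    "x = my_get_pseudowavefunction_v2();\n"
)
result = replace_token(source, "get_pseudowavefunction", "get_atomic_orbital")
print(result)
```

Because the tokenizer knows nothing about C, the comment and the string
literal are rewritten along with the code, while the longer identifier
my_get_pseudowavefunction_v2 is left untouched.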
-- 
David Roundy
http://www.darcs.net


* Re: [darcs-devel] Darcs and git: plan of action
From: David Roundy @ 2005-04-20 11:29 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: darcs-devel, Git Mailing List
In-Reply-To: <7i4qe3x8ig.fsf@lanthane.pps.jussieu.fr>

On Tue, Apr 19, 2005 at 02:20:55PM +0200, Juliusz Chroboczek wrote:
> [Removing Linus from CC, keeping the Git list -- or should we remove it?]

I think leaving much of this on git would be appropriate, since there are
issues of how to relate to git that should be relevant.

> > If we do it right (automatically tagging like crazy people), darcs
> > users between themselves can cherry-pick all they like, without
> > introducing inconsistencies or losing interoperability with git.
> 
> You've lost me here.  How can you cherry-pick if every tag depends on
> the preceding patches?  Or are you thinking of pulling just the patch
> and not the tag -- in that case, what happens when you push to git a
> Darcs patch that depends on a patch that originated with git?

Yes, I'm thinking of pulling patches from one darcs repo to another.  If we
cherry-pick in this way, we need to create a "git-tag" for each patch that
we pull without its associated tag.  To git, this would look like two
separate changes that have the same commit log, except that they have
different parents and different committers and commit dates.

I don't think this will be a problem for git, and since darcs will
recognize the two patches as the identical darcs patch (we'll need to put
somewhere in the git commit log a magic word indicating that this patch
originated in darcs), there won't be a problem for darcs either.

In case I haven't been clear (which seems likely), the scenario is that
darcs user 1 makes the following changes to his darcs version of a
git-based repository:

changes in 1: A -> B
tags in 1:    A1   B1

Darcs user 2 wants B, but not A, and didn't do any development:

changes in 2: B
tags in 2:    B2

User 2 pushes to git, and now git has (where P is the parent of both of the
above):

git:
P -> B/B2  (where B/B2 is the commit log with B2 as "committer info" and B
            as the "author info" and long comment)

User 1 pushes (everything) to git and merges the two (patch M, which has
two parents, B1 and B2):

git:

   ->B/B2---------
  /               \
P--> A/A1 -> B/B1---> M

It's a little lame, and if user 2 doesn't do any real work, the git-using
person might be annoyed, but I think it's doable.
-- 
David Roundy
http://www.darcs.net


* Re: [darcs-devel] Darcs and git: plan of action
From: David Roundy @ 2005-04-20 11:18 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Juliusz Chroboczek, darcs-devel, Git Mailing List
In-Reply-To: <20050419122518.GD12757@pasky.ji.cz>

On Tue, Apr 19, 2005 at 02:25:18PM +0200, Petr Baudis wrote:
> Dear diary, on Tue, Apr 19, 2005 at 02:20:55PM CEST, I got a letter
> where Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr> told me that...
> > > The problem is that there is no sequence of alien versions that one
> > > can differentiate.  Git has a branched history, with each version
> > > that follows a merge having multiple parents.
> > 
> > Yep.  I've just realised that this morning.  Is there some notion of
> > ``primary parent'' as in Arch?  Can a changeset have 0 parents?
> 
> Yes, the root commit. Usually, there is only one, but there may be
> multiple of them theoretically.

Incidentally (and completely off-topic for this thread), wouldn't there be
a sha1 tree hash corresponding to a completely empty directory, and
couldn't one use that as the parent for the root? Would there be any reason
to do so? Just a silly thought...
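
For what it's worth, such a hash is easy to compute. The sketch below is in
Python and assumes the object id is the SHA-1 of the uncompressed object,
tagged with a header of the form type, space, decimal length, NUL; that
header layout is an assumption here, and object_id is a made-up helper name.

```python
import hashlib


def object_id(obj_type: str, payload: bytes) -> str:
    """SHA-1 over the type/length header followed by the uncompressed payload."""
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


# A tree with no entries has an empty payload.
empty_tree = object_id("tree", b"")
print(empty_tree)
```

Under those assumptions every empty directory, in every repository, gets
the same id, so it could serve as a well-known parent for a root commit.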
-- 
David Roundy
http://www.darcs.net


* Re: Darcs and git: plan of action
From: David Roundy @ 2005-04-20 11:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List, darcs-devel
In-Reply-To: <Pine.LNX.4.58.0504190943060.19286@ppc970.osdl.org>

On Tue, Apr 19, 2005 at 09:49:12AM -0700, Linus Torvalds wrote:
> On Tue, 19 Apr 2005, Tupshin Harper wrote:
> > I suspect that any use of wildcards in a new format would be impossible
> > for darcs since it wouldn't allow darcs to construct dependencies,
> > though I'll leave it to david to respond to that.
> 
> Note that git _does_ very efficiently (and I mean _very_) expose the 
> changed files.
> 
> So if this kind of darcs patch is always the same pattern just repeated
> over <n> files, then you really don't need to even list the files at all.
> Git gives you a very efficient file listing by just doing a "diff-tree"
> (which does not diff the _contents_ - it really just gives you a pretty
> much zero-cost "which files changed" listing).

The catch is that it's possible to have a darcs patch that doesn't change
any files, or that affects files without changing them.  If I rename
function foo to bar, I might want to do

darcs replace foo bar *.c

which would issue a replace on all files, which means that when this patch
is merged with any patches that add occurrences of foo in a file, that will
get modified to a bar, regardless of whether there was previously an
occurrence of foo in that file.

I think we might (when working with git--it would be problematic within
darcs straight) be able to work out some sort of a wildcard replace
scheme, so it could be something like

replace foo bar in: mm/*.c

The regexp bit could be left out, if we restrict the definition of "tokens"
in token replaces--which probably isn't a troublesome limitation.  By
default darcs uses two tokenizing schemes, one which allows "." in tokens
(usually relevant in Makefiles), and one which doesn't, and basically
matches C identifiers.  We could allow for both of these if we had a second
option:

replace filename foo.h bar.h in: mm/*.c

We'd just need to expand the wildcards when translating from the git
repository into darcs patches.
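
A rough sketch of that expansion step, in Python. The tuple layout, the
helper names and the list of tracked files are all hypothetical; the only
thing taken from the proposal above is the "replace foo bar in: mm/*.c"
shape.

```python
from fnmatch import fnmatchcase


def path_matches(path: str, pattern: str) -> bool:
    """Match component by component so "*" never crosses a "/"."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(p, q) for p, q in zip(path_parts, pattern_parts))


def expand_replace(old: str, new: str, pattern: str, tracked: list) -> list:
    """Turn one wildcard replace into one (file, old, new) entry per match."""
    return [
        (path, old, new)
        for path in sorted(tracked)
        if path_matches(path, pattern)
    ]


# Hypothetical list of files present when the patch is translated.
tracked_files = ["mm/slab.c", "mm/mmap.c", "mm/Makefile", "fs/inode.c", "mm/sub/x.c"]
expanded = expand_replace("foo", "bar", "mm/*.c", tracked_files)
for entry in expanded:
    print(entry)
```

The expansion depends on which files exist at translation time, which is
exactly why the wildcard form carries more information than the expanded
per-file list.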

> So that combination would be 100% reliable _if_ you always split up darcs 
> patches to "common elements". 
> 
> And note that there does not have to be a 1:1 relationship between a git
> commit and a darcs patch. For example, say that you have a darcs patch
> that does a combination of "change token x to token y in 100 files" and
> "rename file a into b". I don't know if you do those kind of "combination 
> patches" at all, but if you do, why not just split them up into two? That 
> way the list of files changed _does_ 100% determine the list of files for 
> the token exchange.

We do allow multiple sorts of changes (in darcs terminology, multiple
"primitive patches") in a single patch.

One *could* have multiple git commits for a single darcs patch, but that
seems ugly and I'd rather avoid it.  In my view, a revision control system is
more about communication than history (which is why by default, darcs
doesn't "do" history), and grouping changes together is how we express
which changes "go together".  Of course, we could still have a grouping at
a higher level, so that a single "changeset" could consist of multiple git
commits (for example by recognizing that identical commit logs mean that
it's a single change), but that adds a layer of complexity that I'd like to
avoid if possible.
-- 
David Roundy
http://www.darcs.net


* Re: [ANNOUNCEMENT] /Arch/ embraces `git'
From: Miles Bader @ 2005-04-20 10:19 UTC (permalink / raw)
  To: Tom Lord; +Cc: gnu-arch-users, gnu-arch-dev, talli, git, torvalds
In-Reply-To: <200504201000.DAA04988@emf.net>

Way to go.

-Miles
-- 
Do not taunt Happy Fun Ball.


* Re: git-viz tool for visualising commit trees
From: Ingo Molnar @ 2005-04-20 10:08 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git, oliv__a
In-Reply-To: <20050417194818.GG1461@pasky.ji.cz>


* Petr Baudis <pasky@ucw.cz> wrote:

>   Hi,
> 
>   just FYI, Olivier Andrieu was kind enough to port his monotone-viz 
> tool to git (http://oandrieu.nerim.net/monotone-viz/ - use the one 
> from the monotone repository). The tool visualizes the history flow 
> nicely; see
> 
> 	http://rover.dkm.cz/~pasky/gitviz1.png
> 	http://rover.dkm.cz/~pasky/gitviz2.png
> 	http://rover.dkm.cz/~pasky/gitviz3.png
> 	http://rover.dkm.cz/~pasky/gitviz4.png
> 	http://rover.dkm.cz/~pasky/gitviz5.png
> 	http://rover.dkm.cz/~pasky/gitviz6.png
> 	http://rover.dkm.cz/~pasky/gitviz7.png
> 
> for some screenshots.

really nice stuff! Any plans to include it in git-pasky, via a 'git gui' 
option or so? Also, which particular version has this included - the 
freshest tarball on the monotone-viz download site doesn't seem to 
include it.

	Ingo


* Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
From: Ingo Molnar @ 2005-04-20 10:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, Git Mailing List, Chris Mason
In-Reply-To: <Pine.LNX.4.58.0504200144260.6467@ppc970.osdl.org>


* Linus Torvalds <torvalds@osdl.org> wrote:

> So to convert your old git setup to a new git setup, do the following:
> [...]

did this for two repositories (git and kernel-git), it works as 
advertised.

	Ingo


* Re: wit 0.0.3 - a web interface for git available
From: Christian Meder @ 2005-04-20 10:03 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Greg KH, git
In-Reply-To: <20050420094253.GB29910@infradead.org>

On Wed, 2005-04-20 at 10:42 +0100, Christoph Hellwig wrote:
> On Tue, Apr 19, 2005 at 09:18:29PM -0700, Greg KH wrote:
> > On Wed, Apr 20, 2005 at 02:29:11AM +0200, Christian Meder wrote:
> > > Hi,
> > > 
> > > ok it's starting to look like spam ;-)
> > > 
> > > I uploaded a new version of wit to http://www.absolutegiganten.org/wit
> > 
> > Why not work together with Kay's tool:
> > 	http://ehlo.org/~kay/gitweb.pl?project=linux-2.6&action=show_log
> 
> That one looks really nice.  One major feature I'd love to see would
> be a show all diffs link for a changeset.

Hi, 

wit only has "show all diffs" right now but I like the show file diffs
of Kay's tool. I'll implement it tonight ;-)


			Christian


-- 
Christian Meder, email: chris@absolutegiganten.org

The Way-Seeking Mind of a tenzo is actualized 
by rolling up your sleeves.

                (Eihei Dogen Zenji)



* [ANNOUNCEMENT] /Arch/ embraces `git'
From: Tom Lord @ 2005-04-20 10:00 UTC (permalink / raw)
  To: gnu-arch-users, gnu-arch-dev, git; +Cc: talli, torvalds


`git', by Linus Torvalds, contains some very good ideas and some
very entertaining source code -- recommended reading for hackers.

/GNU Arch/ will adopt `git':

From the /Arch/ perspective: `git' technology will form the
basis of a new archive/revlib/cache format and the basis
of new network transports.

From the `git' perspective, /Arch/ will replace the lame "directory
cache" component of `git' with a proper revision control system.

In my view, the core ideas in `git' are quite profound and deserve
an impeccable implementation.   This is practical because those ideas
are also pretty simple.

I started here:

   http://www.seyza.com/=clients/linus/tree/index.html

and for those interested in `git'-theory, a good place to start is

   http://www.seyza.com/=clients/linus/tree/src/liblob/index.html

(Linus is not literally a "client" of mine.  That's just the directory 
where this goes.)

-t



* [PATCH] Give better default modes to merge results.
From: Junio C Hamano @ 2005-04-20  9:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

As shipped, the example git-merge-one-file-script often leaves
the merge result with not-so-useful mode bits, especially with
glibc 2.0.7 or later, whose mkstemp() creates temporary files with
mode 0600.  This contradicts the way checkout-cache creates new
files, which is to use 0666 (or 0777 for files with the executable
bit on) and let the umask mechanism take care of adjusting it
to the user's preference.
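
To make the arithmetic concrete, here is a small Python sketch of how a
requested mode of 0666 or 0777 combines with the umask, next to the 'x' or
'-' flag derived from the 0100 bit. The function names are invented for
this illustration; only the mode values come from the description above.

```python
def default_mode(executable: bool, umask: int) -> int:
    """Mode a new file ends up with when created with 0666/0777 under umask."""
    requested = 0o777 if executable else 0o666
    return requested & ~umask


def xbit(mode: int) -> str:
    """'x' if the owner-execute bit (0100) is set, else '-'."""
    return "x" if mode & 0o100 else "-"


for umask in (0o022, 0o077):
    plain = default_mode(False, umask)
    script = default_mode(True, umask)
    print(
        f"umask {umask:03o}: plain {plain:03o} {xbit(plain)}, "
        f"exec {script:03o} {xbit(script)}"
    )
```

A file created with a fixed 0600, by contrast, comes out the same under
either umask, which is why moving it into place gives the wrong result for
most users.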

This patch fixes this problem by (1) passing the executable bits
for 3 stages from merge-cache to the merge script, and by (2)
adjusting the example script to make use of that information.

For backward compatibility with any existing merge-one-file-script
that people may already have developed, the additional 3 arguments
are passed after the filename (i.e. as $5, $6 and $7).  This
does not logically look so nice, but the older scripts can and
would just ignore these new parameters.

The patch also fixes some shell quoting problems the original
sample script had with the resulting filename "$4".  Unlike all
the other arguments, this must be quoted to prevent it from
being split via shell's $IFS mechanism.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 git-merge-one-file-script |   35 +++++++++++++++++++++++++++--------
 merge-cache.c             |   18 ++++++++++++++----
 2 files changed, 41 insertions(+), 12 deletions(-)

--- a/git-merge-one-file-script
+++ b/git-merge-one-file-script
@@ -6,7 +6,9 @@
 #   $2 - file in branch1 SHA1 (or empty)
 #   $3 - file in branch2 SHA1 (or empty)
 #   $4 - pathname in repository
-#
+#   $5 - original file executable bit ('x' or '-' or empty)
+#   $6 - file in branch1  executable bit ('x' or '-' or empty)
+#   $7 - file in branch2  executable bit ('x' or '-' or empty)
 #
 # Handle some trivial cases.. The _really_ trivial cases have
 # been handled already by read-tree, but that one doesn't
@@ -24,17 +26,29 @@ case "${1:-.}${2:-.}${3:-.}" in
 #
 "$1.." | "$1.$1" | "$1$1.")
 	rm -f -- "$4"
-	update-cache --remove -- "$4"
-	exit 0
+	exec update-cache --remove -- "$4"
 	;;
 
 #
 # added in one, or added identically in both
 #
 ".$2." | "..$3" | ".$2$2")
-	mv $(unpack-file "${2:-$3}") "$4"
-	update-cache --add -- "$4" ;# needs filemode fix.
-	exit 0
+
+	# This part is convoluted but necessary to get a sane
+	# default mode bits.  We let the shell to honor default
+	# umask when creating the file, and then rely on chmod +x
+	# to again honor umask.  It used to "mv" the file created
+	# in mode 0600 by unpack-file to "$4", which was almost
+	# always wrong.
+
+	tmp=$(unpack-file "${2:-$3}") &&
+	rm -f "$4" &&
+	cat "$tmp" >"$4" &&
+	case "$6$7" in
+	*x*) chmod +x "$4" ;;
+	esac &&
+	rm -f "$tmp" || exit
+	exec update-cache --add -- "$4"
 	;;
 
 #
@@ -50,11 +64,16 @@ case "${1:-.}${2:-.}${3:-.}" in
 		echo Leaving conflict merge in $src2
 		exit 1
 	fi
-	cp "$src2" "$4" && update-cache --add -- "$4" && exit 0
+	rm -f "$4" &&
+	cat "$src2" >"$4" &&
+	case "$5$6$7" in
+	*x*) chmod +x "$4" ;;
+	esac || exit
+	exec update-cache --add -- "$4"
 	;;
 
 *)
-	echo "Not handling case $1 -> $2 -> $3"
+	echo "Not handling case $1($5) -> $2($6) -> $3($7)"
 	;;
 esac
 exit 1
--- a/merge-cache.c
+++ b/merge-cache.c
@@ -4,7 +4,7 @@
 #include "cache.h"
 
 static const char *pgm = NULL;
-static const char *arguments[5];
+static const char *arguments[8];
 
 static void run_program(void)
 {
@@ -18,6 +18,9 @@ static void run_program(void)
 			    arguments[2],
 			    arguments[3],
 			    arguments[4],
+			    arguments[5],
+			    arguments[6],
+			    arguments[7],
 			    NULL);
 		die("unable to execute '%s'", pgm);
 	}
@@ -36,17 +39,24 @@ static int merge_entry(int pos, const ch
 	arguments[2] = "";
 	arguments[3] = "";
 	arguments[4] = path;
+	arguments[5] = "";
+	arguments[6] = "";
+	arguments[7] = "";
 	found = 0;
 	do {
-		static char hexbuf[4][60];
+		static char hexbuf[3][41];
+		static char xbit[3][2];
 		struct cache_entry *ce = active_cache[pos];
 		int stage = ce_stage(ce);
 
 		if (strcmp(ce->name, path))
 			break;
 		found++;
-		strcpy(hexbuf[stage], sha1_to_hex(ce->sha1));
-		arguments[stage] = hexbuf[stage];
+		strcpy(hexbuf[stage-1], sha1_to_hex(ce->sha1));
+		arguments[stage] = hexbuf[stage-1];
+		xbit[stage-1][0] = (ntohl(ce->ce_mode) & 0100) ? 'x' : '-';
+		xbit[stage-1][1] = 0;
+		arguments[stage+4] = xbit[stage-1];
 	} while (++pos < active_nr);
 	if (!found)
 		die("merge-cache: %s not in the cache", path);





* Re: wit 0.0.3 - a web interface for git available
From: Christoph Hellwig @ 2005-04-20  9:42 UTC (permalink / raw)
  To: Greg KH; +Cc: Christian Meder, git
In-Reply-To: <20050420041828.GA15391@kroah.com>

On Tue, Apr 19, 2005 at 09:18:29PM -0700, Greg KH wrote:
> On Wed, Apr 20, 2005 at 02:29:11AM +0200, Christian Meder wrote:
> > Hi,
> > 
> > ok it's starting to look like spam ;-)
> > 
> > I uploaded a new version of wit to http://www.absolutegiganten.org/wit
> 
> Why not work together with Kay's tool:
> 	http://ehlo.org/~kay/gitweb.pl?project=linux-2.6&action=show_log

That one looks really nice.  One major feature I'd love to see would
be a show all diffs link for a changeset.


* Re: wit 0.0.3 - a web interface for git available
From: Christoph Hellwig @ 2005-04-20  9:40 UTC (permalink / raw)
  To: Christian Meder; +Cc: git
In-Reply-To: <1113956951.3309.22.camel@localhost>

On Wed, Apr 20, 2005 at 02:29:11AM +0200, Christian Meder wrote:
> Hi,
> 
> ok it's starting to look like spam ;-)
> 
> I uploaded a new version of wit to http://www.absolutegiganten.org/wit

Got an url where it can be seen on a live repository?



* wit - demo site
From: Christian Meder @ 2005-04-20  9:11 UTC (permalink / raw)
  To: git

Hi,

thanks to my friend Frank Sattelberger I got access to a site where I
could set up a demo for wit:

http://grmso.net:8090

Couple of notes wrt why I work on another git web interface compared
with Kay's work:

* I was already experimenting and implementing for a couple of days when
Kay's tool was first announced and I didn't want to throw away my
feature set

* the Web API: wit has a different philosophy when it comes to URIs: The
stable URI mapping should translate in a straightforward fashion to
git: /blob/<sha1>, /tree/<sha1>, /tree/<sha1>/diff/<sha1>, etc.; no URL
parameters

* wit is more of a git view right now: it only uses git and tries to
stay close to the repository browsing paradigm (see the API issue above)

* wit provides tarballs and patches but that's an easy one for Kay

* wit looks uglier but that will hopefully change soon ;-)

* I'm not a Perl guy

I'm still seeking feedback ;-)

Greetings,


			Christian

-- 
Christian Meder, email: chris@absolutegiganten.org

The Way-Seeking Mind of a tenzo is actualized 
by rolling up your sleeves.

                (Eihei Dogen Zenji)



* WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
From: Linus Torvalds @ 2005-04-20  9:08 UTC (permalink / raw)
  To: H. Peter Anvin, Git Mailing List; +Cc: Chris Mason
In-Reply-To: <42660708.60109@zytor.com>



I converted my git archives (kernel and git itself) to do the SHA1 hash 
_before_ the compression phase.

So I'll just have to publicly admit that everybody who complained about 
that particular design decision was right. Oh, well.

On Wed, 20 Apr 2005, H. Peter Anvin wrote:
> Linus Torvalds wrote:
> > 
> > So I'll see if I can turn the current fsck into a "convert into
> > uncompressed format", and do a nice clean format conversion. 
> > 
> 
> Just let me know what you want to do, and I can trivially change the 
> conversion scripts I've already written to do what you want.

I actually wrote a trivial converter myself, and while I have to say that 
this object database conversion is a bit painful, the nice thing is that I 
tried very hard to make it so that the "git" programs will work with both 
a pre-conversion and a post-conversion database.

The only program where that isn't true is "fsck-cache", since fsck-cache
for obvious reasons is very very unhappy if the sha1 of a file doesn't
match what it should be. But even there, a post-conversion fsck will eat
old objects, it will just warn about a sha1 mismatch (and eventually it
will refuse to touch them).

Anyway, what this means is that you should be actually able to get my
already-converted git database even using an older version of git: fsck
will complain mightily, so don't run it.

What I've done is to just switch the SHA1 calculation and the compression
around, but I've left all other data structures in their original format,
including the low-level object details like the fact that all objects are
tagged with their type and length.
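
A minimal Python sketch of what switching the two steps around means. It
assumes zlib for the compression phase and a tag of the form type, space,
decimal length, NUL in front of the data; the exact tag layout and the
function names are assumptions for illustration, not a copy of the C code.

```python
import hashlib
import zlib


def tagged(obj_type: str, payload: bytes) -> bytes:
    """Prefix the payload with its type and length (assumed layout)."""
    return f"{obj_type} {len(payload)}".encode("ascii") + b"\0" + payload


def old_style_id(obj_type: str, payload: bytes, level: int) -> str:
    """Hash taken after compression: depends on the compressor's output."""
    compressed = zlib.compress(tagged(obj_type, payload), level)
    return hashlib.sha1(compressed).hexdigest()


def new_style_id(obj_type: str, payload: bytes) -> str:
    """Hash taken before compression: depends only on the tagged content."""
    return hashlib.sha1(tagged(obj_type, payload)).hexdigest()


data = b"hello, git\n" * 100
# Different compression levels give different old-style names ...
print(old_style_id("blob", data, 0) == old_style_id("blob", data, 9))
# ... while the new-style name can be computed without compressing at all.
print(new_style_id("blob", data))
```

With the hash taken first, an object that already exists can be recognized
before any compression work is done, and the compression level becomes a
purely local storage choice.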

As a result, the _only_ thing that breaks is that a new object will not
have a SHA1 that matches the expectations of an old git, but since
_checking_ the SHA1 is only done by fsck, not normal operations, all
normal ops should work fine.

So to convert your old git setup to a new git setup, do the following:

 - save your old setup. Just in case. I've converted my whole kernel tree 
   this way, so it's actually tested and I felt comfortable enough with it 
   to blow the old one away, but never take risks.

 - do _not_ update to my new version first. Instead, while you still have 
   an fsck that is happy with your old archive, make sure to fsck 
   everything you have with

	fsck-cache --unreachable $(cat .git/HEAD)

   and it shouldn't complain about anything. Use "git-prune-script" to 
   remove dangling objects if you want.

   (If you read this after you already updated, no worries - everything 
   should still work. It's just a good idea to verify your old repo first)

 - update to my new git tools. checkout, build, install

 - convert your git object database with

	convert-cache $(cat .git/HEAD)

   which will give you a new head object. Just for fun, you can 
   double-check that "re-converting" that head object should always result
   in the same head object. If it doesn't, something is wrong.

 - take the new head object, and make it your new head:

	echo xxxxxx > .git/HEAD

 - run the new "fsck-cache". It should complain about "sha1 mismatch" for 
   all your old objects, and they should all be unreachable (and you 
   should have two root objects: your old root and your new root)

 - run "git-prune-script" to remove all the unreachable objects (which are 
   all old).

 - run "fsck-cache --unreachable $(cat .git/HEAD)" with the new fsck
   again, just to check that it is now quiet.

 - blow your old index file away by re-reading your HEAD tree:

	cat-file commit $(cat .git/HEAD)
	read-tree .....

 - "update-cache --refresh"

Doing this on the git repository is nearly instantaneous. Doing it on the
kernel takes maybe a minute or so, depending on how fast your machine is.

Sorry about this, but it's a hell of a lot simpler to do it now than it
will be after we have lots of users, and I've really tried to make the
conversion be as simple and painless as possible.

And while it doesn't matter right now (since git still does exactly the
same - I did the minimal changes necessary to get the new hashes, and
that's it), this _will_ allow us to notice existing objects before we
compress them, and we can now play with different compression levels
without it being horribly painful.

				Linus


* Re: enforcing DB immutability
From: Chris Wedgwood @ 2005-04-20  8:58 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, H. Peter Anvin, git
In-Reply-To: <20050420075320.GA22676@elte.hu>

On Wed, Apr 20, 2005 at 09:53:20AM +0200, Ingo Molnar wrote:

> so the only sensible thing the editor/tool can do when it wants to
> change the file is precisely what we want: it will copy the
> hardlinked file's contents to a new file, and will replace the old
> file with the new file - a copy on write. No accidental corruption
> of the DB's contents.

editors that have SCM smarts and know about the files' different states
can do this

i really like the way this works under BK btw --- files are RO until i
do the magic thing which will do a 'bk edit' and i can then do
checkins or similar as needed (this assumes you can do per-file
deltas)


* Re: enforcing DB immutability
From: linux @ 2005-04-20  8:41 UTC (permalink / raw)
  To: git, linux-kernel; +Cc: mingo

[A discussion on the git list about how to provide a hardlinked file
that *cannot* be modified by an editor, but must be replaced by
a new copy.]

mingo@elte.hu wrote all of:
>>> perhaps having a new 'immutable hardlink' feature in the Linux VFS 
>>> would help? I.e. a hardlink that can only be readonly followed, and 
>>> can be removed, but cannot be chmod-ed to a writeable hardlink. That i 
>>> think would be a large enough barrier for editors/build-tools not to 
>>> play the tricks they already do that makes 'readonly' files virtually 
>>> meaningless.
>> 
>> immutable hardlinks have the following advantage: a hardlink by design 
>> hides the information where the link comes from. So even if an editor 
>> wanted to play stupid games and override the immutability - it doesnt 
>> know where the DB object is. (sure, it could find it if it wants to, 
>> but that needs real messing around - editors wont do _that_)
>
> so the only sensible thing the editor/tool can do when it wants to 
> change the file is precisely what we want: it will copy the hardlinked 
> file's contents to a new file, and will replace the old file with the 
> new file - a copy on write. No accidental corruption of the DB's 
> contents.

This is not a horrible idea, but it touches on another sore point I've
worried about for a while.

The obvious way to do the above *without* changing anything is just to
remove all write permission to the file.  But because I'm the owner, some
piece of software running with my permissions can just decide to change
the permissions back and modify the file anyway.  Good old 7th edition
let you give files away, which could have addressed that (chmod a-w; chown
phantom_user), but BSD took that ability away to make accounting work.
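
[Editor's sketch, not part of the original mail: a small shell
demonstration of the point above, that removing write permission does
not stop a process running as the file's owner. The file name and the
first_write variable are made up for the example. When run as root the
first write succeeds as well, since root bypasses permission bits.]

```shell
#!/bin/sh
# Sketch: "chmod a-w" is only advisory against the file's own owner.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
f="$tmp/object"

printf 'original\n' > "$f"
chmod a-w "$f"                      # remove all write permission

# For an unprivileged owner this append is refused...
if (printf 'blocked?\n' >> "$f") 2>/dev/null; then
    first_write=allowed             # happens when running as root
else
    first_write=denied
fi

# ...but the owner may simply restore write permission and carry on.
chmod u+w "$f"
printf 'tampered\n' >> "$f"

echo "first write: $first_write; last line now: $(tail -n 1 "$f")"
```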

The upshot is that, while separate users keep malware from harming the
*system*, if I run a piece of malware, it can blow away every file I
own and make me unhappy.  When (notice I'm not saying "if") commercial
spyware for Linux becomes common, it can also read every file I own.

Unless I have root access, Linux is no safer *for me* than Redmondware!

Since I *do* have root access, I often set up sandbox users and try
commercial binaries in that environment, but it's a pain and laziness
often wins.  I want a feature that I can wrap in a script, so that I
can run a commercial binary in a nicely restricted environment.

Or maybe I even want to set up a "personal root" level, and run
my normal interactive shells in a slightly restricted environment
(within which I could make a more-restricted world to run untrusted
binaries).  Then I could solve the immutable DB issue by having a
"setuid" binary that would make checked-in files unwriteable at my
normal permission level.

Obviously, a fundamental change to the Unix permissions model won't
be available to solve short-term problems, but I thought I'd raise
the issue to get people thinking about longer-term solutions.


* Re: [darcs-devel] Darcs and git: plan of action
From: Juliusz Chroboczek @ 2005-04-20  7:55 UTC (permalink / raw)
  To: darcs-devel, git
In-Reply-To: <20050419235832.56117.qmail@web51003.mail.yahoo.com>

> We're talking about interoperating with a Git repository here,
> right?  Even if we got the metadata in there, doesn't Git have to
> understand a replace patch for things to work out?

> 0. All three are in sync to begin with.

> 1. CC creates a token-replace patch, sends the changes in normal hunk
> format to AA.

> 2. BB makes changes, sends a normal hunk patch to AA and CC.  AA will
> apply the hunk normally.  For CC the token replace might apply here
> and so the result could be different.

3. when AA and CC try to sync, they will get spurious merge conflicts.

> Isn't this a potential problem?

It is.  In a heterogeneous environment they will get spurious merge
conflicts.

                                        Juliusz


* Re: enforcing DB immutability
From: Ingo Molnar @ 2005-04-20  7:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, git
In-Reply-To: <20050420074948.GA22620@elte.hu>


* Ingo Molnar <mingo@elte.hu> wrote:

> > perhaps having a new 'immutable hardlink' feature in the Linux VFS 
> > would help? I.e. a hardlink that can only be readonly followed, and 
> > can be removed, but cannot be chmod-ed to a writeable hardlink. That i 
> > think would be a large enough barrier for editors/build-tools not to 
> > play the tricks they already do that makes 'readonly' files virtually 
> > meaningless.
> 
> immutable hardlinks have the following advantage: a hardlink by design 
> hides the information where the link comes from. So even if an editor 
> wanted to play stupid games and override the immutability - it doesnt 
> know where the DB object is. (sure, it could find it if it wants to, 
> but that needs real messing around - editors wont do _that_)

so the only sensible thing the editor/tool can do when it wants to 
change the file is precisely what we want: it will copy the hardlinked 
file's contents to a new file, and will replace the old file with the 
new file - a copy on write. No accidental corruption of the DB's 
contents.
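
[Editor's sketch, not part of the original mail: the copy-on-write
behaviour described above, shown with ordinary hard links in shell.
The db/ and work/ directory names are made up for the example. Nothing
here enforces immutability; it only shows that copying and renaming
leaves the other name for the inode untouched.]

```shell
#!/bin/sh
# Sketch: replacing a hard-linked working file by copy + rename leaves
# the database object untouched.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/db" "$tmp/work"

printf 'v1\n' > "$tmp/db/object"
ln "$tmp/db/object" "$tmp/work/file"     # one inode, two names

# Copy on write: put the new contents in a new file, then rename it
# over the working name. The rename only rebinds work/file; it never
# writes to the shared inode.
cp "$tmp/work/file" "$tmp/work/file.new"
printf 'v2\n' >> "$tmp/work/file.new"
mv "$tmp/work/file.new" "$tmp/work/file"

db_contents=$(cat "$tmp/db/object")
work_lines=$(wc -l < "$tmp/work/file" | tr -d ' ')

echo "db object: $db_contents; working file now has $work_lines lines"
```

An editor that instead opened work/file for writing in place would
modify db/object too, which is the corruption the immutable-hardlink
idea is meant to rule out.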

(another in-kernel VFS solution would be to enforce that the file's 
name always matches the sha1 hash. So if someone edits a DB object it 
will automatically change its name. But this is complex, probably cannot 
be done atomically, and brings up other problems as well.)

	Ingo


* Re: [darcs-devel] Darcs and git: plan of action
From: Juliusz Chroboczek @ 2005-04-20  7:52 UTC (permalink / raw)
  To: git, darcs-devel
In-Reply-To: <1113959503.29444.91.camel@orca.madrabbit.org>

> However, I'm claiming a token is defined by the file's language, and
> that a replace patch on anything but a token as per those language
> standards is a silly thing.

Please recall the context of this discussion: getting Darcs to grok
git repositories.

You are arguing that it should be possible to design a set of
heuristics that Do The Right Thing often enough.  And you are probably
right.

But the point is immaterial as nobody has stepped up to implement in
Darcs the sort of heuristics you have in mind.  Partly because nobody
has time, but mostly because we don't like heuristics, we prefer Darcs
to remain deterministic.

So while yes, it might be possible to get by using heuristics, it
seems rather unlikely that that's what we'll do.

                                        Juliusz



* Re: enforcing DB immutability
From: Ingo Molnar @ 2005-04-20  7:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, git
In-Reply-To: <20050420074053.GA22436@elte.hu>


* Ingo Molnar <mingo@elte.hu> wrote:

> perhaps having a new 'immutable hardlink' feature in the Linux VFS 
> would help? I.e. a hardlink that can only be readonly followed, and 
> can be removed, but cannot be chmod-ed to a writeable hardlink. That i 
> think would be a large enough barrier for editors/build-tools not to 
> play the tricks they already do that makes 'readonly' files virtually 
> meaningless.

immutable hardlinks have the following advantage: a hardlink by design 
hides the information where the link comes from. So even if an editor 
wanted to play stupid games and override the immutability - it doesnt 
know where the DB object is. (sure, it could find it if it wants to, but 
that needs real messing around - editors wont do _that_)

i think this might work.

(the current chattr +i flag isnt quite what we need though because it 
works on the inode, and it's also a root-only feature so it puts us back 
to square one. What would be needed is an immutability flag on 
hardlinks, settable by unprivileged users.)

	Ingo


* [PATCH] Unify usage() strings.
From: Junio C Hamano @ 2005-04-20  7:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

This patch changes identical cut-and-paste usage strings into a
single instance of static string, to make maintenance easier.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 commit-tree.c |   14 ++++++--------
 diff-cache.c  |    6 ++++--
 diff-tree.c   |    6 ++++--
 read-tree.c   |   10 ++++++----
 4 files changed, 20 insertions(+), 16 deletions(-)



commit-tree.c: 2eee2fe5b14f1f2d86b8d41b501a879b190bf08f
--- a/commit-tree.c
+++ b/commit-tree.c
@@ -268,15 +268,13 @@ static void check_valid(unsigned char *s
 }
 
 /*
- * Having more than two parents may be strange, but hey, there's
- * no conceptual reason why the file format couldn't accept multi-way
- * merges. It might be the "union" of several packages, for example.
- *
- * I don't really expect that to happen, but this is here to make
- * it clear that _conceptually_ it's ok..
+ * Having more than two parents is not strange at all, and this is
+ * how multi-way merges are represented.
  */
 #define MAXPARENT (16)
 
+static char *commit_tree_usage = "commit-tree <sha1> [-p <sha1>]* < changelog";
+
 int main(int argc, char **argv)
 {
 	int i, len;
@@ -296,14 +294,14 @@ int main(int argc, char **argv)
 	unsigned int size;
 
 	if (argc < 2 || get_sha1_hex(argv[1], tree_sha1) < 0)
-		usage("commit-tree <sha1> [-p <sha1>]* < changelog");
+		usage(commit_tree_usage);
 
 	check_valid(tree_sha1, "tree");
 	for (i = 2; i < argc; i += 2) {
 		char *a, *b;
 		a = argv[i]; b = argv[i+1];
 		if (!b || strcmp(a, "-p") || get_sha1_hex(b, parent_sha1[parents]))
-			usage("commit-tree <sha1> [-p <sha1>]* < changelog");
+			usage(commit_tree_usage);
 		check_valid(parent_sha1[parents], "commit");
 		parents++;
 	}


diff-cache.c: 48bcec1230365e12b9fb6df65c15540caea24029
--- a/diff-cache.c
+++ b/diff-cache.c
@@ -215,6 +215,8 @@ static int diff_cache(void *tree, unsign
 	return 0;
 }
 
+static char *diff_cache_usage = "diff-cache [-r] [-z] [--cached] <tree sha1>";
+
 int main(int argc, char **argv)
 {
 	unsigned char tree_sha1[20];
@@ -239,11 +241,11 @@ int main(int argc, char **argv)
 			cached_only = 1;
 			continue;
 		}
-		usage("diff-cache [-r] [-z] <tree sha1>");
+		usage(diff_cache_usage);
 	}
 
 	if (argc != 2 || get_sha1_hex(argv[1], tree_sha1))
-		usage("diff-cache [-r] [-z] <tree sha1>");
+		usage(diff_cache_usage);
 
 	tree = tree_from_tree_or_commit(tree_sha1, type, &size);
 	if (!tree)
diff-tree.c: 8720ce75b72cdf9c8d189f9edf41e0920bd72767
--- a/diff-tree.c
+++ b/diff-tree.c
@@ -193,6 +193,8 @@ static void commit_to_tree(unsigned char
 	}
 }
 
+static char *diff_tree_usage = "diff-tree [-r] [-z] <tree sha1> <tree sha1>";
+
 int main(int argc, char **argv)
 {
 	unsigned char old[20], new[20];
@@ -209,11 +211,11 @@ int main(int argc, char **argv)
 			line_termination = '\0';
 			continue;
 		}
-		usage("diff-tree [-r] [-z] <tree sha1> <tree sha1>");
+		usage(diff_tree_usage);
 	}
 
 	if (argc != 3 || get_sha1_hex(argv[1], old) || get_sha1_hex(argv[2], new))
-		usage("diff-tree <tree sha1> <tree sha1>");
+		usage(diff_tree_usage);
 	commit_to_tree(old);
 	commit_to_tree(new);
 	return diff_tree_sha1(old, new, "");


read-tree.c: e438579d63fb090209eaf4c864586afaeb52ae0f
--- a/read-tree.c
+++ b/read-tree.c
@@ -201,6 +201,8 @@ static void merge_stat_info(struct cache
 	}
 }
 
+static char *read_tree_usage = "read-tree (<sha> | -m <sha1> [<sha2> <sha3>])";
+
 int main(int argc, char **argv)
 {
 	int i, newfd, merge;
@@ -220,20 +222,20 @@ int main(int argc, char **argv)
 		if (!strcmp(arg, "-m")) {
 			int i;
 			if (stage)
-				usage("-m needs to come first");
+				die("-m needs to come first");
 			read_cache();
 			for (i = 0; i < active_nr; i++) {
 				if (ce_stage(active_cache[i]))
-					usage("you need to resolve your current index first");
+					die("you need to resolve your current index first");
 			}
 			stage = 1;
 			merge = 1;
 			continue;
 		}
 		if (get_sha1_hex(arg, sha1) < 0)
-			usage("read-tree [-m] <sha1>");
+			usage(read_tree_usage);
 		if (stage > 3)
-			usage("can't merge more than two trees");
+			usage(read_tree_usage);
 		if (read_tree(sha1, "", 0) < 0)
 			die("failed to unpack tree object %s", arg);
 		stage++;

