Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH 2/3] add new Git::Repo API
From: Jakub Narebski @ 2008-07-17 23:49 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git, John Hawley, Petr Baudis
In-Reply-To: <487E5AD3.60202@gmail.com>

On Wed, 16 July 2008 at 22:32 (13:32:25 -0700 (PDT)), Lea Wiemann wrote:
> Jakub Narebski wrote:
> > Lea Wiemann wrote:

> > > +# Keep documentation in one place to save space.
> > 
> > Embedded PODs in Perl modules serve as sort of literate programming,
> > serving to describe code (technical/usage documentation) in addition
> > to comments in code.
> 
> Yeah, but this part is only a bunch of trivial accessor methods.
> If the module grows and the documentation needs to be split, it can be
> done later.  No need to be purist here. ;-)  Also ...
> 
> > [The fact that documentation is separated from code means that
> >  I cannot easily tell and write if code match documentation]
> 
> Several of the methods actually only exist in the Git::Object base
> class.  I still documented them in the Commit and Tag modules since
> having to look up methods in base class documentation can be a tad
> annoying, especially if the base class is never used by users of the
> API. 

I was wondering (but forgot to ask) why for some files you had
continuous block of documentation, and in others documentation 
interspersed with documented code.

I think that providing documentation for all methods for "front-end"
class, even for those that are implemented in [abstract] base class,
is a very good idea.  Better to have everything in one place, or at
least in "porcelain" documentation.

> > NOTE that element of list of revisions has in addition to that also
> > _effective_ parents in the event of history simplification, for
> > example  for 'history' view, or when using '--first-parent' extra
> > option. 
> 
> Yes, but we don't actually care about those effective parents for the
> purpose of the Git::Commit API.  IOW, the effective parent should be
> managed by the code that created a list of revisions, not by the
> Git::Commit API.

True.  Or to be more exact in a Git::Revlist (or somesuch) container
class.

> > > +Return the author string of this commit object. [...]
> > > +Return the committer string of this commit object.
> > 
> > It returns whole value of 'author' and 'committer' headers, not
> > something extracted from it (into name, email, epoch and timezone),
> > isn't it?
> 
> Yup; that's why I wrote "{author,committer} *string*". ;)

We would probably want _in the future_ to  return some object wrapper,
which stringifies to value of author and committer headers (to author
and committer string), but allows to extract (and format) parts of it,
for example

  $commit->author->name
  $commit->author->email
  $commit->author->date(format => RFC2822)

or perhaps

  $commit->author{'email'}

> > > +=item $commit->message
> > > +
> > > +Return the undecoded commit message of this commit object.
> > 
> > Just raw data?
> 
> Yes, just raw data.  Decoding is too tricky (i.e. not guaranteed to
> work) to just add a simple method to the API; IOW, it needs error
> handling and perhaps fallback encodings.

I'd rather then have _git_ convert it to UTF=8 for us (using 
--encoding=<encoding> option to git-log/git-rev-list), see below
on using git-log.

> > NOTE that for element of list of revisions (as returned by 
> > git-rev-list or git-log) would probably have commit message decoded
> > to UTF-8 by git.
> 
> Yes, but the API doesn't use any of those commands internally,
> if that's what you're worried about.

I'm not worried about; I think using git-log would be better (see also 
below)

> > > +sub author { [...]
> > > +	$self->{_AUTHOR()} or ''; }
> > 
> > Nowhere in documentation is mentioned that you use empty value
> > for no author or no committer (isn't commit object invalid then?).
> 
> Yes, I'd believe so.  I basically wanted to make sure that those 
> methods always return a string; do you think that this is a bad idea?

It don't know if it is good or bad idea, but you should have mentioned 
it in the documentation.

[HERE IS MAIN PART OF THIS RESPONSE]

> >> +	if (!defined $raw_text) {
> >> +		# Retrieve from the repository.
> >> +		(my $type, $raw_text) = $self->repo->cat_file($sha1);
> > 
> > The above makes Git::Commit good solution for gitweb's 'commit' and
> > 'commitdiff' views, but bad solution for 'log', 'shortlog', 
> > 'history' and 'rss'/'atom' views, where you would need to many 
> > command invocations, which is very bad on OS with slow fork.
> 
> $repo->cat_file (now renamed to get_object) actually doesn't fork but
> uses a pipe (cat-file --batch); I don't think it should be
> a performance issue.

It is (much) better than forking git-cat-file for each commit shown
on the list; nevertheless I think that it would be better to use git-log
to generate list (or Git::Revlist) of Git::Commit objects.  It is one
fork less, but what more important you don't have to access repository
twice for the very same objects.

Let me elaborate: if I understand correctly for log-like views you 
propose to first run simple git-rev-list with appropriate starting 
point and limiters (--skip, --max-count, -- <pathname>), perhaps using 
'--parents' option to get parents in simplified/rewritten history,
which would traverse history getting commit objects, but outputting
only fragments of info, then feed list of revisions (perhaps via cache,
i.e. excluding objects which are in cache) to 'git cat-file --batch'
open two-directional pipeline.

What I propose instead is to provide alternate method to fully 
instantiate Git::Commit object (in addition to ->_load), which would 
fill fields by parsing git-log / git-rev-list --headers output
(what gitweb currently does in parse_commits).

On the other hand... "git cat-file --batch" should have commits to be
accessed in filesystem cache, which means in memory; but it is possible
that they wouldn't be in cache because of I/O pressure (git-rev-list and 
git-cat-file are separate processes).  And checking if object is in
cache can be simpler... if less effective.  If you generate Git::Commit 
objects via parsing git-log / git-rev-list output, then you can limit
history further by excluding starting points from cache.

[END OF MAIN PART]

> > > +use 5.006002;
> > 
> > Why is this "use 5.006002" for?
> 
> It signifies that this module won't run with Perl <5.6.2.  I've had to
> bump it to 5.008 (Perl 5.8); more about that in the message announcing
> the next version of the patch series.

I was not asking what this mean, but why do you need to set up lower
bound on Perl version.  What feature pre 5.6.2 Perl lacks...

Requiring 5.8 is bad.  What feature pre 5.8 Perl lacks, that you
absolutely cannot go without it? There will be complaints.

> > > +=item 'git_binary'
> > > +The name or full path of the git binary (default: 'git').
> > 
> > Probably should be Git::Cmd or Git object, instead.
> 
> I don't think something Git::Cmd is a good idea (as I pointed out in 
> my reply to Petr, <487BD0F3.2060508@gmail.com>), or at least it
> shouldn't be implemented as part of this patch series.  This method
> is really just supposed to return an argument for exec*p, nothing
> more. 

I think that _not using_ Git::Cmd (or somesuch) API results in botched,
horrible API like (in the 3/3 patch in this series):

  our $git_version = $repo_root->repo(directory => 'dummy')->version;

Aaaaaargh! My eyes!

  out $git_version = Git->version;

(Unless it is not needed any longer, or not used any longer; if it is
so, then perhaps implementing Git::Cmd as generic wrapper around git
commands, hiding for example ActivePerl hack, could be left for later).

> > > +=item $repo->cmd_output(%opts)
> > 
> > Please do remember that there are git commands which do not need
> > access to git repository,
> 
> As I wrote in my reply to Petr, Git::Repo is not trying to be a 
> wrapper around git binaries, so this method really shouldn't be part
> of the official API -- it's just auxiliary; I'll underscore-prefix it.

Just a question: was this reply only to him, or to all?

I think that $repo->cmd_output(%opts) is a great shortcut for invoking
Git->cmd_output with '--git-dir=<repo>' added automatically.  So it
should be left, but perhaps under different implementation.

> > > +# To do: According to Git.pm, this might not work with ActiveState
> > > +# Perl on Win 32.  Need to check or wait for reports.
> > 
> > Why not copy code from Git.pm, then?
> 
> Apart from the fact that I don't do cargo-cult programming? ;-)   
> Git.pm forks, whereas Git::Repo uses open, '-|', so it's actually different
> (and it's not possible to copy the code).

Actually magic open, '-|' _does_ forks, only implicitely. So Git.pm
does generate the same or almost the same code, but it work (around)
with ActiveState Perl.

> > I think I'd rather allow extended SHA1 syntax in Git::Commit
> > and Git::Tag constructors; it is one call to git command less
> > (I think).
> 
> I wouldn't -- see my blurb about error handling at the top of my reply
> to Petr (<487BD0F3.2060508@gmail.com>).  You're not supposed to pass
> anything that you didn't get from get_sha1 into Git::Commit or
> Git::Tag constructors, or your error handling is invariably broken.

O.K.

I can understand this simpler, although less than optimal, and geared
mainly towards gitweb needs.

> > > +	my ($in, $out) = $self->get_bidi_pipe(
> > > +		cmd => ['cat-file','--batch-check'], reuse => 1);
> > 
> > Ahhh... here I can see what 'reuse => 1' means, and when it is 
> > useful. But doesn't it make sense _only_ for _bi-directional pipe_?
> > Are you sure that you wouldn't get deadlock?
> 
> Yes to both questions. :-)

Errr... "yes" to first question means that 'reuse' option makes sense
_only_ for get_bidi_pipe? If so, why it is present in other commands?

> > By the way, for gitweb you would need (for performance and for
> > rewritten parents) also get_log / get_commits / get_commits_list
> 
> No. ;-)  Doing fine without those.

See above.

> > > +=item $repo->name_rev($committish_sha1, $tags_only = 0)
> > 
> > Why name_rev, and no describe?
> 
> Feel free to add it. ;-)  (It might take some work to come up with a
> decent interface for that method.)

Why do you _need_ name_rev, if you are not to include git-describe
equivalent.

> > > +Return the tagger string of this tag object.
> > 
> > We would probably want some way to extract name, email, epoch/date
> > (and a way to convert epoch+timezone to RFC or ISO format), 
> > timezone.
> 
> Yeah.  At some point. ;-)

See above, in comment about Git::Commit.

> > Should (for completeness) Git::Tag provide $tag->validate() method?
> 
> No, since 'validate' sounds like it would have to do error handling.
> 
> If you mean that this should check if the object exists (and has the
> advertised type), the user of the API should test for "defined
> $tag->repo->get_sha1($tag->object)" or somesuch and do error handling
> themselves.

I meant here equivalent of "git tag -v <tag>"

-- 
Jakub Narebski
Poland

^ permalink raw reply

* Re: [RFC PATCH] index-pack: Issue a warning if deltaBaseCacheLimit is too small
From: Nicolas Pitre @ 2008-07-17 23:45 UTC (permalink / raw)
  To: Shawn O. Pearce
  Cc: Stephan Hennig, Andreas Ericsson, Johannes Schindelin,
	Jakub Narebski, Junio C Hamano, git
In-Reply-To: <20080717220251.GA3072@spearce.org>

On Thu, 17 Jul 2008, Shawn O. Pearce wrote:

> Its rare that we should exceed deltaBaseCacheLimit while resolving
> delta compressed objects.  By default this limit is 16M, and most
> chains are under 50 objects in length.  This affords about 327K per
> object in the chain, which is quite large by source code standards.
> 
> If we have to recreate a prior delta base because we evicted it to
> stay within the deltaBaseCacheLimit we can warn the user that their
> configured limit is perhaps too low for this repository data set.
> If the user keeps seeing the warning they can research it in the
> documentation, and consider setting it higher on this repository,
> or just globally on their system.

As I said earlier, I don't think this is a good idea, but I'll elaborate 
a bit more.

First, this is a really bad clue for setting deltaBaseCacheLimit.  The 
likelyhood of this warning to actually show up during an initial clone 
is relatively high, yet this doesn't mean that deltaBaseCacheLimit has 
to be changed at all.  For one, the real time usage of 
deltaBaseCacheLimit is to cap a cache of objects for multiple delta 
chains with random access, and not only one chain traversed linearly 
like in the index-pack case, 
and that cache is 
likely to always be full and in active eviction mode -- that's the point 
of a cap after all.  In the index-pack this is only used to avoid 
excessive memory usage for intermediate delta results and not really a 
cache.  In other words, we have two rather different usages for the same 
settings.  Now don't read me wrong: I think that reusing this setting is 
sensible, but its value should not be determined by what index-pack may 
happen to do with it, especially on a first clone.  And issuing warnings 
on the first clone is not the way to give new users confidence either.

Secondly, on subsequent fetches, the warning is likely to never appear 
again due to the fact that the delta chains will typically be much 
shorter.  And that would be true even if in reality the runtime access 
to the repository would benefit a lot from deltaBaseCacheLimit being 
raised.  And it is the runtime access which is important here, not the 
occasional fetch.  Yet the full delta chains are not likely to be walked 
in their entirety very often anyway either.

Thirdly, if such indication is considered useful, then it should really 
be part of some statistic/analysis tool, such as verify-pack for 
example.  Such a tool could compute the exact memory requirements for a 
given repository usage and possibly provide suggestions as to what the 
optimal deltaBaseCacheLimit value could be.  But yet that cache has a 
hardcoded number of entries at the moment and its hash function might 
not be optimal either, making the connection with index-pack even more 
apart.

And finally, I think that index-pack would benefit a lot from a really 
simple optimization which is to free the resulting intermediate delta 
base object right away when there is only one delta child to resolve, 
before that child is itself used as a base for further delta 
grand-children.  That is likely to cover most cases of big delta chains 
already, making that warning an even worse indicator.

> Suggested-by: Stephan Hennig <mailing_list@arcor.de>
> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

Unrecommended-by: Nicolas Pitre <nico@cam.org>

Nicolas

^ permalink raw reply

* Re: Considering teaching plumbing to users harmful
From: Robin Rosenberg @ 2008-07-17 22:32 UTC (permalink / raw)
  To: David Kastrup; +Cc: Kevin Ballard, git
In-Reply-To: <85wsjkgr7a.fsf@lola.goethe.zz>

torsdagen den 17 juli 2008 23.02.01 skrev David Kastrup:
> Kevin Ballard <kevin@sb.org> writes:
> 
> > On Jul 17, 2008, at 9:05 AM, David Kastrup wrote:
[...]
> > issue a command to copy by URL, then issue an `svn switch` command,
> > and then I have to remember that I have a switched repository.
> 
> Huh?  How is that different to remembering a switched branch?  Anyway,
That's what we have the custom prompt in contrib for. Like this.
[me@mymachine EGIT (rr/gitselectionprovider)]$

Obviously it's not mandatory, but I recommend it as it also show stuff like
rebase, am, merge and bisect status too.

Maybe one you do that for SVN too. I have one for CVS.

> one tends to check out different branches in different workdirs.
This is very different for individuals. 

-- robin

^ permalink raw reply

* [PATCH] git mv: Support moving submodules
From: Petr Baudis @ 2008-07-17 22:34 UTC (permalink / raw)
  To: git; +Cc: gitster
In-Reply-To: <20080717223036.20680.9672.stgit@localhost>

This patch adds support for moving submodules to 'git mv', including
rewriting of the .gitmodules file to reflect the movement.

Signed-off-by: Petr Baudis <pasky@suse.cz>
---

  This is the updated version reflecting the recent rewrite. No other
  changes in my queue yet (except expanding the git rm explanation).

 builtin-mv.c |   37 +++++++++++++++++++++++++++++++++----
 1 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/builtin-mv.c b/builtin-mv.c
index 28ebc9c..62d0c95 100644
--- a/builtin-mv.c
+++ b/builtin-mv.c
@@ -9,6 +9,7 @@
 #include "cache-tree.h"
 #include "path-list.h"
 #include "parse-options.h"
+#include "submodule.h"
 
 static const char * const builtin_mv_usage[] = {
 	"git mv [options] <source>... <destination>",
@@ -49,6 +50,24 @@ static const char *add_slash(const char *path)
 	return path;
 }
 
+static int ce_is_gitlink(int i)
+{
+	return i < 0 ? 0 : S_ISGITLINK(active_cache[i]->ce_mode);
+}
+
+static void rename_submodule(struct path_list_item *i)
+{
+	char *key = submodule_by_path(i->path);
+
+	config_exclusive_filename = ".gitmodules";
+	if (git_config_set(key, (const char *) i->util))
+		die("cannot update .gitmodules");
+	config_exclusive_filename = NULL;
+
+	free(key);
+}
+
+
 static struct lock_file lock_file;
 
 int cmd_mv(int argc, const char **argv, const char *prefix)
@@ -65,6 +84,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 	enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX } *modes;
 	struct stat st;
 	struct path_list src_for_dst = {NULL, 0, 0, 0};
+	/* .path is source path, .util is destination path */
+	struct path_list submodules = {NULL, 0, 0, 0};
 
 	git_config(git_default_config, NULL);
 
@@ -84,7 +105,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 		/* special case: "." was normalized to "" */
 		destination = copy_pathspec(dest_path[0], argv, argc, 1);
 	else if (!lstat(dest_path[0], &st) &&
-			S_ISDIR(st.st_mode)) {
+			S_ISDIR(st.st_mode) &&
+			!ce_is_gitlink(cache_name_pos(dest_path[0], strlen(dest_path[0])))) {
 		dest_path[0] = add_slash(dest_path[0]);
 		destination = copy_pathspec(dest_path[0], argv, argc, 1);
 	} else {
@@ -96,7 +118,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 	/* Checking */
 	for (i = 0; i < argc; i++) {
 		const char *src = source[i], *dst = destination[i];
-		int length, src_is_dir;
+		int length, src_is_dir, src_cache_pos;
 		const char *bad = NULL;
 
 		if (show_only)
@@ -111,7 +133,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 		} else if ((src_is_dir = S_ISDIR(st.st_mode))
 				&& lstat(dst, &st) == 0)
 			bad = "cannot move directory over file";
-		else if (src_is_dir) {
+		else if ((src_cache_pos = cache_name_pos(src, length)) < 0 && src_is_dir) {
 			const char *src_w_slash = add_slash(src);
 			int len_w_slash = length + 1;
 			int first, last;
@@ -177,7 +199,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 				} else
 					bad = "Cannot overwrite";
 			}
-		} else if (cache_name_pos(src, length) < 0)
+		} else if (src_cache_pos < 0)
 			bad = "not under version control";
 		else if (path_list_has_path(&src_for_dst, dst))
 			bad = "multiple sources for the same target";
@@ -214,10 +236,17 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 
 		pos = cache_name_pos(src, strlen(src));
 		assert(pos >= 0);
+		if (ce_is_gitlink(pos))
+			path_list_insert(src, &submodules)->util = (void *) dst;
 		if (!show_only)
 			rename_cache_entry_at(pos, dst);
 	}
 
+	for (i = 0; i < submodules.nr; i++)
+		rename_submodule(&submodules.items[i]);
+	if (submodules.nr > 0 && add_file_to_cache(".gitmodules", 0))
+		die("cannot add new .gitmodules to the index");
+
 	if (active_cache_changed) {
 		if (write_cache(newfd, active_cache, active_nr) ||
 		    commit_locked_index(&lock_file))

^ permalink raw reply related

* [PATCH] git-mv: Keep moved index entries inact
From: Petr Baudis @ 2008-07-17 22:31 UTC (permalink / raw)
  To: git; +Cc: gitster
In-Reply-To: <20080717130651.GU32184@machine.or.cz>

The rewrite of git-mv from a shell script to a builtin was perhaps
a little too straightforward: the git add and git rm queues were
emulated directly, which resulted in a rather complicated code and
caused an inconsistent behaviour when moving dirty index entries;
git mv would update the entry based on working tree state,
except in case of overwrites, where the new entry would still have
sha1 of the old file.

This patch introduces rename_index_entry_at() into the index toolkit,
which will rename an entry while removing any entries the new entry
might render duplicate. This is then used in git mv instead
of all the file queues, resulting in a major simplification
of the code and an inevitable change in git mv -n output format.

A new test has been added to the testsuite to reflect this change.
Also, based on suggestion by Junio about desired symlink behaviour
of git mv, I have added two tests for that; however, I do not have
need or desire to spend time fixing this, so they are expected
to fail for now until someone gets around to fixing that.

Signed-off-by: Petr Baudis <pasky@suse.cz>
---

  This patch might depend on git-mv: Remove dead code branch.

 builtin-mv.c  |   62 ++++++++-------------------------------------------------
 cache.h       |    2 ++
 read-cache.c  |   15 ++++++++++++++
 t/t7001-mv.sh |   57 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 83 insertions(+), 53 deletions(-)

diff --git a/builtin-mv.c b/builtin-mv.c
index 33ad082..28ebc9c 100644
--- a/builtin-mv.c
+++ b/builtin-mv.c
@@ -36,17 +36,6 @@ static const char **copy_pathspec(const char *prefix, const char **pathspec,
 	return get_pathspec(prefix, result);
 }
 
-static void show_list(const char *label, struct path_list *list)
-{
-	if (list->nr > 0) {
-		int i;
-		printf("%s", label);
-		for (i = 0; i < list->nr; i++)
-			printf("%s%s", i > 0 ? ", " : "", list->items[i].path);
-		putchar('\n');
-	}
-}
-
 static const char *add_slash(const char *path)
 {
 	int len = strlen(path);
@@ -75,11 +64,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 	const char **source, **destination, **dest_path;
 	enum update_mode { BOTH = 0, WORKING_DIRECTORY, INDEX } *modes;
 	struct stat st;
-	struct path_list overwritten = {NULL, 0, 0, 0};
 	struct path_list src_for_dst = {NULL, 0, 0, 0};
-	struct path_list added = {NULL, 0, 0, 0};
-	struct path_list deleted = {NULL, 0, 0, 0};
-	struct path_list changed = {NULL, 0, 0, 0};
 
 	git_config(git_default_config, NULL);
 
@@ -189,7 +174,6 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 							" will overwrite!\n",
 							bad);
 					bad = NULL;
-					path_list_insert(dst, &overwritten);
 				} else
 					bad = "Cannot overwrite";
 			}
@@ -218,6 +202,7 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 	for (i = 0; i < argc; i++) {
 		const char *src = source[i], *dst = destination[i];
 		enum update_mode mode = modes[i];
+		int pos;
 		if (show_only || verbose)
 			printf("Renaming %s to %s\n", src, dst);
 		if (!show_only && mode != INDEX &&
@@ -227,45 +212,16 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 		if (mode == WORKING_DIRECTORY)
 			continue;
 
-		assert(cache_name_pos(src, strlen(src)) >= 0);
-
-		path_list_insert(src, &deleted);
-		/* destination can be a directory with 1 file inside */
-		if (path_list_has_path(&overwritten, dst))
-			path_list_insert(dst, &changed);
-		else
-			path_list_insert(dst, &added);
+		pos = cache_name_pos(src, strlen(src));
+		assert(pos >= 0);
+		if (!show_only)
+			rename_cache_entry_at(pos, dst);
 	}
 
-	if (show_only) {
-		show_list("Changed  : ", &changed);
-		show_list("Adding   : ", &added);
-		show_list("Deleting : ", &deleted);
-	} else {
-		for (i = 0; i < changed.nr; i++) {
-			const char *path = changed.items[i].path;
-			int j = cache_name_pos(path, strlen(path));
-			struct cache_entry *ce = active_cache[j];
-
-			if (j < 0)
-				die ("Huh? Cache entry for %s unknown?", path);
-			refresh_cache_entry(ce, 0);
-		}
-
-		for (i = 0; i < added.nr; i++) {
-			const char *path = added.items[i].path;
-			if (add_file_to_cache(path, verbose ? ADD_CACHE_VERBOSE : 0))
-				die("updating index entries failed");
-		}
-
-		for (i = 0; i < deleted.nr; i++)
-			remove_file_from_cache(deleted.items[i].path);
-
-		if (active_cache_changed) {
-			if (write_cache(newfd, active_cache, active_nr) ||
-			    commit_locked_index(&lock_file))
-				die("Unable to write new index file");
-		}
+	if (active_cache_changed) {
+		if (write_cache(newfd, active_cache, active_nr) ||
+		    commit_locked_index(&lock_file))
+			die("Unable to write new index file");
 	}
 
 	return 0;
diff --git a/cache.h b/cache.h
index a779d92..6f1d003 100644
--- a/cache.h
+++ b/cache.h
@@ -260,6 +260,7 @@ static inline void remove_name_hash(struct cache_entry *ce)
 #define unmerged_cache() unmerged_index(&the_index)
 #define cache_name_pos(name, namelen) index_name_pos(&the_index,(name),(namelen))
 #define add_cache_entry(ce, option) add_index_entry(&the_index, (ce), (option))
+#define rename_cache_entry_at(pos, new_name) rename_index_entry_at(&the_index, (pos), (new_name))
 #define remove_cache_entry_at(pos) remove_index_entry_at(&the_index, (pos))
 #define remove_file_from_cache(path) remove_file_from_index(&the_index, (path))
 #define add_to_cache(path, st, flags) add_to_index(&the_index, (path), (st), (flags))
@@ -370,6 +371,7 @@ extern int index_name_pos(const struct index_state *, const char *name, int name
 #define ADD_CACHE_JUST_APPEND 8		/* Append only; tree.c::read_tree() */
 extern int add_index_entry(struct index_state *, struct cache_entry *ce, int option);
 extern struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int really);
+extern void rename_index_entry_at(struct index_state *, int pos, const char *new_name);
 extern int remove_index_entry_at(struct index_state *, int pos);
 extern int remove_file_from_index(struct index_state *, const char *path);
 #define ADD_CACHE_VERBOSE 1
diff --git a/read-cache.c b/read-cache.c
index 1648428..70e5f57 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -38,6 +38,21 @@ static void replace_index_entry(struct index_state *istate, int nr, struct cache
 	istate->cache_changed = 1;
 }
 
+void rename_index_entry_at(struct index_state *istate, int nr, const char *new_name)
+{
+	struct cache_entry *old = istate->cache[nr], *new;
+	int namelen = strlen(new_name);
+
+	new = xmalloc(cache_entry_size(namelen));
+	copy_cache_entry(new, old);
+	new->ce_flags = (new->ce_flags & ~CE_NAMEMASK) | namelen;
+	memcpy(new->name, new_name, namelen);
+
+	cache_tree_invalidate_path(istate->cache_tree, old->name);
+	remove_index_entry_at(istate, nr);
+	add_index_entry(istate, new, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE);
+}
+
 /*
  * This only updates the "non-critical" parts of the directory
  * cache, ie the parts that aren't tracked by GIT, and only used
diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
index 336cfaa..6b615f8 100755
--- a/t/t7001-mv.sh
+++ b/t/t7001-mv.sh
@@ -156,4 +156,61 @@ test_expect_success 'absolute pathname outside should fail' '(
 
 )'
 
+# git mv meets angry Git maintainer
+
+test_expect_success 'git mv should not change sha1 of moved cache entry' '
+
+	rm -fr .git &&
+	git init &&
+	echo 1 >dirty &&
+	git add dirty &&
+	entry="$(git ls-files --stage dirty | cut -f 1)"
+	git mv dirty dirty2 &&
+	[ "$entry" = "$(git ls-files --stage dirty2 | cut -f 1)" ] &&
+	echo 2 >dirty2 &&
+	git mv dirty2 dirty &&
+	[ "$entry" = "$(git ls-files --stage dirty | cut -f 1)" ]
+
+'
+
+rm -f dirty dirty2
+
+test_expect_failure 'git mv should overwrite symlink to a file' '
+
+	rm -fr .git &&
+	git init &&
+	echo 1 >moved &&
+	ln -s moved symlink &&
+	git add moved symlink &&
+	! git mv moved symlink &&
+	git mv -f moved symlink &&
+	[ ! -e moved ] &&
+	[ -f symlink ] &&
+	[ $(cat symlink) = 1 ] &&
+	git diff-files --quiet
+
+'
+
+rm -f moved symlink
+
+test_expect_failure 'git mv should follow symlink to a directory' '
+
+	rm -fr .git &&
+	git init &&
+	echo 1 >moved &&
+	mkdir -p dir &&
+	touch dir/.keep &&
+	ln -s dir symlink &&
+	git add moved dir/.keep symlink &&
+	git mv moved symlink &&
+	[ ! -e moved ] &&
+	[ -f symlink/moved ] &&
+	[ $(cat symlink/moved) = 1 ] &&
+	[ "$(ls dir)" = "$(ls symlink)" ] &&
+	git diff-files --quiet
+
+'
+
+rm -rf moved symlink dir
+
 test_done

^ permalink raw reply related

* Re: [JGIT PATCH 1/1] jgit: create a tag command
From: Robin Rosenberg @ 2008-07-17 22:24 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Mike Ralphson, Marek Zawirski, git
In-Reply-To: <20080717214520.GB2798@spearce.org>

torsdagen den 17 juli 2008 23.45.20 skrev Shawn O. Pearce:
> Looking at the code further its fairly simple.  I don't think it
> would be difficult for us to tweak what we need, and try to push
> patches upstream.  Well worthwhile actually.  The parser made our pgm
> package shorter, and we didn't have to waste time writing our own.
> Its well built, and was (mostly) easily extended to handle fully
> transparent String->RevCommit or String->RevTree parsing.  So its
> worth the few minutes to improve it further.

Ok, then. Go ahead.

-- robin

^ permalink raw reply

* Re: Subversion's do-everything-via-copying paradigm ( was RE: Re: Considering teaching plumbing to users harmful)
From: Junio C Hamano @ 2008-07-17 22:11 UTC (permalink / raw)
  To: Avery Pennarun; +Cc: Craig L. Ching, David Kastrup, git
In-Reply-To: <32541b130807171507j98f883by2e937e894838852@mail.gmail.com>

Could you please take this elsewhere?  Now it is about subversion and not
about git anymore...

^ permalink raw reply

* Re: Subversion's do-everything-via-copying paradigm ( was RE: Re: Considering teaching plumbing to users harmful)
From: Avery Pennarun @ 2008-07-17 22:07 UTC (permalink / raw)
  To: Craig L. Ching; +Cc: David Kastrup, git
In-Reply-To: <63BEA5E623E09F4D92233FB12A9F79430238A16B@emailmn.mqsoftware.com>

On 7/17/08, Craig L. Ching <cching@mqsoftware.com> wrote:
>  Does it not bother you that renaming a file is a copy + delete [1].
>  Have they fixed that yet?  That was one of the biggest reasons we never
>  moved to subversion.

Perhaps you've confused CVS with subversion.

svn's handilng of renames is not as graceful as git's, but it does
work in the common cases.  For example, after the aforementioned
copy+delete, "svn blame" and "svn log" will still follow history back
through the rename.

Not sure how I ended up defending subversion on this list.  Somehow
I've been tricked into being on the losing side of this argument :)  I
guess I just wanted to balance the viewpoints a little; there are good
things to be learned from svn, but not things about branching,
merging, or rename tracking.

Have fun,

Avery

^ permalink raw reply

* RE: Subversion's do-everything-via-copying paradigm ( was RE:  Re: Considering teaching plumbing to users harmful)
From: Craig L. Ching @ 2008-07-17 22:06 UTC (permalink / raw)
  To: David Kastrup; +Cc: git
In-Reply-To: <85sku8gr1q.fsf@lola.goethe.zz>

 

> -----Original Message-----
> From: David Kastrup [mailto:dak@gnu.org] 
> Sent: Thursday, July 17, 2008 4:05 PM
> To: Craig L. Ching
> Cc: git@vger.kernel.org
> Subject: Re: Subversion's do-everything-via-copying paradigm 
> ( was RE: Re: Considering teaching plumbing to users harmful)
> 
> "Craig L. Ching" <cching@mqsoftware.com> writes:
> 
> > Does it not bother you that renaming a file is a copy + delete [1].
> > Have they fixed that yet?  That was one of the biggest 
> reasons we never
> > moved to subversion.
> 
> Never bit me.
> 
Ignorance is bliss as they say ;-)

> -- 
> David Kastrup, Kriemhildstr. 15, 44793 Bochum
> 

Cheers,
Craig

^ permalink raw reply

* [RFC PATCH] index-pack: Issue a warning if deltaBaseCacheLimit is too small
From: Shawn O. Pearce @ 2008-07-17 22:02 UTC (permalink / raw)
  To: Nicolas Pitre, Stephan Hennig
  Cc: Andreas Ericsson, Johannes Schindelin, Jakub Narebski,
	Junio C Hamano, git
In-Reply-To: <20080717213550.GA2798@spearce.org>

Its rare that we should exceed deltaBaseCacheLimit while resolving
delta compressed objects.  By default this limit is 16M, and most
chains are under 50 objects in length.  This affords about 327K per
object in the chain, which is quite large by source code standards.

If we have to recreate a prior delta base because we evicted it to
stay within the deltaBaseCacheLimit we can warn the user that their
configured limit is perhaps too low for this repository data set.
If the user keeps seeing the warning they can research it in the
documentation, and consider setting it higher on this repository,
or just globally on their system.

Suggested-by: Stephan Hennig <mailing_list@arcor.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 index-pack.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/index-pack.c b/index-pack.c
index ac20a46..97533d6 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -53,6 +53,7 @@ static struct object_entry *objects;
 static struct delta_entry *deltas;
 static struct base_data *base_cache;
 static size_t base_cache_used;
+static int oom_warning;
 static int nr_objects;
 static int nr_deltas;
 static int nr_resolved_deltas;
@@ -481,6 +482,13 @@ static void *get_base_data(struct base_data *c)
 	if (!c->data) {
 		struct object_entry *obj = c->obj;

+		if (!oom_warning && verbose) {
+			if (progress)
+				fputc('\n', stderr);
+			warning("One or more delta chains are larger than deltaBaseCache.");
+			oom_warning = 1;
+		}
+
 		if (obj->type == OBJ_REF_DELTA || obj->type == OBJ_OFS_DELTA) {
 			void *base = get_base_data(c->base);
 			void *raw = get_data_from_pack(obj);
-- 
1.5.6.3.569.ga9185

^ permalink raw reply related

* Re: Considering teaching plumbing to users harmful
From: David Kastrup @ 2008-07-17 21:02 UTC (permalink / raw)
  To: Kevin Ballard; +Cc: git
In-Reply-To: <6B9BBA72-6E75-47E3-911A-4A5309090807@sb.org>

Kevin Ballard <kevin@sb.org> writes:

> On Jul 17, 2008, at 9:05 AM, David Kastrup wrote:
>
>> How much have you worked with Subversion so far?  I am doing quite a
>> bit of work with it, and the do-everything-via-copying paradigm does
>> not get in my hair.  It actually means that I have to remember fewer
>> commands.  And it is pretty easy to understand.
>
> Sure, it's simpler, but the overhead in creating and using a branch is
> much larger. I have to extract the URL from the repository (since
> naturally I only have trunk checked out),

Say something like svn info and then use cut&paste.

> issue a command to copy by URL, then issue an `svn switch` command,
> and then I have to remember that I have a switched repository.

Huh?  How is that different to remembering a switched branch?  Anyway,
one tends to check out different branches in different workdirs.

> Switching between branches is a pain, especially if you have
> uncommitted work. There's a reason I never bothered to use branches
> when I used subversion.

Looks like it.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply

* Re: Subversion's do-everything-via-copying paradigm ( was RE:  Re: Considering teaching plumbing to users harmful)
From: David Kastrup @ 2008-07-17 21:05 UTC (permalink / raw)
  To: Craig L. Ching; +Cc: git
In-Reply-To: <63BEA5E623E09F4D92233FB12A9F79430238A16B@emailmn.mqsoftware.com>

"Craig L. Ching" <cching@mqsoftware.com> writes:

>> -----Original Message-----
>> From: git-owner@vger.kernel.org 
>> [mailto:git-owner@vger.kernel.org] On Behalf Of David Kastrup
>> Sent: Thursday, July 17, 2008 11:05 AM
>> To: git@vger.kernel.org
>> Subject: Re: Considering teaching plumbing to users harmful
>> 
>> How much have you worked with Subversion so far?  I am doing 
>> quite a bit of work with it, and the 
>> do-everything-via-copying paradigm does not get in my hair.  
>> It actually means that I have to remember fewer commands.
>> And it is pretty easy to understand.
>> 
>
> Does it not bother you that renaming a file is a copy + delete [1].
> Have they fixed that yet?  That was one of the biggest reasons we never
> moved to subversion.

Never bit me.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply

* Re: [JGIT PATCH 1/1] jgit: create a tag command
From: Shawn O. Pearce @ 2008-07-17 21:45 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: Mike Ralphson, Marek Zawirski, git
In-Reply-To: <200807172331.18059.robin.rosenberg.lists@dewire.com>

Robin Rosenberg <robin.rosenberg.lists@dewire.com> wrote:
> lördagen den 12 juli 2008 05.42.06 skrev Shawn O. Pearce:
> > Probably the state-of-the-arg is args4j:
> > 
> >   https://args4j.dev.java.net/
> > 
> > It uses Java 5 annotations to setup the argument parsing:
> 
> But it doesn't use GNU getOpt's long/short style parsing which is a BIG loss. Maybe
> we could contribute it? Or at least not design options incompatible with such an option
> parser.

I'm already using long style options, "--git-dir", "$path" works,
and I taught a subclass wrapper to split "--git-dir=$path" into
the two argument format so we can handle the GNU style long options.

Short style options are harder.  You can alias "-h" to "--help"
and have both respond, but you cannot get option clusters like
"-ar" to expand to "-a","-r" and then parse.  I could have the same
wrapper handle splitting "-ar" into "-a","-r" but it cannot handle
"-C80r" as "-C80","-r" as it doesn't have any information about
what options take values, and which don't.

Getting "--" to terminate the end of options is supported, but
making it handle `git log ref -- path` was interesting.  I had to
make "--" actually an option name, and have it eat all values after
"--" as path specs to implement git-log style arguments.

Looking at the code further its fairly simple.  I don't think it
would be difficult for us to tweak what we need, and try to push
patches upstream.  Well worthwhile actually.  The parser made our pgm
package shorter, and we didn't have to waste time writing our own.
Its well built, and was (mostly) easily extended to handle fully
transparent String->RevCommit or String->RevTree parsing.  So its
worth the few minutes to improve it further.

-- 
Shawn.

^ permalink raw reply

* Re: [JGIT PATCH 1/1] jgit: create a tag command
From: Robin Rosenberg @ 2008-07-17 21:31 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Mike Ralphson, Marek Zawirski, git
In-Reply-To: <20080712034206.GA16101@spearce.org>

lördagen den 12 juli 2008 05.42.06 skrev Shawn O. Pearce:
> "Shawn O. Pearce" <spearce@spearce.org> wrote:
> > Mike Ralphson <mike.ralphson@gmail.com> wrote:
> > 
> > > > I think we are at the point where we need to either write a
> > > > #@!*(!@(! command line option parser, import one, or stop writing
> > > > command line programs.  I would certainly appreciate any opinion
> > > > you might have on the matter.
> > > 
> > > a) is a distraction, c) is a backwards step, so maybe b) wins.
> > 
> > So I looked at GNU getopt and its at least smaller than Apache
> > Commons,
> 
> Probably the state-of-the-arg is args4j:
> 
>   https://args4j.dev.java.net/
> 
> It uses Java 5 annotations to setup the argument parsing:

But it doesn't use GNU getOpt's long/short style parsing which is a BIG loss. Maybe
we could contribute it? Or at least not design options incompatible with such an option
parser.

-- robin

^ permalink raw reply

* Re: [PATCH 0/4] Honor core.deltaBaseCacheLimit during index-pack
From: Shawn O. Pearce @ 2008-07-17 21:35 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Stephan Hennig, Andreas Ericsson, Johannes Schindelin,
	Jakub Narebski, Junio C Hamano, git
In-Reply-To: <alpine.LFD.1.10.0807171215020.3110@xanadu.home>

Nicolas Pitre <nico@cam.org> wrote:
> On Thu, 17 Jul 2008, Stephan Hennig wrote:
> > Even though Git now makes some efforts to substitute runtime
> > for memory to be able to operate with low(er) memory, I think it would
> > still be informative for a user that repository and hardware, resp.
> > core.deltaBaseCacheLimit, are, say, incompatible.  If valuable objects
> > have to be discarded due to memory restrictions a warning could be
> > issued to make the user aware of this fact, e.g.,
> > 
> >   Warning! Low memory. Git might be slowing down.
> 
> Well, I disagree.  First we don't know how slow git would effectively be 
> since all (my) concerns so far were totally theoretical.  It will still 
> work better than, say, 'git verify-pack' nevertheless. And git should 
> just do its best regardless and avoid being needlessly verbose.

Actually, this warning may be a good idea.  I'll post an RFC patch
for it in a few minutes.  If people hate the idea, that's what an
RFC is for.  :)

-- 
Shawn.

^ permalink raw reply

* Re: Considering teaching plumbing to users harmful
From: Kevin Ballard @ 2008-07-17 21:10 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Petr Baudis, David Kastrup, git
In-Reply-To: <200807172303.19339.jnareb@gmail.com>

On Jul 17, 2008, at 2:03 PM, Jakub Narebski wrote:

> Dnia czwartek 17. lipca 2008 22:40, Kevin Ballard napisał:
>> On Jul 17, 2008, at 1:26 PM, Petr Baudis wrote:
>>> On Thu, Jul 17, 2008 at 01:12:57PM -0700, Kevin Ballard wrote:
>
>>>> There is one facet of submodules that annoys me, because it
>>>> prevents me from using them as a replacement for svn:externals.
>>>> Namely, the submodule refers to a specific repository, but not
>>>> a path within that repository.  I work with svn repos that use
>>>> svn:externals to peg revisions (as is appropriate) but they all
>>>> refer to various paths within the other repositories, and the
>>>> only way I can deal with that is to throw symlinks everywhere.
>>>
>>> Actually, is this a big problem? Git can track symlinks and without
>>> adding support for overall partial checkouts, adding this would feel
>>> like too huge a hack to me.
>>>
>>> Also, when converting to a different VCS, it might be sensible to
>>> adjust
>>> your modules setup a bit as well - the requirement to include only
>>> particular subdirectory of a submodule sounds rather strange to me.
>>
>> The problem is right now I maintain a bunch of git-svn mirrors of
>> internal svn repos, but the company isn't willing to switch to git.
>> And we use subtree externals links to do things like pull in the
>> models from one rails app into another, or pull in various
>> subdirectories of the "support" repository.
>
> I think the correct solution would be to make 'models' separate
> repository... or create interim repository containing only changes
> to 'models', and having 'models' as its top directory.

That would require significantly more work to deal with than using  
symlinks like I described, since the company is not willing to adjust  
anything to help with my git usage (as I'm the only git user here).

-Kevin Ballard

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com

^ permalink raw reply

* Re: Considering teaching plumbing to users harmful
From: Jakub Narebski @ 2008-07-17 21:03 UTC (permalink / raw)
  To: Kevin Ballard; +Cc: Petr Baudis, David Kastrup, git
In-Reply-To: <57BAA376-10A4-4E3F-BB8E-37B46E8C49D3@sb.org>

Dnia czwartek 17. lipca 2008 22:40, Kevin Ballard napisał:
> On Jul 17, 2008, at 1:26 PM, Petr Baudis wrote:
>> On Thu, Jul 17, 2008 at 01:12:57PM -0700, Kevin Ballard wrote:

>>> There is one facet of submodules that annoys me, because it  
>>> prevents me from using them as a replacement for svn:externals.
>>> Namely, the submodule refers to a specific repository, but not
>>> a path within that repository.  I work with svn repos that use
>>> svn:externals to peg revisions (as is appropriate) but they all
>>> refer to various paths within the other repositories, and the
>>> only way I can deal with that is to throw symlinks everywhere.
>>
>> Actually, is this a big problem? Git can track symlinks and without
>> adding support for overall partial checkouts, adding this would feel
>> like too huge a hack to me.
>>
>> Also, when converting to a different VCS, it might be sensible to  
>> adjust
>> your modules setup a bit as well - the requirement to include only
>> particular subdirectory of a submodule sounds rather strange to me.
> 
> The problem is right now I maintain a bunch of git-svn mirrors of  
> internal svn repos, but the company isn't willing to switch to git.  
> And we use subtree externals links to do things like pull in the  
> models from one rails app into another, or pull in various  
> subdirectories of the "support" repository.

I think the correct solution would be to make 'models' separate 
repository... or create interim repository containing only changes
to 'models', and having 'models' as its top directory.

-- 
Jakub Narebski
Poland

^ permalink raw reply

* [StGit PATCH] Fix uncommit status message
From: Karl Hasselström @ 2008-07-17 20:43 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: git

It should say

  Uncommitting to 8561b089afbaed2651591e5a4574fdca451d82f2 (exclusive) ...

not

  Uncommitting to Commit<sha1: 8561b089afbaed2651591e5a4574fdca451d82f2, data: None> (exclusive) ...

(though arguably, the sha1 should be abbreviated as well).

Signed-off-by: Karl Hasselström <kha@treskal.com>

---

 stgit/commands/uncommit.py |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)


diff --git a/stgit/commands/uncommit.py b/stgit/commands/uncommit.py
index eb39fcc..0f43b49 100644
--- a/stgit/commands/uncommit.py
+++ b/stgit/commands/uncommit.py
@@ -104,9 +104,9 @@ def func(parser, options, args):
             next_commit = get_parent(next_commit)
     else:
         if options.exclusive:
-            out.start('Uncommitting to %s (exclusive)' % to_commit)
+            out.start('Uncommitting to %s (exclusive)' % to_commit.sha1)
         else:
-            out.start('Uncommitting to %s' % to_commit)
+            out.start('Uncommitting to %s' % to_commit.sha1)
         while True:
             if next_commit == to_commit:
                 if not options.exclusive:

^ permalink raw reply related

* [StGit PATCH 5/5] Global performance logging
From: Karl Hasselström @ 2008-07-17 20:42 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: git
In-Reply-To: <20080717204133.23407.34264.stgit@yoghurt>

Measure the time for the whole program, and how much of that was
subprocess calls.

Signed-off-by: Karl Hasselström <kha@treskal.com>

---

 stgit/main.py |   11 +++++++++--
 stgit/run.py  |   32 ++++++++++++++++++++++++++++----
 2 files changed, 37 insertions(+), 6 deletions(-)


diff --git a/stgit/main.py b/stgit/main.py
index aa1f8ef..64cff30 100644
--- a/stgit/main.py
+++ b/stgit/main.py
@@ -23,7 +23,7 @@ from optparse import OptionParser
 
 import stgit.commands
 from stgit.out import *
-from stgit import utils
+from stgit import run, utils
 
 #
 # The commands map
@@ -192,7 +192,7 @@ def print_help():
 #
 # The main function (command dispatcher)
 #
-def main():
+def _main():
     """The main function
     """
     global prog
@@ -293,3 +293,10 @@ def main():
         sys.exit(utils.STGIT_BUG_ERROR)
 
     sys.exit(ret or utils.STGIT_SUCCESS)
+
+def main():
+    run.start_logging()
+    try:
+        _main()
+    finally:
+        run.stop_logging()
diff --git a/stgit/run.py b/stgit/run.py
index befd3c1..e46836b 100644
--- a/stgit/run.py
+++ b/stgit/run.py
@@ -42,7 +42,27 @@ def get_log_mode(spec):
         f = out
     return (log_mode, f)
 
-(_log_mode, _logfile) = get_log_mode(os.environ.get('STGIT_SUBPROCESS_LOG', ''))
+def start_logging():
+    global _log_mode, _logfile, _log_starttime, _log_subproctime
+    (_log_mode, _logfile) = get_log_mode(
+        os.environ.get('STGIT_SUBPROCESS_LOG', ''))
+    _log_starttime = datetime.datetime.now()
+    _log_subproctime = 0.0
+
+def duration(t1, t2):
+    d = t2 - t1
+    return 86400*d.days + d.seconds + 1e-6*d.microseconds
+
+def stop_logging():
+    if _log_mode != 'profile':
+        return
+    ttime = duration(_log_starttime, datetime.datetime.now())
+    rtime = ttime - _log_subproctime
+    _logfile.info('Total time: %1.3f s' % ttime,
+                  'Time spent in subprocess calls: %1.3f s (%1.1f%%)'
+                  % (_log_subproctime, 100*_log_subproctime/ttime),
+                  'Remaining time: %1.3f s (%1.1f%%)'
+                  % (rtime, 100*rtime/ttime))
 
 class Run:
     exc = RunException
@@ -68,12 +88,16 @@ class Run:
             _logfile.start('Running subprocess %s' % self.__cmd)
             self.__starttime = datetime.datetime.now()
     def __log_end(self, retcode):
+        global _log_subproctime, _log_starttime
         if _log_mode == 'debug':
             _logfile.done('return code: %d' % retcode)
         elif _log_mode == 'profile':
-            duration = datetime.datetime.now() - self.__starttime
-            _logfile.done('%1.3f s' % (duration.microseconds/1e6
-                                       + duration.seconds))
+            n = datetime.datetime.now()
+            d = duration(self.__starttime, n)
+            _logfile.done('%1.3f s' % d)
+            _log_subproctime += d
+            _logfile.info('Time since program start: %1.3f s'
+                          % duration(_log_starttime, n))
     def __check_exitcode(self):
         if self.__good_retvals == None:
             return

^ permalink raw reply related

* [StGit PATCH 4/5] Log subprocess calls during performance testing
From: Karl Hasselström @ 2008-07-17 20:42 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: git
In-Reply-To: <20080717204133.23407.34264.stgit@yoghurt>

Log each command's subprocess calls to a separate file.

Signed-off-by: Karl Hasselström <kha@treskal.com>

---

 perf/perftest.py |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)


diff --git a/perf/perftest.py b/perf/perftest.py
index 7072772..e5ed04b 100644
--- a/perf/perftest.py
+++ b/perf/perftest.py
@@ -1,4 +1,4 @@
-import datetime, subprocess, sys
+import datetime, os, os.path, subprocess, sys
 
 def duration(t1, t2):
     d = t2 - t1
@@ -8,8 +8,16 @@ class Run(object):
     def __init__(self):
         self.__cwd = None
         self.__log = []
+    def __logfile(self, cmd):
+        fn = os.path.join(os.getcwd(), '%04d.log' % len(self.__log))
+        f = open(fn, 'w')
+        f.write(' '.join(cmd) + '\n' + '-'*70 + '\n\n')
+        f.close()
+        return fn
     def __call__(self, *cmd, **args):
-        kwargs = { 'cwd': self.__cwd }
+        env = dict(os.environ)
+        env['STGIT_SUBPROCESS_LOG'] = 'profile:' + self.__logfile(cmd)
+        kwargs = { 'cwd': self.__cwd, 'env': env }
         if args.get('capture_stdout', False):
             kwargs['stdout'] = subprocess.PIPE
         start = datetime.datetime.now()

^ permalink raw reply related

* [StGit PATCH 3/5] Show full command in subprocess profiling
From: Karl Hasselström @ 2008-07-17 20:42 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: git
In-Reply-To: <20080717204133.23407.34264.stgit@yoghurt>

Showing just the executable name isn't so useful now that it's always
"git".

Signed-off-by: Karl Hasselström <kha@treskal.com>

---

 stgit/run.py |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)


diff --git a/stgit/run.py b/stgit/run.py
index 9d50e43..befd3c1 100644
--- a/stgit/run.py
+++ b/stgit/run.py
@@ -65,7 +65,7 @@ class Run:
                     if k not in os.environ or os.environ[k] != self.__env[k]:
                         _logfile.info('%s: %s' % (k, self.__env[k]))
         elif _log_mode == 'profile':
-            _logfile.start('Running subprocess %s' % self.__cmd[0])
+            _logfile.start('Running subprocess %s' % self.__cmd)
             self.__starttime = datetime.datetime.now()
     def __log_end(self, retcode):
         if _log_mode == 'debug':

^ permalink raw reply related

* [StGit PATCH 2/5] Log subproces activity to a file
From: Karl Hasselström @ 2008-07-17 20:42 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: git
In-Reply-To: <20080717204133.23407.34264.stgit@yoghurt>

If the user sets $STGIT_SUBPROCESS_LOG to a log mode followed by a
colon and a file name, append the log to that file instead of writing
it to stdout.

Signed-off-by: Karl Hasselström <kha@treskal.com>

---

 stgit/out.py |   11 +++++++----
 stgit/run.py |   35 +++++++++++++++++++++++------------
 2 files changed, 30 insertions(+), 16 deletions(-)


diff --git a/stgit/out.py b/stgit/out.py
index 485b830..753c176 100644
--- a/stgit/out.py
+++ b/stgit/out.py
@@ -20,7 +20,7 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 import sys, textwrap
 
 class MessagePrinter(object):
-    def __init__(self):
+    def __init__(self, file = None):
         class Output(object):
             def __init__(self, write, flush):
                 self.write = write
@@ -68,9 +68,12 @@ class MessagePrinter(object):
                 self.new_line()
                 self.write(string)
                 self.at_start_of_line = string.endswith('\n')
-        self.__stderr = Output(sys.stderr.write, sys.stderr.flush)
-        self.__stdout = Output(sys.stdout.write, sys.stdout.flush)
-        if sys.stdout.isatty():
+        if file:
+            self.__stdout = self.__stderr = Output(file.write, file.flush)
+        else:
+            self.__stdout = Output(sys.stdout.write, sys.stdout.flush)
+            self.__stderr = Output(sys.stdout.write, sys.stdout.flush)
+        if file or sys.stdout.isatty():
             self.__out = self.__stdout
             self.__err = self.__stdout
         else:
diff --git a/stgit/run.py b/stgit/run.py
index 0b79729..9d50e43 100644
--- a/stgit/run.py
+++ b/stgit/run.py
@@ -27,12 +27,22 @@ class RunException(StgException):
     subprocess."""
     pass
 
-_all_log_modes = ['debug', 'profile']
-_log_mode = os.environ.get('STGIT_SUBPROCESS_LOG', '')
-if _log_mode and not _log_mode in _all_log_modes:
-    out.warn(('Unknown log mode "%s" specified in $STGIT_SUBPROCESS_LOG.'
-              % _log_mode),
-             'Valid values are: %s' % ', '.join(_all_log_modes))
+def get_log_mode(spec):
+    if not ':' in spec:
+        spec += ':'
+    (log_mode, outfile) = spec.split(':', 1)
+    all_log_modes = ['debug', 'profile']
+    if log_mode and not log_mode in all_log_modes:
+        out.warn(('Unknown log mode "%s" specified in $STGIT_SUBPROCESS_LOG.'
+                  % log_mode),
+                 'Valid values are: %s' % ', '.join(all_log_modes))
+    if outfile:
+        f = MessagePrinter(open(outfile, 'a'))
+    else:
+        f = out
+    return (log_mode, f)
+
+(_log_mode, _logfile) = get_log_mode(os.environ.get('STGIT_SUBPROCESS_LOG', ''))
 
 class Run:
     exc = RunException
@@ -47,22 +57,23 @@ class Run:
         self.__discard_stderr = False
     def __log_start(self):
         if _log_mode == 'debug':
-            out.start('Running subprocess %s' % self.__cmd)
+            _logfile.start('Running subprocess %s' % self.__cmd)
             if self.__cwd != None:
-                out.info('cwd: %s' % self.__cwd)
+                _logfile.info('cwd: %s' % self.__cwd)
             if self.__env != None:
                 for k in sorted(self.__env.iterkeys()):
                     if k not in os.environ or os.environ[k] != self.__env[k]:
-                        out.info('%s: %s' % (k, self.__env[k]))
+                        _logfile.info('%s: %s' % (k, self.__env[k]))
         elif _log_mode == 'profile':
-            out.start('Running subprocess %s' % self.__cmd[0])
+            _logfile.start('Running subprocess %s' % self.__cmd[0])
             self.__starttime = datetime.datetime.now()
     def __log_end(self, retcode):
         if _log_mode == 'debug':
-            out.done('return code: %d' % retcode)
+            _logfile.done('return code: %d' % retcode)
         elif _log_mode == 'profile':
             duration = datetime.datetime.now() - self.__starttime
-            out.done('%1.3f s' % (duration.microseconds/1e6 + duration.seconds))
+            _logfile.done('%1.3f s' % (duration.microseconds/1e6
+                                       + duration.seconds))
     def __check_exitcode(self):
         if self.__good_retvals == None:
             return

^ permalink raw reply related

* [StGit PATCH 1/5] Add some performance testing scripts
From: Karl Hasselström @ 2008-07-17 20:42 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: git
In-Reply-To: <20080717204133.23407.34264.stgit@yoghurt>

find_patchbomb.py: Given a git repo, finds the longest linear sequence
  of commits. Useful for testing StGit on a real repository.

setup.sh: Creates two test repositories, one synthetic and one based
  on the Linux kernel repo, with strategically placed tags.

create_synthetic_repo.py: Helper script for setup.sh; it produces
  output that is to be fed to git fast-import.

perftest.py: Runs one of a (small) number of hard-coded performance
  tests against a copy of one of the repos created by setup.sh. The
  initial testcases all involve uncommitting a large number of patches
  and then rebasing them.

Signed-off-by: Karl Hasselström <kha@treskal.com>

---

 perf/.gitignore               |    2 +
 perf/create_synthetic_repo.py |   61 ++++++++++++++++++++++++++++
 perf/find_patchbomb.py        |   31 ++++++++++++++
 perf/perftest.py              |   88 +++++++++++++++++++++++++++++++++++++++++
 perf/setup.sh                 |   52 ++++++++++++++++++++++++
 5 files changed, 234 insertions(+), 0 deletions(-)
 create mode 100644 perf/.gitignore
 create mode 100644 perf/create_synthetic_repo.py
 create mode 100644 perf/find_patchbomb.py
 create mode 100644 perf/perftest.py
 create mode 100644 perf/setup.sh


diff --git a/perf/.gitignore b/perf/.gitignore
new file mode 100644
index 0000000..dfae110
--- /dev/null
+++ b/perf/.gitignore
@@ -0,0 +1,2 @@
+/*.orig
+/*.trash
diff --git a/perf/create_synthetic_repo.py b/perf/create_synthetic_repo.py
new file mode 100644
index 0000000..4d6ef6b
--- /dev/null
+++ b/perf/create_synthetic_repo.py
@@ -0,0 +1,61 @@
+next_mark = 1
+def get_mark():
+    global next_mark
+    next_mark += 1
+    return (next_mark - 1)
+
+def write_data(s):
+    print 'data %d' % len(s)
+    print s
+
+def write_blob(s):
+    print 'blob'
+    m = get_mark()
+    print 'mark :%d' % m
+    write_data(s)
+    return m
+
+def write_commit(branch, files, msg, parent = None):
+    print 'commit %s' % branch
+    m = get_mark()
+    print 'mark :%d' % m
+    auth = 'X Ample <xa@example.com> %d +0000' % (1000000000 + m)
+    print 'author %s' % auth
+    print 'committer %s' % auth
+    write_data(msg)
+    if parent != None:
+        print 'from :%d' % parent
+    for fn, fm in sorted(files.iteritems()):
+        print 'M 100644 :%d %s' % (fm, fn)
+    return m
+
+def set_ref(ref, mark):
+    print 'reset %s' % ref
+    print 'from :%d' % mark
+
+def stdblob(fn):
+    return ''.join('%d %s\n' % (x, fn) for x in xrange(10))
+
+def iter_paths():
+    for i in xrange(32):
+        for j in xrange(32):
+            for k in xrange(32):
+                yield '%02d/%02d/%02d' % (i, j, k)
+
+def setup():
+    def t(name): return 'refs/tags/%s' % name
+    files = dict((fn, write_blob(stdblob(fn))) for fn in iter_paths())
+    initial = write_commit(t('bomb-base'), files, 'Initial commit')
+    set_ref(t('bomb-top'), initial)
+    for fn in iter_paths():
+        write_commit(t('bomb-top'),
+                     { fn: write_blob(stdblob(fn) + 'Last line\n') },
+                     'Add last line to %s' % fn)
+    write_commit(t('add-file'), { 'woo-hoo.txt': write_blob('woo-hoo\n') },
+                 'Add a new file', parent = initial)
+    files = dict((fn, write_blob('First line\n' + stdblob(fn)))
+                 for fn in iter_paths())
+    write_commit(t('modify-all'), files, 'Add first line to all files',
+                 parent = initial)
+
+setup()
diff --git a/perf/find_patchbomb.py b/perf/find_patchbomb.py
new file mode 100644
index 0000000..69a78c7
--- /dev/null
+++ b/perf/find_patchbomb.py
@@ -0,0 +1,31 @@
+# Feed this with git rev-list HEAD --parents
+
+import sys
+
+parents = {}
+for line in sys.stdin.readlines():
+    commits = line.split()
+    parents[commits[0]] = commits[1:]
+
+sequence_num = {}
+stack = []
+for commit in parents.keys():
+    stack.append(commit)
+    while stack:
+        c = stack.pop()
+        if c in sequence_num:
+            continue
+        ps = parents[c]
+        if len(ps) == 1:
+            p = ps[0]
+            if p in sequence_num:
+                sequence_num[c] = 1 + sequence_num[p]
+            else:
+                stack.append(c)
+                stack.append(p)
+        else:
+            sequence_num[c] = 0
+
+(num, commit) = max((num, commit) for (commit, num)
+                    in sequence_num.iteritems())
+print '%s is a sequence of %d patches' % (commit, num)
diff --git a/perf/perftest.py b/perf/perftest.py
new file mode 100644
index 0000000..7072772
--- /dev/null
+++ b/perf/perftest.py
@@ -0,0 +1,88 @@
+import datetime, subprocess, sys
+
+def duration(t1, t2):
+    d = t2 - t1
+    return 86400*d.days + d.seconds + 1e-6*d.microseconds
+
+class Run(object):
+    def __init__(self):
+        self.__cwd = None
+        self.__log = []
+    def __call__(self, *cmd, **args):
+        kwargs = { 'cwd': self.__cwd }
+        if args.get('capture_stdout', False):
+            kwargs['stdout'] = subprocess.PIPE
+        start = datetime.datetime.now()
+        p = subprocess.Popen(cmd, **kwargs)
+        (out, err) = p.communicate()
+        stop = datetime.datetime.now()
+        self.__log.append((cmd, duration(start, stop)))
+        return out
+    def cd(self, dir):
+        self.__cwd = dir
+    def summary(self):
+        def pcmd(c): return ' '.join(c)
+        def ptime(t): return '%.3f' % t
+        (cs, times) = zip(*self.__log)
+        ttime = sum(times)
+        cl = max(len(pcmd(c)) for c in cs)
+        tl = max(len(ptime(t)) for t in list(times) + [ttime])
+        for (c, t) in self.__log:
+            print '%*s  %*s' % (tl, ptime(t), -cl, pcmd(c))
+        print '%*s' % (tl, ptime(ttime))
+
+perftests = {}
+perftestdesc = {}
+def perftest(desc, name = None):
+    def decorator(f):
+        def g():
+            r = Run()
+            f(r)
+            r.summary()
+        perftests[name or f.__name__] = g
+        perftestdesc[name or f.__name__] = desc
+        return g
+    return decorator
+
+def copy_testdir(dir):
+    tmp = dir + '.trash'
+    r = Run()
+    r('rsync', '-a', '--delete', dir + '.orig/', tmp)
+    return tmp
+
+def new_rebase(r, ref):
+    top = r('stg', 'top', capture_stdout = True)
+    r('stg', 'pop', '-a')
+    r('git', 'reset', '--hard', ref)
+    r('stg', 'goto', top.strip())
+
+def old_rebase(r, ref):
+    r('stg', 'rebase', ref)
+
+def def_rebasetest(rebase, dir, tag):
+    @perftest('%s rebase onto %s in %s' % (rebase, tag, dir),
+              'rebase-%srebase-%s-%s' % (rebase, tag, dir))
+    def rebasetest(r):
+        r.cd(copy_testdir(dir))
+        r('stg', 'init')
+        if dir == 'synt':
+            r('stg', 'uncommit', '-n', '500')
+        else:
+            r('stg', 'uncommit', '-x', '-t', 'bomb-base')
+        if rebase == 'new':
+            new_rebase(r, tag)
+        else:
+            old_rebase(r, tag)
+for rebase in ['old', 'new']:
+    for (dir, tag) in [('synt', 'add-file'),
+                       ('synt', 'modify-all'),
+                       ('linux', 'add-file')]:
+        def_rebasetest(rebase, dir, tag)
+
+args = sys.argv[1:]
+if len(args) == 0:
+    for (fun, desc) in sorted(perftestdesc.iteritems()):
+        print '%s: %s' % (fun, desc)
+else:
+    for test in args:
+        perftests[test]()
diff --git a/perf/setup.sh b/perf/setup.sh
new file mode 100644
index 0000000..b92ddfc
--- /dev/null
+++ b/perf/setup.sh
@@ -0,0 +1,52 @@
+krepo='git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git'
+
+get_linux() {
+    rm -rf linux.orig
+    git clone "$krepo" linux.orig
+}
+
+mod_linux() {
+    # Tag the top and base of a very long linear sequence of commits.
+    git tag bomb-top 85040bcb4643cba578839e953f25e2d1965d83d0
+    git tag bomb-base bomb-top~1470
+
+    # Add a file at the base of the linear sequence.
+    git checkout bomb-base
+    echo "woo-hoo" > woo-hoo.txt
+    git add woo-hoo.txt
+    git commit -m "Add a file"
+    git tag add-file
+
+    # Clean up and go to start position.
+    git gc
+    git update-ref refs/heads/master bomb-top
+    git checkout master
+}
+
+setup_linux () {
+    get_linux
+    ( cd linux.orig && mod_linux )
+}
+
+create_empty () {
+    dir="$1"
+    rm -rf $dir
+    mkdir $dir
+    ( cd $dir && git init )
+}
+
+fill_synthetic () {
+    python ../create_synthetic_repo.py | git fast-import
+    git gc --aggressive
+    git update-ref refs/heads/master bomb-top
+    git checkout master
+}
+
+setup_synthetic()
+{
+    create_empty synt.orig
+    ( cd synt.orig && fill_synthetic )
+}
+
+setup_linux
+setup_synthetic

^ permalink raw reply related

* [StGit PATCH 0/5] Performance testing tools
From: Karl Hasselström @ 2008-07-17 20:42 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: git

Here's the same scripts I posted earlier, but polished and extended.

---

Karl Hasselström (5):
      Global performance logging
      Log subprocess calls during performance testing
      Show full command in subprocess profiling
      Log subproces activity to a file
      Add some performance testing scripts


 perf/.gitignore               |    2 +
 perf/create_synthetic_repo.py |   61 ++++++++++++++++++++++++++
 perf/find_patchbomb.py        |   31 +++++++++++++
 perf/perftest.py              |   96 +++++++++++++++++++++++++++++++++++++++++
 perf/setup.sh                 |   52 ++++++++++++++++++++++
 stgit/main.py                 |   11 ++++-
 stgit/out.py                  |   11 +++--
 stgit/run.py                  |   61 +++++++++++++++++++++-----
 8 files changed, 306 insertions(+), 19 deletions(-)
 create mode 100644 perf/.gitignore
 create mode 100644 perf/create_synthetic_repo.py
 create mode 100644 perf/find_patchbomb.py
 create mode 100644 perf/perftest.py
 create mode 100644 perf/setup.sh

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle

^ permalink raw reply

* Re: Considering teaching plumbing to users harmful
From: Kevin Ballard @ 2008-07-17 20:42 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: David Kastrup, git
In-Reply-To: <200807172234.19146.jnareb@gmail.com>

On Jul 17, 2008, at 1:34 PM, Jakub Narebski wrote:

> On Thu, 17 July 2008, Kevin Ballard wrote:
>> On Jul 17, 2008, at 1:04 PM, Jakub Narebski wrote:
>>
>>> Git submodules are roughly equivalent to svn:externals with peg
>>> revisions; I mean here that they refer not to some branch in some
>>> external repository, but to specific revision.  This is the only  
>>> sane
>>> design, as it assures that when checking out some historical  
>>> revision,
>>> the state that is checked out will be the same for everybody.
>>>
>>> Please take into account however that submodules are quite new
>>> feature, and while underlying engine is solid, interface (UI) needs
>>> some polishing (and use cases).
>>
>> There is one facet of submodules that annoys me, because it prevents
>> me from using them as a replacement for svn:externals. Namely, the
>> submodule refers to a specific repository, but not a path within that
>> repository. I work with svn repos that use svn:externals to peg
>> revisions (as is appropriate) but they all refer to various paths
>> within the other repositories, and the only way I can deal with that
>> is to throw symlinks everywhere.
>
> I don't quite understand.  At the lowest, "gitlink" level submodule
> entry is just having _commit_ object in place of directory.  And of
> course this commit object refers to top tree (top directory) in
> a subproject.
>
> If you have subproject B with the following file structure
>
>  B/foo
>  B/bar/baz
>
> and you have (super)project A, which contains B as subproject at path
> sub-b, and has some files itself, the directory sytucture would look
> like this:
>
>  A/quux
>  A/sub-b/foo
>  A/sub-b/bar/baz
>
>
> What you want, I guess, is some a bit weird for me mixture of  
> submodule
> and partial (subtree) checkout... and the latter is not implemented  
> yet
> (I say "yet" because there was some preliminary implementation of
> subtree checkout on git mailing list).

It seems you understand what I'm saying. The only way I can mimic it  
is to make the submodules actually live in some hidden directory .foo  
and then scatter symlinks everywhere.

-Kevin Ballard

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox