Git development
* Re: After-the-fact submodule detection or creation
From: Michael Poole @ 2007-12-07 21:35 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git
In-Reply-To: <20071207073728.GA2847@steel.home>

Alex Riesen writes:

> Michael Poole, Fri, Dec 07, 2007 04:01:04 +0100:
>> It seems like using the current submodule code would mean that this
>> kind of import would need two passes over the foreign repository,
>> rather than one if the branch could be created after the parent tree
>> is initially imported.  I can live with that -- it is a rather unusual
>> case -- but maybe there is a better way.)
>
> Import the core module in a branch all by itself, and merge it in
> every support branch?
>
>
>     Supp1: o-o-o-----o-o-o-o-o-o-o
> 		    /
>     Core:  o-o-o-o-o
> 		    \
>     Supp2: o-o-------o-o-o-o

Yes, that's the obvious way to do it with submodules.  Teaching
git-svn to use that is the hard part.

Since the core code was first branched independently at r734 in the
existing repository, the import (either automated or manual) would
need to go through once to identify what subdirectories are actually
submodules in git terminology, and make a second pass to actually
perform the imports.  If the submodule creation could happen after the
fact, it would only need one pass.

Maybe the right question to ask is whether having a partial-tree
branch can be reasonably handled by git (in particular, detecting a
rename of the core subtree to the top-level tree in the new branch's
first commit).  If git understands that operation, then what I would
like to do would be reasonably straightforward.  If it does not make
sense, then I'll think about how to teach git-svn that certain
subdirectories should be promoted to submodules.

Michael Poole


* Re: git-bisect feature suggestion: "git-bisect diff"
From: Jeff King @ 2007-12-07 21:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ingo Molnar, git
In-Reply-To: <20071207213414.GA11688@coredump.intra.peff.net>

On Fri, Dec 07, 2007 at 04:34:14PM -0500, Jeff King wrote:

> On Fri, Dec 07, 2007 at 02:25:34AM -0800, Junio C Hamano wrote:
> 
> > git-bisect visualize: work in non-windowed environments better
> 
> Isn't this more or less the use case for the "git view" alias?

Which isn't to say that I don't think your solution is nicer; it is. But
if we don't use it here, then perhaps "git view" really is a solution in
search of a problem.

-Peff


* Re: git-bisect feature suggestion: "git-bisect diff"
From: Jeff King @ 2007-12-07 21:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ingo Molnar, git
In-Reply-To: <7vwsrq3iox.fsf@gitster.siamese.dyndns.org>

On Fri, Dec 07, 2007 at 02:25:34AM -0800, Junio C Hamano wrote:

> git-bisect visualize: work in non-windowed environments better

Isn't this more or less the use case for the "git view" alias?

diff --git a/git-bisect.sh b/git-bisect.sh
index 7a6521e..3a21386 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -325,7 +325,7 @@ bisect_next() {
 bisect_visualize() {
 	bisect_next_check fail
 	not=$(git for-each-ref --format='%(refname)' "refs/bisect/good-*")
-	eval gitk refs/bisect/bad --not $not -- $(cat "$GIT_DIR/BISECT_NAMES")
+	eval git view refs/bisect/bad --not $not -- $(cat "$GIT_DIR/BISECT_NAMES")
 }
 
 bisect_reset() {


* Re: RAM consumption when working with the gcc repo
From: Jon Smirl @ 2007-12-07 21:27 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: david, Git Mailing List
In-Reply-To: <alpine.LFD.0.99999.0712071529580.555@xanadu.home>

On 12/7/07, Nicolas Pitre <nico@cam.org> wrote:
> On Fri, 7 Dec 2007, david@lang.hm wrote:
>
> > On Fri, 7 Dec 2007, Jon Smirl wrote:
> >
> > > I noticed two things when doing a repack of the gcc repo. First is
> > > that the git process is getting to be way too big. Turning off the
> > > delta caches had minimal impact. Why does the process still grow to
> > > 4.8GB?
> > >
> > > Putting this in perspective, this is a 4.8GB process constructing a
> > > 330MB file. Something isn't right. Memory leak or inefficient data
> > > structure?
> >
> > keep in mind that that 330MB file is _very_ heavily compressed. the simple
> > zlib compression is probably getting you 10:1 or 20:1 compression and the
> > delta compression is a significant multiplier on top of that.
>
> Doesn't matter.  Something is indeed fishy.

I didn't have any problem repacking Mozilla and it ends up as a 450MB
pack file with 1.5M entries.  So something has changed. With Mozilla I
had a 3GB machine, and now I can't finish a 330MB pack on a 4GB
machine. I don't recall the Mozilla process ever exceeding 2GB.


>
> The bulk of pack-objects memory consumption can be estimated as follows:
>
> 1M objects * sizeof(struct object_entry) ~= 100MB
> 256 window entries with data (assuming a big 1MB per entry) = 256MB
> Delta result caching was disabled therefore 0MB
> read-side delta cache limited to 16MB
>
> So the purely ram allocation might get to roughly 400MB.
>
> Then add the pack and index map, which, depending on the original pack
> size,
> might be 2GB.
>
> So we're pessimistically talking of about 2.5GB of virtual space.
>
> The other 2.3GB is hard to explain.
>
>
> Nicolas
>


-- 
Jon Smirl
jonsmirl@gmail.com


* [PATCH] add status.relativePaths config variable
From: Jeff King @ 2007-12-07 21:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, Thomas Harning, git
In-Reply-To: <20071207204937.GA20111@coredump.intra.peff.net>

The output of git-status was recently changed to output
relative paths. Setting this variable to false restores the
old behavior for any old-timers that prefer it.

Signed-off-by: Jeff King <peff@peff.net>
---
On Fri, Dec 07, 2007 at 03:49:37PM -0500, Jeff King wrote:

> Personally, I don't like either the "../" or the "./", but I actually
> think the relative paths are less readable than the full paths in
> general.

So here is a config option to turn it off; I don't think there should be
any consistency problems, since git-status output is meant to be
human-readable (and after all, we just changed it :) ).

This patch also contains a small buglet fix in the neighboring code
where we didn't stop trying to match "color.status.*" even after we used
it to set the status color.

 Documentation/config.txt |    6 ++++++
 builtin-commit.c         |    3 +--
 builtin-revert.c         |    2 +-
 t/t7502-status.sh        |   31 +++++++++++++++++++++++++++++++
 wt-status.c              |   10 +++++++++-
 wt-status.h              |    2 +-
 6 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index f0ffb9d..fabe7f8 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -776,6 +776,12 @@ showbranch.default::
 	The default set of branches for gitlink:git-show-branch[1].
 	See gitlink:git-show-branch[1].
 
+status.relativePaths::
+	By default, gitlink:git-status[1] shows paths relative to the
+	current directory. Setting this variable to `false` shows paths
+	relative to the repository root (this was the default for git
+	prior to v1.5.4).
+
 tar.umask::
 	This variable can be used to restrict the permission bits of
 	tar archive entries.  The default is 0002, which turns off the
diff --git a/builtin-commit.c b/builtin-commit.c
index 18c6323..04b3bf1 100644
--- a/builtin-commit.c
+++ b/builtin-commit.c
@@ -284,8 +284,7 @@ static int run_status(FILE *fp, const char *index_file, const char *prefix)
 {
 	struct wt_status s;
 
-	wt_status_prepare(&s);
-	s.prefix = prefix;
+	wt_status_prepare(&s, prefix);
 
 	if (amend) {
 		s.amend = 1;
diff --git a/builtin-revert.c b/builtin-revert.c
index 4bf8eb2..c285f8e 100644
--- a/builtin-revert.c
+++ b/builtin-revert.c
@@ -277,7 +277,7 @@ static int revert_or_cherry_pick(int argc, const char **argv)
 
 		if (get_sha1("HEAD", head))
 			die ("You do not have a valid HEAD");
-		wt_status_prepare(&s);
+		wt_status_prepare(&s, NULL);
 		if (s.commitable)
 			die ("Dirty index: cannot %s", me);
 		discard_cache();
diff --git a/t/t7502-status.sh b/t/t7502-status.sh
index d6ae69d..9ce50ca 100755
--- a/t/t7502-status.sh
+++ b/t/t7502-status.sh
@@ -88,4 +88,35 @@ test_expect_success 'status with relative paths' '
 
 '
 
+cat > expect << \EOF
+# On branch master
+# Changes to be committed:
+#   (use "git reset HEAD <file>..." to unstage)
+#
+#	new file:   dir2/added
+#
+# Changed but not updated:
+#   (use "git add <file>..." to update what will be committed)
+#
+#	modified:   dir1/modified
+#
+# Untracked files:
+#   (use "git add <file>..." to include in what will be committed)
+#
+#	dir1/untracked
+#	dir2/modified
+#	dir2/untracked
+#	expect
+#	output
+#	untracked
+EOF
+
+test_expect_success 'status without relative paths' '
+
+	git config status.relativePaths false &&
+	(cd dir1 && git status) > output &&
+	git diff expect output
+
+'
+
 test_done
diff --git a/wt-status.c b/wt-status.c
index 02dbb75..b21b2c4 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -8,6 +8,7 @@
 #include "revision.h"
 #include "diffcore.h"
 
+int wt_status_relative_paths = 1;
 int wt_status_use_color = 0;
 static char wt_status_colors[][COLOR_MAXLEN] = {
 	"",         /* WT_STATUS_HEADER: normal */
@@ -42,7 +43,7 @@ static const char* color(int slot)
 	return wt_status_use_color ? wt_status_colors[slot] : "";
 }
 
-void wt_status_prepare(struct wt_status *s)
+void wt_status_prepare(struct wt_status *s, const char *prefix)
 {
 	unsigned char sha1[20];
 	const char *head;
@@ -53,6 +54,8 @@ void wt_status_prepare(struct wt_status *s)
 	s->reference = "HEAD";
 	s->fp = stdout;
 	s->index_file = get_index_file();
+	if (wt_status_relative_paths)
+		s->prefix = prefix;
 }
 
 static void wt_status_print_cached_header(struct wt_status *s)
@@ -397,6 +400,11 @@ int git_status_config(const char *k, const char *v)
 	if (!prefixcmp(k, "status.color.") || !prefixcmp(k, "color.status.")) {
 		int slot = parse_status_slot(k, 13);
 		color_parse(v, k, wt_status_colors[slot]);
+		return 0;
+	}
+	if (!strcmp(k, "status.relativepaths")) {
+		wt_status_relative_paths = git_config_bool(k, v);
+		return 0;
 	}
 	return git_default_config(k, v);
 }
diff --git a/wt-status.h b/wt-status.h
index 225fb4d..0ed94f3 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -28,7 +28,7 @@ struct wt_status {
 
 int git_status_config(const char *var, const char *value);
 int wt_status_use_color;
-void wt_status_prepare(struct wt_status *s);
+void wt_status_prepare(struct wt_status *s, const char *prefix);
 void wt_status_print(struct wt_status *s);
 
 #endif /* STATUS_H */
-- 
1.5.3.7.2159.gde63a-dirty


* Re: RAM consumption when working with the gcc repo
From: Marco Costalba @ 2007-12-07 21:25 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: david, Jon Smirl, Git Mailing List
In-Reply-To: <alpine.LFD.0.99999.0712071529580.555@xanadu.home>

On Dec 7, 2007 9:46 PM, Nicolas Pitre <nico@cam.org> wrote:
>
> The other 2.3GB is hard to explain.
>

BTW, is there a tool to profile memory consumption by each
source-level struct, vector, or any other data container?

Valgrind mainly checks for memory leaks; callgrind gives profiling
information in terms of call graphs and times/cycles consumed by each
function.

What I _really_ would like is a tool that allows me to *easily*
check how much memory is used at a given point in time by each data
container at source level.

Something like this:


At checkpoint "trigger_now":

struct my_data is instantiated 120234 times
struct super_delta is instantiated 100000 times


At checkpoint "trigger_also_now":

struct my_data is instantiated 12 times
struct super_delta is instantiated 70 times

.....
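
A hand-rolled approximation of such a report can be sketched in C by
routing allocations through counting wrappers and printing the live
counts at named checkpoints. Everything below is hypothetical
(tracked_alloc, tracked_free, checkpoint, and the two type tags are
made up for illustration); it is not an existing tool, and it only sees
allocations that go through the wrappers.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical tags, one per struct we want to count. */
enum tracked_type { T_MY_DATA, T_SUPER_DELTA, T_NR };

static const char *type_names[T_NR] = { "my_data", "super_delta" };
static unsigned long live[T_NR];   /* live instances per type */

/* Allocate and count one instance of type t. */
void *tracked_alloc(enum tracked_type t, size_t size)
{
    void *p = malloc(size);
    if (p)
        live[t]++;
    return p;
}

/* Free and uncount one instance of type t. */
void tracked_free(enum tracked_type t, void *p)
{
    if (p) {
        live[t]--;
        free(p);
    }
}

unsigned long live_count(enum tracked_type t)
{
    return live[t];
}

/* Print the live instance count of every tracked type. */
void checkpoint(const char *name, FILE *out)
{
    int i;

    fprintf(out, "At checkpoint \"%s\":\n\n", name);
    for (i = 0; i < T_NR; i++)
        fprintf(out, "struct %s is instantiated %lu times\n",
                type_names[i], live[i]);
}
```

Calling checkpoint("trigger_now", stdout) at the points of interest
gives output in the shape shown above, at the cost of touching every
allocation site.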


That would be AWESOME!!! A killer debugging tool!

Thanks
Marco


* Re: [RFC/PATCH] Add a --nosort option to pack-objects
From: Junio C Hamano @ 2007-12-07 21:25 UTC (permalink / raw)
  To: Mike Hommey; +Cc: git
In-Reply-To: <1197061832-8489-1-git-send-email-mh@glandium.org>

Mike Hommey <mh@glandium.org> writes:

> The --nosort option disables the internal sorting used by pack-objects,
> and runs the sliding window along the object list literally as given on
> stdin.

I think this is a good way to give people an easier way to experiment.

But it makes me wonder if this is disabling too much, and if the list
should be sorted at least by type, as we won't delta different types of
objects against each other.

At the beginning of try_delta(), when we see that the next candidate is
of a different type, we return -1 telling the caller that "No object in
the window will ever be a good delta base for the current object, please
abort".  This relies on the fact that we sort by type first, so I think
one of the following is necessary:
 
 (1) you weaken this check (return 0, saying "This did not delta well but
     do not give up yet"),

 (2) you document this well so that --nosort user will know, or

 (3) you sort --nosort input by type.
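
To make the dependency concrete, here is a simplified sketch in C. It
is not the actual pack-objects code: the struct, the enum values and
both function names are hypothetical, the sort direction is arbitrary,
and the delta computation itself is elided. It only illustrates why
the early abort on a type mismatch is safe when the list is sorted by
type, and what option (1) would change for unsorted input.

```c
#include <stdlib.h>

/* Hypothetical, simplified object description. */
enum sketch_type { TYPE_COMMIT = 1, TYPE_TREE = 2, TYPE_BLOB = 3 };

struct entry {
    enum sketch_type type;
    unsigned long size;
};

/* qsort comparator: group by type first, then larger objects first. */
int type_size_cmp(const void *a_, const void *b_)
{
    const struct entry *a = a_;
    const struct entry *b = b_;

    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->size != b->size)
        return a->size > b->size ? -1 : 1;
    return 0;
}

/*
 * Returns -1: no object in the window can be a base, abort the search.
 * Returns  0: this candidate did not help, but keep looking.
 * Returns  1: types match, a delta could be attempted (elided here).
 */
int try_delta_sketch(const struct entry *trg, const struct entry *src,
                     int input_is_sorted)
{
    if (trg->type != src->type)
        /* Only a type-sorted list guarantees the rest of the window
         * is useless too; otherwise weaken the answer, as in (1). */
        return input_is_sorted ? -1 : 0;
    return 1;
}
```

With sorted input a mismatch means the window has crossed a type
boundary, so giving up is correct. With --nosort input the same -1
would discard windows that may still contain same-type candidates.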

>   I would obviously add the appropriate documentation for this flag if this
>   is accepted. I'll also try to send another documentation patch for
>   pack-objects with some information compiled from Linus's explanation to my
>   last message about pack-objects.

I need to rant here a bit.

Sometimes people say "Here is my patch.  If this is accepted, I'll add
documentation and tests".  My reaction is, "Don't you, as the person who
proposes that change, believe in your patch deeply enough yourself to be
willing to perfect it, to make it suitable for consumption by the
general public, whether it is included in my tree or not?  A change that
even you do not believe in deeply enough to perfect would probably not
benefit the general public, so thanks but no thanks, I'll pass."

Fortunately we haven't had this problem too many times on this list.

I would not have minded at all if you said:

	Obviously, appropriate documentation and tests are needed before
	inclusion, but I am sending this out primarily to seek opinions
	from the list to make sure this is going in the right direction,
	iow, this is an RFC.

What bugged me was the phrase "if this is accepted".


* Re: [RFC/PATCH] Add a --nosort option to pack-objects
From: Nicolas Pitre @ 2007-12-07 21:24 UTC (permalink / raw)
  To: Mike Hommey; +Cc: git, Junio C Hamano
In-Reply-To: <1197061832-8489-1-git-send-email-mh@glandium.org>

On Fri, 7 Dec 2007, Mike Hommey wrote:

> While most of the time the heuristics used by pack-objects to sort the
> given object list are satisfying enough, there are cases where it can be
> useful for the user to sort the list with heuristics that would be better
> suited.

Could you please elaborate on those cases where the current heuristic 
would be unsatisfactory?


Nicolas


* Re: RAM consumption when working with the gcc repo
From: Jon Smirl @ 2007-12-07 21:23 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: david, Git Mailing List
In-Reply-To: <alpine.LFD.0.99999.0712071529580.555@xanadu.home>

On 12/7/07, Nicolas Pitre <nico@cam.org> wrote:
> On Fri, 7 Dec 2007, david@lang.hm wrote:
>
> > On Fri, 7 Dec 2007, Jon Smirl wrote:
> >
> > > I noticed two things when doing a repack of the gcc repo. First is
> > > that the git process is getting to be way too big. Turning off the
> > > delta caches had minimal impact. Why does the process still grow to
> > > 4.8GB?
> > >
> > > Putting this in perspective, this is a 4.8GB process constructing a
> > > 330MB file. Something isn't right. Memory leak or inefficient data
> > > structure?
> >
> > keep in mind that that 330MB file is _very_ heavily compressed. the simple
> > zlib compression is probably getting you 10:1 or 20:1 compression and the
> > delta compression is a significant multiplier on top of that.
>
> Doesn't matter.  Something is indeed fishy.
>
> The bulk of pack-objects memory consumption can be estimated as follows:
>
> 1M objects * sizeof(struct object_entry) ~= 100MB
> 256 window entries with data (assuming a big 1MB per entry) = 256MB
> Delta result caching was disabled therefore 0MB
> read-side delta cache limited to 16MB
>
> So the purely ram allocation might get to roughly 400MB.
>
> Then add the pack and index map, which, depending on the original pack
> size,
> might be 2GB.

I'm repacking the heavily compressed pack, so input pack and index are
about 360MB, not 2GB.

>
> So we're pessimistically talking of about 2.5GB of virtual space.
>
> The other 2.3GB is hard to explain.

More like 3.5GB that is hard to explain.

Is there a simple way to tell what percent is mmap vs anon allocation?
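
On Linux, one rough way to get at this split is to walk
/proc/<pid>/maps and sum the size of file-backed versus anonymous
mappings. The sketch below assumes the usual maps line layout
(address range, permissions, offset, device, inode, optional path);
the function names are made up for illustration, and it reports
virtual mapping sizes, not resident memory.

```c
#include <stdio.h>

/*
 * Parse one line of /proc/<pid>/maps.
 * Returns the mapping size in bytes (0 if the line does not parse)
 * and sets *file_backed to 1 when the mapping names a real path.
 */
unsigned long parse_maps_line(const char *line, int *file_backed)
{
    unsigned long start, end, inode;
    char path[256] = "";
    int n;

    /* perms, offset and device are skipped with %*s */
    n = sscanf(line, "%lx-%lx %*s %*s %*s %lu %255s",
               &start, &end, &inode, path);
    if (n < 3 || end < start)
        return 0;
    /* "[heap]", "[stack]" and path-less lines count as anonymous */
    *file_backed = (n == 4 && path[0] == '/');
    return end - start;
}

/* Sum file-backed and anonymous mapping sizes from an open maps file. */
void sum_maps(FILE *fp, unsigned long *file_bytes, unsigned long *anon_bytes)
{
    char line[512];

    *file_bytes = 0;
    *anon_bytes = 0;
    while (fgets(line, sizeof(line), fp)) {
        int file_backed = 0;
        unsigned long size = parse_maps_line(line, &file_backed);

        if (file_backed)
            *file_bytes += size;
        else
            *anon_bytes += size;
    }
}
```

Opening /proc/<pid>/maps of the running pack-objects process with
fopen() and passing it to sum_maps() gives the two totals; the
percentage is then a simple division.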


>
>
> Nicolas
>


-- 
Jon Smirl
jonsmirl@gmail.com


* [RFC/PATCH] Add a --nosort option to pack-objects
From: Mike Hommey @ 2007-12-07 21:10 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

While most of the time the heuristics used by pack-objects to sort the
given object list are satisfying enough, there are cases where it can be
useful for the user to sort the list with heuristics that would be better
suited.

The --nosort option disables the internal sorting used by pack-objects,
and runs the sliding window along the object list literally as given on
stdin.

Signed-off-by: Mike Hommey <mh@glandium.org>
---

  I would obviously add the appropriate documentation for this flag if this
  is accepted. I'll also try to send another documentation patch for
  pack-objects with some information compiled from Linus's explanation to my
  last message about pack-objects.

 builtin-pack-objects.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 4f44658..8bc2d5f 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -21,7 +21,7 @@
 
 static const char pack_usage[] = "\
 git-pack-objects [{ -q | --progress | --all-progress }] \n\
-	[--max-pack-size=N] [--local] [--incremental] \n\
+	[--max-pack-size=N] [--local] [--incremental] [--nosort]\n\
 	[--window=N] [--window-memory=N] [--depth=N] \n\
 	[--no-reuse-delta] [--no-reuse-object] [--delta-base-offset] \n\
 	[--threads=N] [--non-empty] [--revs [--unpacked | --all]*] [--reflog] \n\
@@ -64,6 +64,7 @@ static int non_empty;
 static int no_reuse_delta, no_reuse_object, keep_unreachable;
 static int local;
 static int incremental;
+static int nosort;
 static int allow_ofs_delta;
 static const char *base_name;
 static int progress = 1;
@@ -1715,7 +1716,9 @@ static void prepare_pack(int window, int depth)
 		if (progress)
 			progress_state = start_progress("Compressing objects",
 							nr_deltas);
-		qsort(delta_list, n, sizeof(*delta_list), type_size_sort);
+		if (! nosort)
+			qsort(delta_list, n, sizeof(*delta_list),
+				type_size_sort);
 		ll_find_deltas(delta_list, n, window+1, depth, &nr_done);
 		stop_progress(&progress_state);
 		if (nr_done != nr_deltas)
@@ -1988,6 +1991,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			incremental = 1;
 			continue;
 		}
+		if (!strcmp("--nosort", arg)) {
+			nosort = 1;
+			continue;
+		}
 		if (!prefixcmp(arg, "--compression=")) {
 			char *end;
 			int level = strtoul(arg+14, &end, 0);
-- 
1.5.3.7


* Re: [PATCH] quote_path: convert empty path to "./"
From: Jeff King @ 2007-12-07 20:49 UTC (permalink / raw)
  To: Thomas Harning; +Cc: Johannes Schindelin, Junio C Hamano, git
In-Reply-To: <4759996B.2000300@gmail.com>

On Fri, Dec 07, 2007 at 02:05:15PM -0500, Thomas Harning wrote:

> I concur.  There is one case that this seems to dodge.  What about the case 
> where you are in:
>
> /test/test_2  where /test  is not tracked...
>
> This should probably show "./../"   not just "./"   , right?

It already says "../", which is correct:

  $ git init
  $ mkdir test && cd test
  $ touch file
  $ mkdir test2 && cd test2
  $ git status
  ...
  # Untracked files:
  #   (use "git add <file>..." to include in what will be committed)
  #
  #       ../

There's no point in ever saying "./" _except_ in the case where the
output would be totally blank, since there is no way to tell that it is
an output line.

Personally, I don't like either the "../" or the "./", but I actually
think the relative paths are less readable than the full paths in
general.

-Peff


* Re: RAM consumption when working with the gcc repo
From: Nicolas Pitre @ 2007-12-07 20:46 UTC (permalink / raw)
  To: david; +Cc: Jon Smirl, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0712071323260.12607@asgard.lang.hm>

On Fri, 7 Dec 2007, david@lang.hm wrote:

> On Fri, 7 Dec 2007, Jon Smirl wrote:
> 
> > I noticed two things when doing a repack of the gcc repo. First is
> > that the git process is getting to be way too big. Turning off the
> > delta caches had minimal impact. Why does the process still grow to
> > 4.8GB?
> > 
> > Putting this in perspective, this is a 4.8GB process constructing a
> > 330MB file. Something isn't right. Memory leak or inefficient data
> > structure?
> 
> keep in mind that that 330MB file is _very_ heavily compressed. the simple
> zlib compression is probably getting you 10:1 or 20:1 compression and the
> delta compression is a significant multiplier on top of that.

Doesn't matter.  Something is indeed fishy.

The bulk of pack-objects memory consumption can be estimated as follows:

1M objects * sizeof(struct object_entry) ~= 100MB
256 window entries with data (assuming a big 1MB per entry) = 256MB
Delta result caching was disabled therefore 0MB
read-side delta cache limited to 16MB

So the purely ram allocation might get to roughly 400MB.

Then add the pack and index map, which, depending on the original pack 
size,
might be 2GB.

So we're pessimistically talking of about 2.5GB of virtual space.

The other 2.3GB is hard to explain.
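
The tally above can be written down as a small C sketch. The struct
and function names are hypothetical, all figures are in MB and are
taken directly from the estimate (the 2GB map size being the
pessimistic guess), so this is bookkeeping, not a measurement.

```c
/* All figures in MB, as given in the estimate above. */
struct pack_mem_estimate {
    unsigned long object_entries; /* 1M objects * sizeof(struct object_entry) */
    unsigned long window;         /* 256 window entries * 1MB each */
    unsigned long delta_cache;    /* delta result caching disabled */
    unsigned long read_cache;     /* read-side delta cache limit */
    unsigned long pack_map;       /* mapped pack and index, pessimistic */
};

const struct pack_mem_estimate thread_estimate = {
    100, 256, 0, 16, 2048
};

/* Pure RAM allocation, without the mapped pack and index. */
unsigned long heap_mb(const struct pack_mem_estimate *e)
{
    return e->object_entries + e->window + e->delta_cache + e->read_cache;
}

/* Heap plus the pack and index maps. */
unsigned long total_mb(const struct pack_mem_estimate *e)
{
    return heap_mb(e) + e->pack_map;
}
```

heap_mb(&thread_estimate) comes to 372, which is the "roughly 400MB",
and total_mb(&thread_estimate) comes to 2420, which is the "about
2.5GB". Whatever the process uses beyond that is the unexplained part.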


Nicolas


* Re: RAM consumption when working with the gcc repo
From: Marco Costalba @ 2007-12-07 20:36 UTC (permalink / raw)
  To: david; +Cc: Jon Smirl, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0712071323260.12607@asgard.lang.hm>

On Dec 7, 2007 10:24 PM,  <david@lang.hm> wrote:
> On Fri, 7 Dec 2007, Jon Smirl wrote:
>
> keep in mind that that 330MB file is _very_ heavily compressed. the simple
> zlib compression is probably getting you 10:1 or 20:1 compression and the
> delta compression is a significant multiplier on top of that.
>

If the delta is good, the zlib compression is poor, and vice versa.

It is very difficult to _guess_ in these cases.

Marco


* Re: Git and GCC
From: Giovanni Bajo @ 2007-12-07 20:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Miller, jonsmirl, peff, nico, dberlin, harvey.harrison,
	ismail, gcc, git
In-Reply-To: <alpine.LFD.0.9999.0712070919590.7274@woody.linux-foundation.org>

On 12/7/2007 6:23 PM, Linus Torvalds wrote:

>> Is SHA a significant portion of the compute during these repacks?
>> I should run oprofile...
> 
> SHA1 is almost totally insignificant on x86. It hardly shows up. But we 
> have a good optimized version there.
> 
> zlib tends to be a lot more noticeable (especially the uncompression: it 
> may be faster than compression, but it's done _so_ much more that it 
> totally dominates).

Have you considered alternatives, like:
http://www.oberhumer.com/opensource/ucl/
-- 
Giovanni Bajo


* Re: RAM consumption when working with the gcc repo
From: david @ 2007-12-07 21:24 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Git Mailing List
In-Reply-To: <9e4733910712071207p750c14f4h7abc5d637da3a478@mail.gmail.com>

On Fri, 7 Dec 2007, Jon Smirl wrote:

> I noticed two things when doing a repack of the gcc repo. First is
> that the git process is getting to be way too big. Turning off the
> delta caches had minimal impact. Why does the process still grow to
> 4.8GB?
>
> Putting this in perspective, this is a 4.8GB process constructing a
> 330MB file. Something isn't right. Memory leak or inefficient data
> structure?

keep in mind that that 330MB file is _very_ heavily compressed. the simple 
zlib compression is probably getting you 10:1 or 20:1 compression and the 
delta compression is a significant multiplier on top of that.

David Lang

> The second issue is that the repack process slows way down on the last
> 10% of the packing process. I don't believe this was caused by
> swapping since my disk light wasn't on. It takes as long to do the last
> 10% as it did for the first 70%. This seems to be correlated with the
> size of the process getting so large.
>
>


* Re: git guidance
From: david @ 2007-12-07 21:17 UTC (permalink / raw)
  To: Al Boldi
  Cc: Andreas Ericsson, Johannes Schindelin, Phillip Susi,
	Linus Torvalds, Jing Xue, linux-kernel, git
In-Reply-To: <200712071353.11654.a1426z@gawab.com>

On Fri, 7 Dec 2007, Al Boldi wrote:

> Andreas Ericsson wrote:
>> So, to get to the bottom of this, which of the following workflows is it
>> you want git to support?
>>
>> ### WORKFLOW A ###
>> edit, edit, edit
>> edit, edit, edit
>> edit, edit, edit
>> Oops I made a mistake and need to hop back to "current - 12".
>> edit, edit, edit
>> edit, edit, edit
>> publish everything, similar to just tarring up your workdir and sending
>> out ### END WORKFLOW A ###
>>
>> ### WORKFLOW B ###
>> edit, edit, edit
>> ok this looks good, I want to save a checkpoint here
>> edit, edit, edit
>> looks good again. next checkpoint
>> edit, edit, edit
>> oh crap, back to checkpoint 2
>> edit, edit, edit
>> ooh, that's better. save a checkpoint and publish those checkpoints
>> ### END WORKFLOW B ###
>
> ### WORKFLOW C ###
> for every save on a gitfs mounted dir, do an implied checkpoint, commit, or
> publish (should be adjustable), on its privately created on-the-fly
> repository.
> ### END WORKFLOW C ###
>
> For example:
>
>  echo "// last comment on this file" >> /gitfs.mounted/file
>
> should do an implied checkpoint, and make these checkpoints immediately
> visible under some checkpoint branch of the gitfs mounted dir.
>
> Note, this way the developer gets version control without even noticing, and
> works completely transparent to any kind of application.

so if you have a script that does

echo "mail header" >tmpfile
echo "subject:" >>tmpfile
echo >>tmpfile
echo "body" >>tmpfile

you want to have four separate commits?

what if you have a perl script

open(outfile, ">tmpfile");
print outfile "mail header\n";
print outfile "subject:\n\n";
print outfile "body\n";
close(outfile);

how many separate commits do you think should take place?

what if $|=1 (unbuffered output, so that each print statement becomes 
visible to other programs immediately)?

what if the file is changed via mmap? should each byte/word written to 
memory be a commit? or when the mmap is closed? or when the kernel happens 
to flush the page to disk?

'recording every change to a filesystem' is a very incomplete definition 
of a goal.

David Lang


* RAM consumption when working with the gcc repo
From: Jon Smirl @ 2007-12-07 20:07 UTC (permalink / raw)
  To: Git Mailing List

I noticed two things when doing a repack of the gcc repo. First is
that the git process is getting to be way too big. Turning off the
delta caches had minimal impact. Why does the process still grow to
4.8GB?

Putting this in perspective, this is a 4.8GB process constructing a
330MB file. Something isn't right. Memory leak or inefficient data
structure?

The second issue is that the repack process slows way down on the last
10% of the packing process. I don't believe this was caused by
swapping since my disk light wasn't on. It takes as long to do the last
10% as it did for the first 70%. This seems to be correlated with the
size of the process getting so large.

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: git-bisect feature suggestion: "git-bisect diff"
From: Ingo Molnar @ 2007-12-07 19:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vmysm1eyz.fsf@gitster.siamese.dyndns.org>


* Junio C Hamano <gitster@pobox.com> wrote:

> Ingo Molnar <mingo@elte.hu> writes:
> 
> > ... One small detail though: i frequently ssh to testboxes that have 
> > DISPLAY set but i want text output. So git-bisect view --text should be 
> > a special-case perhaps?
> 
> Yeah, but at that point, wouldn't "git bisect view log" be shorter to 
> type?

it's also more intuitive. ok :-)

	Ingo


* Re: git guidance
From: Valdis.Kletnieks @ 2007-12-07 19:36 UTC (permalink / raw)
  To: Al Boldi
  Cc: Jakub Narebski, Andreas Ericsson, Johannes Schindelin,
	Phillip Susi, Linus Torvalds, Jing Xue, linux-kernel, git
In-Reply-To: <200712072204.48410.a1426z@gawab.com>


On Fri, 07 Dec 2007 22:04:48 +0300, Al Boldi said:

> Because WORKFLOW C is transparent, it won't affect other workflows.  So you 
> could still use your normal WORKFLOW B in addition to WORKFLOW C, gaining an 
> additional level of version control detail at no extra cost other than the 
> git-engine scratch repository overhead.
> 
> BTW, is git efficient enough to handle WORKFLOW C?

Imagine the number of commits a 'make clean; make' will do in a kernel tree, as
it commits all those .o files... :)




* Re: Git and GCC
From: Nicolas Pitre @ 2007-12-07 19:36 UTC (permalink / raw)
  To: Jon Smirl
  Cc: Linus Torvalds, Harvey Harrison, Daniel Berlin, David Miller,
	ismail, gcc, git
In-Reply-To: <9e4733910712062308t22258c6anb685b18a663e0a31@mail.gmail.com>

On Fri, 7 Dec 2007, Jon Smirl wrote:

> On 12/7/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >
> >
> > On Thu, 6 Dec 2007, Jon Smirl wrote:
> > > >
> > > >         time git blame -C gcc/regclass.c > /dev/null
> > >
> > > jonsmirl@terra:/video/gcc$ time git blame -C gcc/regclass.c > /dev/null
> > >
> > > real    1m21.967s
> > > user    1m21.329s
> >
> > Well, I was also hoping for a "compared to not-so-aggressive packing"
> > number on the same machine.. IOW, what I was wondering is whether there is
> > a visible performance downside to the deeper delta chains in the 300MB
> > pack vs the (less aggressive) 500MB pack.
> 
> Same machine with a default pack
> 
> jonsmirl@terra:/video/gcc/.git/objects/pack$ ls -l
> total 2145716
> -r--r--r-- 1 jonsmirl jonsmirl   23667932 2007-12-07 02:03
> pack-bd163555ea9240a7fdd07d2708a293872665f48b.idx
> -r--r--r-- 1 jonsmirl jonsmirl 2171385413 2007-12-07 02:03
> pack-bd163555ea9240a7fdd07d2708a293872665f48b.pack
> jonsmirl@terra:/video/gcc/.git/objects/pack$
> 
> Delta lengths have virtually no impact. 

I can confirm this.

I just did a repack keeping the default depth of 50 but with window=100 
instead of the default of 10, and the pack shrunk from 2171385413 bytes 
down to 410607140 bytes.

So our default window size is definitely not adequate for the gcc repo.

OTOH, I recall tytso mentioning something about not having much return 
on  a bigger window size in his tests when he proposed to increase the 
default delta depth to 50.  So there is definitely some kind of threshold 
at which point the increased window size stops being advantageous wrt 
the number of cycles involved, and we should find a way to correlate it 
to the data set to have a better default window size than the current 
fixed default.


Nicolas


* Re: What's cooking in git.git (topics)
From: Junio C Hamano @ 2007-12-07 19:29 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <m3tzmuu57k.fsf@roke.D-201>

Jakub Narebski <jnareb@gmail.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
> ...
>> [On hold]
>> 
>> * nd/dashless (Wed Nov 28 23:21:57 2007 +0700) 1 commit
>>  - Move all dashed-form commands to libexecdir
>> 
>> I think this is a sane thing to do in the longer term.  Will be in
>> 'next' after v1.5.4.  I think "leave porcelain on PATH" might be also a
>> good thing as a transition measure.
>
> We would have to change the paragraph in INSTALL about git wrapper
> (which would be no longer optional, or at least no longer optional in
> the sense that you can just delete/not install this file), and its
> conflict with (old) GNU Interactive Tools (the other 'git').

Thanks for noticing.  Please send in a proposed patch to do so; then
we can park it near the tip of this topic, and nobody will forget.

>> [Stalled]
>> 
>> * ns/checkout-push-pop (Wed Dec 5 07:04:06 2007 +0900) 1 commit
>>  - git-checkout --push/--pop
>> 
>> A reasonably cleanly written cute hack, and I do not see this breaking
>> the normal codepath, so I do not mind merging this as long as people
>> find it useful.
>
> That would be nice to have, although as somebody[*1*] said, you usually
> know that you should have pushed branch into stack when you want to
> 'pop'. So it would be nice to have (if possible and easy to implement)
> also "git checkout --previous" or "git checkout -".
> ...

Perhaps.  There are a few issues, though.

 * When you were on 'master' and say "co -", you would want to come back
   to the 'master' branch, whose tip may have advanced since you
   switched away from (e.g. "git push . experiment:master"), and that is
   a desired behaviour.  When you switch away from a detached HEAD, what
   would we record?  The fact the head was detached and its commit, so
   next "co -" would come back to that exact commit in a detached state?
   Or "co -" is meant to say "I was distracted and was away but now
   let's go back to my normal working state" and should refrain from
   touching the previous branch information?  I tend to think it would
   be the latter.

 * There are a few commands that are not "git checkout" but still
   switch branches ("rebase that branch on this one" form of rebase
   and "bisect").  Personally, I think bisect should stop using the
   branch 'bisect' but instead work on detached HEAD in the longer run,
   but what would we do about "rebase"?
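
The first point can be sketched as a toy model. This is only an
illustration of the "latter" option described above, under the
assumption that the branch name (not its commit) is recorded when
leaving a branch and nothing is recorded when leaving a detached HEAD.
The class and method names are hypothetical and do not correspond to
any git code.

```python
class ToyRepo:
    """Toy model: HEAD is either on a branch or detached at a commit."""

    def __init__(self, branches, start):
        self.branches = dict(branches)    # branch name -> tip commit id
        self.head_branch = start          # None while HEAD is detached
        self.head_commit = self.branches[start]
        self.previous_branch = None       # where "co -" would go back to

    def checkout(self, target):
        # Record the branch *name* when leaving a branch, so a later
        # "co -" lands on whatever the tip is by then.
        # Leaving a detached HEAD deliberately records nothing.
        if self.head_branch is not None:
            self.previous_branch = self.head_branch
        if target in self.branches:
            self.head_branch = target
            self.head_commit = self.branches[target]
        else:
            # Anything that is not a branch name is treated as a commit.
            self.head_branch = None
            self.head_commit = target

    def checkout_previous(self):
        """Model of "co -": return to the last recorded branch."""
        if self.previous_branch is None:
            raise ValueError("no previous branch recorded")
        self.checkout(self.previous_branch)


repo = ToyRepo({"master": "c1", "experiment": "c2"}, start="master")
repo.checkout("experiment")
repo.branches["master"] = "c3"    # master's tip advances meanwhile
repo.checkout_previous()          # back on master, at its new tip
print(repo.head_branch, repo.head_commit)
```

The second point is not modelled: commands such as rebase or bisect
that switch branches themselves would need to decide whether they go
through the same recording step.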

> [*1*] I'm sorry for no attribution

I think this was Matthieu Moy, <vpqir3de8t6.fsf@bauges.imag.fr>,
http://article.gmane.org/gmane.comp.version-control.git/67133

>> * jc/pathspec (Thu Sep 13 13:38:19 2007 -0700) 3 commits
>>  . pathspec_can_match(): move it from builtin-ls-tree.c to tree.c
>
> What is the status of this thingy, by the way?

As the topic group header says, it is [Stalled].

^ permalink raw reply

* Re: git-bisect feature suggestion: "git-bisect diff"
From: Junio C Hamano @ 2007-12-07 19:28 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: git
In-Reply-To: <20071207112159.GA11035@elte.hu>

Ingo Molnar <mingo@elte.hu> writes:

> ... One small detail though: i frequently ssh to testboxes that have 
> DISPLAY set but i want text output. So git-bisect view --text should be 
> a special-case perhaps?

Yeah, but at that point, wouldn't "git bisect view log" be shorter to
type?

^ permalink raw reply

* Re: [PATCH] Let git-help prefer man-pages installed with this version of git
From: Junio C Hamano @ 2007-12-07 19:29 UTC (permalink / raw)
  To: Sergei Organov; +Cc: Johannes Schindelin, git, Christian Couder
In-Reply-To: <871w9y7mei.fsf@osv.gnss.ru>

Sergei Organov <osv@javad.com> writes:

> First, I don't think you need to clarify like this. It is just
> implementation detail of git-help that it uses 'man', and thus
> implicitly relies on MANPATH. The essential thing has been already
> stated above: git-help should show correct documentation.

Ok, this is a good argument for the patch.  With Christian's
enhancements, we will handle -i(nfo) and -w(eb) and we will tell the
"info" and "html" browsers where the documentation we installed for the
running instance of git is, so we should do so consistently for
"manpage" browser (aka "man").  You are right.

^ permalink raw reply

* Re: Some git performance measurements..
From: Mike Ralphson @ 2007-12-07 19:15 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Steffen Prohaska, Junio C Hamano, Nicolas Pitre, Linus Torvalds,
	Jakub Narebski, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0712071816100.27959@racer.site>

On Dec 7, 2007 6:37 PM, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Fri, 7 Dec 2007, Mike Ralphson wrote:
>
> > On Dec 7, 2007 1:49 PM, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > > On Fri, 7 Dec 2007, Mike Ralphson wrote:
> > >
> > > > I benchmarked 3 alternative qsorts, qsortG [2] was the fastest on my
> > > > system but has funky licensing, the NetBSD qsort was middle-range
> > > > and the glibc one the slowest of the three (but that could be due to
> > > > it being tuned for a "Sun 4/260"). All of them show over 100x speed
> > > > improvements on a git-status of my main repo (104s -> ~0.7s)
> > >
>
> Okay, sorry, I did not bother reading further when I read "You may use it
> in anything you like;".
>
> But if the author did not respond, it might be a better idea to just
> reimplement it.
>

I've just tried the mergesort implementation as used in msysgit and
that performs faster for me. It's simpler, and compatibly licensed. It
looks good.
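
For readers unfamiliar with the approach, a merge sort taking a
qsort-style comparison callback can be sketched as below. This is an
illustrative Python version, not the msysgit implementation mentioned
above; the name merge_sort and its signature are assumptions made for
the example.

```python
def merge_sort(items, cmp):
    """Return a new sorted list.

    cmp(a, b) follows the qsort convention: negative if a sorts
    before b, zero if they are equal, positive otherwise.
    """
    if len(items) <= 1:
        return list(items)

    # Sort each half recursively, then merge the two sorted halves.
    mid = len(items) // 2
    left = merge_sort(items[:mid], cmp)
    right = merge_sort(items[mid:], cmp)

    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left half on ties, which keeps equal
        # elements in their original order (a stable sort).
        if cmp(left[i], right[j]) <= 0:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def by_name(a, b):
    # Three-way comparison of two strings.
    return (a > b) - (a < b)


print(merge_sort(["Makefile", "builtin-add.c", "README", "cache.h"], by_name))
```

Merge sort does O(n log n) comparisons regardless of the input order,
so it has no pathological inputs of the kind some quicksort
implementations suffer from.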

Mike

^ permalink raw reply

* Re: [PATCH] quote_path: convert empty path to "./"
From: Thomas Harning @ 2007-12-07 19:05 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0712071853500.27959@racer.site>

Johannes Schindelin wrote:
> Hi,
>
> On Fri, 7 Dec 2007, Jeff King wrote:
>   
>> ...
>>     
>
> Sounds reasonable.
>
> Ciao,
> Dscho  
I concur.  There is one case that this seems to dodge.  What about the 
case where you are in:

/test/test_2 where /test is not tracked...

This should probably show "./../", not just "./", right?
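
A rough way to see the two cases side by side, using Python's path
handling rather than git's quote_path (display_dir is a hypothetical
helper written for this example, and the paths are the ones from the
case above):

```python
import posixpath


def display_dir(path, cwd):
    """Show directory `path` relative to `cwd`, with a trailing slash."""
    rel = posixpath.relpath(path, start=cwd)
    if rel == ".":
        # The path is the current directory itself: the "empty path"
        # case, shown as "./".
        return "./"
    return rel + "/"


print(display_dir("/test/test_2", "/test/test_2"))
print(display_dir("/test", "/test/test_2"))
```

For the parent directory this helper yields "../" rather than the
"./../" spelling suggested above; both name the same directory.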

^ permalink raw reply

