Git development
* Re: Git and GCC
From: Linus Torvalds @ 2007-12-07  3:01 UTC
  To: Harvey Harrison; +Cc: Daniel Berlin, David Miller, ismail, gcc, git
In-Reply-To: <1196995353.22471.20.camel@brick>



On Thu, 6 Dec 2007, Harvey Harrison wrote:
> 
> I've updated the public mirror repo with the very-packed version.

Side note: it might be interesting to compare timings for 
history-intensive stuff with and without this kind of very-packed 
situation.

The very density of a smaller pack-file might be enough to overcome the 
downsides (more CPU time to apply longer delta-chains), but regardless, 
real numbers talks, bullshit walks. So wouldn't it be nice to have real 
numbers?

One easy way to get real numbers for history would be to just time some 
reasonably costly operation that uses lots of history. Ie just do a 

	time git blame -C gcc/regclass.c > /dev/null

and see if the deeper delta chains are very expensive.

(Yeah, the above is pretty much designed to be the worst possible case for 
this kind of aggressive history packing, but I don't know if that choice 
of file to try to annotate is a good choice or not. I suspect that "git 
blame -C" with a CVS import is just horrid, because CVS commits tend to be 
pretty big and nasty and not as localized as we've tried to make things in 
the kernel, so doing the code copy detection is probably horrendously 
expensive)

			Linus


* Re: Better value for chunk_size when threaded
From: Nicolas Pitre @ 2007-12-07  3:25 UTC
  To: Jon Smirl; +Cc: Git Mailing List
In-Reply-To: <9e4733910712061737o50a9a5f1ldccdf943bb19319f@mail.gmail.com>

On Thu, 6 Dec 2007, Jon Smirl wrote:

> On 12/6/07, Nicolas Pitre <nico@cam.org> wrote:
> > On Thu, 6 Dec 2007, Jon Smirl wrote:
> >
> > > I tried some various ideas out for chunk_size and the best strategy I
> > > found was to simply set it to a constant. How does 20,000 work on
> > > other CPUs?
> >
> > That depends on the object size.  If you have a repo with big objects
> > but only 1000 of them for example, then the constant doesn't work.
> 
> How about defaulting it to 20,000 and allowing an override? It's not
> fatal if we guess wrong, we just want the most common cases to work out
> of the box. 20,000 is definitely better than the current window *
> 1000.

Sure.

... But I think this can be made much better than that with no guessing 
at all.

Say you have 4 threads.  Then let's divide the whole object list into 4 
big segments and feed one to each thread.

One thread will always finish before the others.  The idea is to find 
the active thread with the largest amount of remaining objects to 
process at that point, and steal half of them and give that to the 
thread that just finished.  Repeat for each thread that completes its 
segment until everything is done.
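
A rough sketch of the stealing scheme described above, written in Python 
for brevity rather than as git's actual C code.  The names split_segments 
and steal_half are hypothetical, and each segment is modelled as a 
[next, end) range of object indices, which is an assumption made only 
for illustration:

```python
def split_segments(n_objects, n_threads):
    """Divide [0, n_objects) into n_threads contiguous [start, end) segments."""
    base, extra = divmod(n_objects, n_threads)
    segments = []
    start = 0
    for i in range(n_threads):
        size = base + (1 if i < extra else 0)
        segments.append([start, start + size])
        start += size
    return segments


def steal_half(segments, idle):
    """Give the idle thread half of the largest remaining segment.

    segments[i] is [next, end]: the objects thread i has not processed yet.
    Returns True if work was moved, False if there is nothing worth stealing.
    """
    # Pick the thread with the most unprocessed objects.
    victim = max(range(len(segments)),
                 key=lambda i: segments[i][1] - segments[i][0])
    remaining = segments[victim][1] - segments[victim][0]
    if victim == idle or remaining < 2:
        return False
    # The victim keeps the front part; the idle thread takes the tail half.
    mid = segments[victim][1] - remaining // 2
    segments[idle] = [mid, segments[victim][1]]
    segments[victim][1] = mid
    return True
```

Calling steal_half() each time a thread drains its segment, until it 
returns False for every thread, mirrors the "repeat until everything is 
done" step.  Locking between threads is deliberately left out here.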


Nicolas


* Re: Git and GCC
From: David Miller @ 2007-12-07  3:31 UTC
  To: peff; +Cc: nico, jonsmirl, dberlin, harvey.harrison, ismail, gcc, git
In-Reply-To: <20071206173946.GA10845@sigill.intra.peff.net>

From: Jeff King <peff@peff.net>
Date: Thu, 6 Dec 2007 12:39:47 -0500

> I tried the threaded repack with pack.threads = 3 on a dual-processor
> machine, and got:
> 
>   time git repack -a -d -f --window=250 --depth=250
> 
>   real    309m59.849s
>   user    377m43.948s
>   sys     8m23.319s
> 
>   -r--r--r-- 1 peff peff  28570088 2007-12-06 10:11 pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.idx
>   -r--r--r-- 1 peff peff 339922573 2007-12-06 10:11 pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.pack
> 
> So it is about 5% bigger. What is really disappointing is that we saved
> only about 20% of the time. I didn't sit around watching the stages, but
> my guess is that we spent a long time in the single threaded "writing
> objects" stage with a thrashing delta cache.

If someone can give me a good way to run this test case I can
have my 64-cpu Niagara-2 box crunch on this and see how fast
it goes and how much larger the resulting pack file is.


* multi-threaded git-index-pack
From: Jon Smirl @ 2007-12-07  3:32 UTC
  To: Git Mailing List

I'm cloning the gcc repo. My clone has been sitting in git-index-pack
for about 10 minutes and it is only using one core. Is this something
that could be multi-threaded?

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: multi-threaded git-index-pack
From: Nicolas Pitre @ 2007-12-07  3:45 UTC
  To: Jon Smirl; +Cc: Git Mailing List
In-Reply-To: <9e4733910712061932p712b9f00k49677a0db4afee8d@mail.gmail.com>

On Thu, 6 Dec 2007, Jon Smirl wrote:

> I'm cloning the gcc repo. My clone has been sitting in git-index-pack
> for about 10 minutes and it is only using one core. Is this something
> that could be multi-threaded?

I've done the same and had the same thought, although it seemed shorter 
than 10 minutes to me.

Yes, most of it can be made multi-threaded as well.


Nicolas


* Re: Git and GCC
From: Jon Smirl @ 2007-12-07  4:06 UTC
  To: Linus Torvalds
  Cc: Harvey Harrison, Daniel Berlin, David Miller, ismail, gcc, git
In-Reply-To: <alpine.LFD.0.9999.0712061857060.13796@woody.linux-foundation.org>

On 12/6/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>
> On Thu, 6 Dec 2007, Harvey Harrison wrote:
> >
> > I've updated the public mirror repo with the very-packed version.
>
> Side note: it might be interesting to compare timings for
> history-intensive stuff with and without this kind of very-packed
> situation.
>
> The very density of a smaller pack-file might be enough to overcome the
> downsides (more CPU time to apply longer delta-chains), but regardless,
> real numbers talks, bullshit walks. So wouldn't it be nice to have real
> numbers?
>
> One easy way to get real numbers for history would be to just time some
> reasonably costly operation that uses lots of history. Ie just do a
>
>         time git blame -C gcc/regclass.c > /dev/null
>
> and see if the deeper delta chains are very expensive.

jonsmirl@terra:/video/gcc$ time git blame -C gcc/regclass.c > /dev/null

real    1m21.967s
user    1m21.329s
sys     0m0.640s

The Mozilla repo is at least 50% larger than the gcc one. It took me
23 minutes to repack the gcc one on my $800 Dell. The trick to this is
lots of RAM and a 64-bit machine. There is little disk IO during the
compression phase; everything is cached.

I have a 4.8GB git process with 4GB of physical memory. Everything
started slowing down a lot when the process got that big. Does git
really need 4.8GB to repack? I could only keep 3.4GB resident. Luckily
this happened at 95% completion. With 8GB of memory you should be able
to do this repack in under 20 minutes.

jonsmirl@terra:/video/gcc$ time git repack -a -d -f --depth=250 --window=250
real    22m54.380s
user    69m18.948s
sys     0m23.773s


> (Yeah, the above is pretty much designed to be the worst possible case for
> this kind of aggressive history packing, but I don't know if that choice
> of file to try to annotate is a good choice or not. I suspect that "git
> blame -C" with a CVS import is just horrid, because CVS commits tend to be
> pretty big and nasty and not as localized as we've tried to make things in
> the kernel, so doing the code copy detection is probably horrendously
> expensive)
>
>                         Linus


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Git and GCC
From: Nicolas Pitre @ 2007-12-07  4:21 UTC
  To: Jon Smirl
  Cc: Linus Torvalds, Harvey Harrison, Daniel Berlin, David Miller,
	ismail, gcc, git
In-Reply-To: <9e4733910712062006l651571f3w7f76ce64c6650dff@mail.gmail.com>

On Thu, 6 Dec 2007, Jon Smirl wrote:

> I have a 4.8GB git process with 4GB of physical memory. Everything
> started slowing down a lot when the process got that big. Does git
> really need 4.8GB to repack? I could only keep 3.4GB resident. Luckily
> this happened at 95% completion. With 8GB of memory you should be able
> to do this repack in under 20 minutes.

Probably you have too many cached delta results.  By default, every 
delta smaller than 1000 bytes is kept in memory until the write phase.  
Try using pack.deltacachesize = 256M or lower, or try disabling this 
caching entirely with pack.deltacachelimit = 0.
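
As a toy model of the two knobs mentioned above (not git's real code, 
which may apply further heuristics), the caching decision can be 
pictured like this in Python.  The function name, its parameters, and 
the reading of a cache size of 0 as "no cap" are assumptions made for 
illustration:

```python
def should_cache_delta(delta_size, cached_bytes, cache_limit=1000, cache_size=0):
    """Decide whether a freshly computed delta stays in memory.

    cache_limit models pack.deltacachelimit: only deltas smaller than
    this many bytes are kept until the write phase.
    cache_size models pack.deltacachesize: total bytes allowed in the
    cache, with 0 taken here to mean "no cap" (an assumption).
    """
    if cache_size and cached_bytes + delta_size > cache_size:
        return False  # the cache is full, recompute at write time instead
    return delta_size < cache_limit  # a limit of 0 never caches anything
```

Under this model an uncapped cache keeps growing with every small 
delta, which fits the memory growth reported, while a limit of 0 makes 
the function return False for every delta.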


Nicolas


* Re: git guidance
From: Al Boldi @ 2007-12-07  4:37 UTC
  To: Johannes Schindelin
  Cc: Andreas Ericsson, Phillip Susi, Linus Torvalds, Jing Xue,
	linux-kernel, git
In-Reply-To: <Pine.LNX.4.64.0712062119090.21625@wbgn129.biozentrum.uni-wuerzburg.de>

Johannes Schindelin wrote:
> Hi,

Hi

> On Fri, 7 Dec 2007, Al Boldi wrote:
> > You need to re-read the thread.
>
> I don't know why you write that, and then say thanks.  Clearly, what you
> wrote originally, and what Andreas pointed out, were quite obvious
> indicators that git already does what you suggest.
>
> You _do_ work "transparently" (whatever you understand by that overused
> term) in the working directory, unimpeded by git.

If you go back in the thread, you may find a link to a gitfs client that 
somebody kindly posted.  This client pretty much defines the transparency 
I'm talking about.  The only problem is that it's read-only.

To make it really useful, it has to support versioning locally, disconnected 
from the server repository.  One way to implement this, could be by 
committing every update unconditionally to an on-the-fly created git 
repository private to the gitfs client.

With this transparently created private scratch repository it should then be 
possible for the same gitfs to re-expose the locally created commits, all 
without any direct user-intervention.

Later, this same scratch repository could then be managed by the normal 
git-management tools/commands to ultimately update the backend git 
repositories.

BTW:  Sorry for my previous posts that contained the wrong date; it seems 
that hibernation sometimes advances the date by a full 24h.  Has anybody 
noticed this as well?


Thanks!

--
Al


* Re: Git and GCC
From: Linus Torvalds @ 2007-12-07  5:21 UTC
  To: Jon Smirl; +Cc: Harvey Harrison, Daniel Berlin, David Miller, ismail, gcc, git
In-Reply-To: <9e4733910712062006l651571f3w7f76ce64c6650dff@mail.gmail.com>



On Thu, 6 Dec 2007, Jon Smirl wrote:
> >
> >         time git blame -C gcc/regclass.c > /dev/null
> 
> jonsmirl@terra:/video/gcc$ time git blame -C gcc/regclass.c > /dev/null
> 
> real    1m21.967s
> user    1m21.329s

Well, I was also hoping for a "compared to not-so-aggressive packing" 
number on the same machine.. IOW, what I was wondering is whether there is 
a visible performance downside to the deeper delta chains in the 300MB 
pack vs the (less aggressive) 500MB pack.

		Linus


* [PATCH] git-help: fix looking up html install dir when browsing.
From: Christian Couder @ 2007-12-07  5:29 UTC
  To: Junio Hamano; +Cc: git

We used to search for the following directories:

	- $PREFIX/share/doc/git-doc
	- $PREFIX/share/doc/git-core-$GIT_VERSION

This was wrong because "htmldir" could be defined in the
Makefiles to something completely different.

So we now look for the $htmldir directory first. But if
it fails we still fall back to the two above directories,
in case the script install and the html doc install have
been done with different $htmldir.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 Makefile           |    5 ++++-
 git-browse-help.sh |    4 +++-
 2 files changed, 7 insertions(+), 2 deletions(-)

	Junio wrote:
	> People can set htmldir to somewhere other than
	> $(prefix)/share/doc/git-doc while building and
	> installing, but you are not telling the munged
	> script where it is.

	This should fix it.
	Thanks.

diff --git a/Makefile b/Makefile
index e9a119a..1e31f02 100644
--- a/Makefile
+++ b/Makefile
@@ -157,6 +157,7 @@ bindir = $(prefix)/bin
 gitexecdir = $(bindir)
 sharedir = $(prefix)/share
 template_dir = $(sharedir)/git-core/templates
+htmldir=$(sharedir)/doc/git-doc
 ifeq ($(prefix),/usr)
 sysconfdir = /etc
 else
@@ -183,7 +184,7 @@ GITWEB_FAVICON = git-favicon.png
 GITWEB_SITE_HEADER =
 GITWEB_SITE_FOOTER =
 
-export prefix bindir gitexecdir sharedir template_dir sysconfdir
+export prefix bindir gitexecdir sharedir template_dir htmldir sysconfdir
 
 CC = gcc
 AR = ar
@@ -747,6 +748,7 @@ DESTDIR_SQ = $(subst ','\'',$(DESTDIR))
 bindir_SQ = $(subst ','\'',$(bindir))
 gitexecdir_SQ = $(subst ','\'',$(gitexecdir))
 template_dir_SQ = $(subst ','\'',$(template_dir))
+htmldir_SQ = $(subst ','\'',$(htmldir))
 prefix_SQ = $(subst ','\'',$(prefix))
 
 SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
@@ -811,6 +813,7 @@ $(patsubst %.sh,%,$(SCRIPT_SH)) : % : %.sh
 	    -e 's/@@GIT_VERSION@@/$(GIT_VERSION)/g' \
 	    -e 's/@@NO_CURL@@/$(NO_CURL)/g' \
 	    -e 's|@@PREFIX@@|$(prefix_SQ)|g' \
+	    -e 's|@@HTMLDIR@@|$(htmldir_SQ)|g' \
 	    $@.sh >$@+ && \
 	chmod +x $@+ && \
 	mv $@+ $@
diff --git a/git-browse-help.sh b/git-browse-help.sh
index 11f8bfa..12d313a 100755
--- a/git-browse-help.sh
+++ b/git-browse-help.sh
@@ -21,6 +21,8 @@ SUBDIRECTORY_OK=Yes
 OPTIONS_SPEC=
 . git-sh-setup
 
+# Install data.
+special_html_dir="@@HTMLDIR@@"
 PREFIX="@@PREFIX@@"
 GIT_VERSION="@@GIT_VERSION@@"
 
@@ -30,7 +32,7 @@ rpm_dir="$PREFIX/share/doc/git-core-$GIT_VERSION"
 
 # Look for the directory that really contains html documentation.
 html_dir=''
-for dir in "$install_html_dir" "$rpm_dir"
+for dir in "$special_html_dir" "$install_html_dir" "$rpm_dir"
 do
 	test -d "$dir" && { html_dir="$dir" ; break ; }
 done
-- 
1.5.3.6.1993.g154f-dirty


* Re: [PATCH 2/3] git-help: add -w|--web option to display html man page in a browser.
From: Christian Couder @ 2007-12-07  5:35 UTC
  To: Junio C Hamano
  Cc: git, Theodore Tso, Jakub Narebski, Alex Riesen, Andreas Ericsson,
	Matthieu Moy, Eric Wong
In-Reply-To: <7v3aufeowe.fsf@gitster.siamese.dyndns.org>

On Thursday 6 December 2007, Junio C Hamano wrote:
> Christian Couder <chriscool@tuxfamily.org> writes:
> > diff --git a/Documentation/Makefile b/Documentation/Makefile
> > index d886641..3e01718 100644
> > --- a/Documentation/Makefile
> > +++ b/Documentation/Makefile
> > @@ -29,6 +29,7 @@ DOC_MAN7=$(patsubst %.txt,%.7,$(MAN7_TXT))
> >
> >  prefix?=$(HOME)
> >  bindir?=$(prefix)/bin
> > +htmldir?=$(prefix)/share/doc/git-doc
> >  mandir?=$(prefix)/share/man
> >  man1dir=$(mandir)/man1
> >  man5dir=$(mandir)/man5
>
> Doing this and then ...
>
> > diff --git a/Makefile b/Makefile
> > index a5a40ce..9204bfe 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -807,6 +808,7 @@ $(patsubst %.sh,%,$(SCRIPT_SH)) : % : %.sh
> >  	    -e 's|@@PERL@@|$(PERL_PATH_SQ)|g' \
> >  	    -e 's/@@GIT_VERSION@@/$(GIT_VERSION)/g' \
> >  	    -e 's/@@NO_CURL@@/$(NO_CURL)/g' \
> > +	    -e 's|@@PREFIX@@|$(prefix_SQ)|g' \
> >  	    $@.sh >$@+ && \
> >  	chmod +x $@+ && \
> >  	mv $@+ $@
> > ...
> > diff --git a/git-browse-help.sh b/git-browse-help.sh
> > new file mode 100755
> > index 0000000..11f8bfa
> > --- /dev/null
> > +++ b/git-browse-help.sh
> > @@ -0,0 +1,154 @@
> > +#!/bin/sh
> > ...
> > +USAGE='[--browser=browser|--tool=browser] [cmd to display] ...'
> > +SUBDIRECTORY_OK=Yes
> > +OPTIONS_SPEC=
> > +. git-sh-setup
> > +
> > +PREFIX="@@PREFIX@@"
> > +GIT_VERSION="@@GIT_VERSION@@"
> > +
> > +# Directories that may contain html documentation:
> > +install_html_dir="$PREFIX/share/doc/git-doc"
> > +rpm_dir="$PREFIX/share/doc/git-core-$GIT_VERSION"
>
> ... doing this is wrong. People can set htmldir to somewhere other than
> $(prefix)/share/doc/git-doc while building and installing, but you are
> not telling the munged script where it is.

Yeah, I sent a fix for this.

> > +init_browser_path() {
> > +	browser_path=`git config browser.$1.path`
> > +	test -z "$browser_path" && browser_path=$1
> > +}
>
> Please do not contaminate the config file with something the user can
> easily use a lot more standardized way (iow $PATH) to configure to his
> taste.
>
> I'd suggest dropping this bit.

I stole this part from "git-mergetool.sh":

init_merge_tool_path() {
        merge_tool_path=`git config mergetool.$1.path`
        if test -z "$merge_tool_path" ; then
                case "$1" in
                        emerge)
                                merge_tool_path=emacs
                                ;;
                        *)
                                merge_tool_path=$1
                                ;;
                esac
        fi
}

So we should either drop it in "git-mergetool.sh" too or keep it in both 
scripts.

Thanks,
Christian.


* Re: Git and GCC
From: NightStrike @ 2007-12-07  5:36 UTC
  To: Linus Torvalds; +Cc: Daniel Berlin, David Miller, ismail, gcc, git
In-Reply-To: <alpine.LFD.0.9999.0712061036200.13796@woody.linux-foundation.org>

On 12/6/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>
> On Thu, 6 Dec 2007, NightStrike wrote:
> >
> > No disrespect is meant by this reply.  I am just curious (and I am
> > probably misunderstanding something)..  Why remove all of the
> > documentation entirely?  Wouldn't it be better to just document it
> > more thoroughly?
>
> Well, part of it is that I don't think "--aggressive" as it is implemented
> right now is really almost *ever* the right answer. We could change the
> implementation, of course, but generally the right thing to do is to not
> use it (tweaking the "--window" and "--depth" manually for the repacking
> is likely the more natural thing to do).
>
> The other part of the answer is that, when you *do* want to do what that
> "--aggressive" tries to achieve, it's such a special case event that while
> it should probably be documented, I don't think it should necessarily be
> documented where it is now (as part of "git gc"), but as part of a much
> more technical manual for "deep and subtle tricks you can play".
>
> > I thought you did a fine job in this post in explaining its purpose,
> when to use it, when not to, etc.  Removing the documentation seems
> > counter-intuitive when you've already gone to the trouble of creating
> > good documentation here in this post.
>
> I'm so used to writing emails, and I *like* trying to explain what is
> going on, so I have no problems at all doing that kind of thing. However,
> trying to write a manual or man-page or other technical documentation is
> something rather different.
>
> IOW, I like explaining git within the _context_ of a discussion or a
> particular problem/issue. But documentation should work regardless of
> context (or at least set it up), and that's the part I am not so good at.
>
> In other words, if somebody (hint hint) thinks my explanation was good and
> readable, I'd love for them to try to turn it into real documentation by
> editing it up and creating enough context for it! But I'm not personally
> very likely to do that. I'd just send Junio the patch to remove a
> misleading part of the documentation we have.

hehe.. I'd love to, actually.  I can work on it next week.


* Re: git-am: catch missing author date early.
From: Junio C Hamano @ 2007-12-07  6:06 UTC
  To: Len Brown; +Cc: git
In-Reply-To: <200712062134.47330.lenb@kernel.org>

Sorry, and thanks for the report.

Jens complained the same yesterday and the change was reverted.


* Re: [PATCH 2/3] git-help: add -w|--web option to display html man page in a browser.
From: Junio C Hamano @ 2007-12-07  6:04 UTC
  To: Christian Couder
  Cc: git, Theodore Tso, Jakub Narebski, Alex Riesen, Andreas Ericsson,
	Matthieu Moy, Eric Wong
In-Reply-To: <200712070635.18018.chriscool@tuxfamily.org>

Christian Couder <chriscool@tuxfamily.org> writes:

>> > +# Directories that may contain html documentation:
>> > +install_html_dir="$PREFIX/share/doc/git-doc"
>> > +rpm_dir="$PREFIX/share/doc/git-core-$GIT_VERSION"
>>
>> ... doing this is wrong. People can set htmldir to somewhere other than
>> $(prefix)/share/doc/git-doc while building and installing, but you are
>> not telling the munged script where it is.
>
> Yeah, I sent a fix for this.

Why do you even need to fall back?  I'd rather drop these two fallbacks
entirely.

Distros have their own html documentation layout policy, so I suspect
they will patch this part to their liking anyway, and this point will
mostly become moot.  For source distribution, I'd prefer to point at
the place we know we are installing in.

>> > +init_browser_path() {
>> > +	browser_path=`git config browser.$1.path`
>> > +	test -z "$browser_path" && browser_path=$1
>> > +}
>>
>> Please do not contaminate the config file with something the user can
>> easily use a lot more standardized way (iow $PATH) to configure to his
>> taste.
>>
>> I'd suggest dropping this bit.
>
> I stole this part from "git-mergetool.sh":
>
> init_merge_tool_path() {
>         merge_tool_path=`git config mergetool.$1.path`
>         if test -z "$merge_tool_path" ; then
>                 case "$1" in
>                         emerge)
>                                 merge_tool_path=emacs
>                                 ;;
> ...
> }
>
> So we should either drop it in "git-mergetool.sh" too or keep it in both 
> scripts.

I think this is an irrelevant defense.  If others are doing bad, that is
not a justification to make things worse.

In the case of mergetool, it has case "$merge_tool" that can be spelled
totally differently from path (e.g. emerge and emacs), so that function
itself is semi justified.  For browser I do not think there is such
justification.


* Re: git-clean and empty pathspec
From: Shawn Bohrer @ 2007-12-07  6:14 UTC
  To: Nguyen Thai Ngoc Duy; +Cc: Git Mailing List
In-Reply-To: <fcaeb9bf0712061021o5383f538h3a086a913ac1b05d@mail.gmail.com>

On Fri, Dec 07, 2007 at 01:21:10AM +0700, Nguyen Thai Ngoc Duy wrote:
> "git clean -n" would not remove directories while "git clean -n -- ''"
> (two single quotes) would. Is there anything wrong with it?

It appears that match_pathspec views this as a recursive match to
everything, so git clean thinks that you provided the pathspec for all
files and directories and thus will remove them.

Without the '' there is no provided pathspec so git clean will not
remove the directories without -d.
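
A toy Python model of the behaviour described above may help.  It is 
not git's match_pathspec(); the function names and the simple 
leading-directory rule are assumptions made for illustration only:

```python
def matches_pathspec(path, pathspec):
    """Toy leading-directory match, loosely modelled on the explanation."""
    if pathspec == "":
        # Nothing to compare against, so every path counts as a
        # recursive match.
        return True
    spec = pathspec.rstrip("/")
    return path == spec or path.startswith(spec + "/")


def would_remove_dir(directory, pathspecs, flag_d=False):
    """A directory goes away with -d, or when some pathspec matches it."""
    return flag_d or any(matches_pathspec(directory, p) for p in pathspecs)
```

With no pathspec at all the any() over an empty list is False, so the 
directory survives unless -d is given; with a single '' pathspec every 
directory matches.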

Note this is different behavior from the old git-clean.sh.


* Re: Git and GCC
From: Jeff King @ 2007-12-07  6:38 UTC
  To: David Miller; +Cc: nico, jonsmirl, dberlin, harvey.harrison, ismail, gcc, git
In-Reply-To: <20071206.193121.40404287.davem@davemloft.net>

On Thu, Dec 06, 2007 at 07:31:21PM -0800, David Miller wrote:

> > So it is about 5% bigger. What is really disappointing is that we saved
> > only about 20% of the time. I didn't sit around watching the stages, but
> > my guess is that we spent a long time in the single threaded "writing
> > objects" stage with a thrashing delta cache.
> 
> If someone can give me a good way to run this test case I can
> have my 64-cpu Niagara-2 box crunch on this and see how fast
> it goes and how much larger the resulting pack file is.

That would be fun to see. The procedure I am using is this:

# compile recent git master with threaded delta
cd git
echo THREADED_DELTA_SEARCH = 1 >>config.mak
make install

# get the gcc pack
mkdir gcc && cd gcc
git --bare init
git config remote.gcc.url git://git.infradead.org/gcc.git
git config remote.gcc.fetch \
  '+refs/remotes/gcc.gnu.org/*:refs/remotes/gcc.gnu.org/*'
git remote update

# make a copy, so we can run further tests from a known point
cd ..
cp -a gcc test

# and test multithreaded large depth/window repacking
cd test
git config pack.threads 4
time git repack -a -d -f --window=250 --depth=250

-Peff


* Re: Git and GCC
From: Jeff King @ 2007-12-07  6:50 UTC
  To: Nicolas Pitre
  Cc: Jon Smirl, Daniel Berlin, Harvey Harrison, David Miller, ismail,
	gcc, git
In-Reply-To: <alpine.LFD.0.99999.0712061246120.555@xanadu.home>

On Thu, Dec 06, 2007 at 01:02:58PM -0500, Nicolas Pitre wrote:

> > What is really disappointing is that we saved
> > only about 20% of the time. I didn't sit around watching the stages, but
> > my guess is that we spent a long time in the single threaded "writing
> > objects" stage with a thrashing delta cache.
> 
> Maybe you should run the non threaded repack on the same machine to have 
> a good comparison.

Sorry, I should have been more clear. By "saved" I meant "we needed N
minutes of CPU time, but took only M minutes of real time to use it."
IOW, if we assume that the threading had zero overhead and that we were
completely CPU bound, then the task would have taken N minutes of real
time. And obviously those assumptions aren't true, but I was attempting
to say "it would have been at most N minutes of real time to do it
single-threaded."
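
To make that arithmetic concrete, here is a small Python sketch using 
the timings reported earlier in the thread for the threaded repack 
(real 309m59.849s, user 377m43.948s, sys 8m23.319s).  It assumes that 
"CPU time" means user + sys; the helper names are made up for 
illustration:

```python
def to_seconds(stamp):
    """Convert a `time`-style stamp such as '309m59.849s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def saved_fraction(real, user, sys):
    """Fraction of wall-clock time saved versus a zero-overhead serial run."""
    cpu = to_seconds(user) + to_seconds(sys)  # N: total CPU time needed
    return 1 - to_seconds(real) / cpu         # M / N is what was actually spent


saved = saved_fraction("309m59.849s", "377m43.948s", "8m23.319s")
print(f"{saved:.1%}")  # prints 19.7%
```

This is an upper bound on the benefit, since a real single-threaded 
run would not pay the threading overhead included in the CPU total.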

> And if you have only 2 CPUs, you will have better performance with
> pack.threads = 2, otherwise there'll be wasteful task switching going
> on.

Yes, but balanced by one thread running out of data way earlier than the
other, and completing the task with only one CPU. I am doing a 4-thread
test on a quad-CPU right now, and I will also try it with threads=1 and
threads=6 for comparison.

> And of course, if the delta cache is being thrashed, that might be due to 
> the way the existing pack was previously packed.  Hence the current pack 
> might impact object _access_ when repacking them.  So for a really 
> really fair performance comparison, you'd have to preserve the original 
> pack and swap it back before each repack attempt.

I am working each time from the pack generated by fetching from
git://git.infradead.org/gcc.git.

-Peff


* Re: [PATCH] Change from using email.com to example.com as example domain, as per RFC 2606.
From: Mike Hommey @ 2007-12-07  7:01 UTC
  To: David Symonds; +Cc: Junio Hamano, git
In-Reply-To: <11969842052283-git-send-email-dsymonds@gmail.com>

On Fri, Dec 07, 2007 at 10:36:45AM +1100, David Symonds wrote:
> -	  "Signed-off-by: Your Name <your@email.com>" line to the
> +	  "Signed-off-by: Your Name <your@example.com>" line to the

you@example.com would be better IMHO.

Mike


* Re: [PATCH 2/3] git-help: add -w|--web option to display html man page in a browser.
From: Junio C Hamano @ 2007-12-07  7:05 UTC
  To: Christian Couder
  Cc: git, Theodore Tso, Jakub Narebski, Alex Riesen, Andreas Ericsson,
	Matthieu Moy, Eric Wong
In-Reply-To: <7v1w9z9h2k.fsf@gitster.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> Christian Couder <chriscool@tuxfamily.org> writes:
> ...
>>> > +init_browser_path() {
>>> > +	browser_path=`git config browser.$1.path`
>>> > +	test -z "$browser_path" && browser_path=$1
>>> > +}
>>>
>>> Please do not contaminate the config file with something the user can
>>> easily use a lot more standardized way (iow $PATH) to configure to his
>>> taste.
>>>
>>> I'd suggest dropping this bit.

Well, I changed my mind.  It is a bit funny to have both firefox and
iceweasel as "valid-tool", but if we consider $browser to define the
external interface and $browser_path to define the implementation, it
sort of makes sense to have that configuration.  browser_path could be
iceweasel for browser firefox.
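
The interface/implementation split can be sketched in a few lines of 
Python.  This only mirrors the quoted init_browser_path shell function; 
the dict standing in for the output of "git config" is an assumption 
made for illustration:

```python
def browser_path(browser, config):
    """Map a browser name (the interface) to the command to run.

    config is a plain dict standing in for git's configuration.  An
    unset or empty browser.<name>.path falls back to the name itself,
    like the `test -z` check in the shell version.
    """
    return config.get(f"browser.{browser}.path") or browser
```

With {"browser.firefox.path": "iceweasel"} the name firefox resolves 
to iceweasel, while a name with no configured path resolves to itself 
and is then found through $PATH as usual.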

I'll squash the patch to update the one from the last round (as the last
two patches are not yet accepted in 'next'), remove the html
documentation path fallback, but will leave this part in.

browser.*.path and web.browser configuration need to be documented, if
not already, though.


* Re: Git and GCC
From: Jon Smirl @ 2007-12-07  7:08 UTC
  To: Linus Torvalds
  Cc: Harvey Harrison, Daniel Berlin, David Miller, ismail, gcc, git
In-Reply-To: <alpine.LFD.0.9999.0712062120100.13796@woody.linux-foundation.org>

On 12/7/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>
> On Thu, 6 Dec 2007, Jon Smirl wrote:
> > >
> > >         time git blame -C gcc/regclass.c > /dev/null
> >
> > jonsmirl@terra:/video/gcc$ time git blame -C gcc/regclass.c > /dev/null
> >
> > real    1m21.967s
> > user    1m21.329s
>
> Well, I was also hoping for a "compared to not-so-aggressive packing"
> number on the same machine.. IOW, what I was wondering is whether there is
> a visible performance downside to the deeper delta chains in the 300MB
> pack vs the (less aggressive) 500MB pack.

Same machine with a default pack

jonsmirl@terra:/video/gcc/.git/objects/pack$ ls -l
total 2145716
-r--r--r-- 1 jonsmirl jonsmirl   23667932 2007-12-07 02:03 pack-bd163555ea9240a7fdd07d2708a293872665f48b.idx
-r--r--r-- 1 jonsmirl jonsmirl 2171385413 2007-12-07 02:03 pack-bd163555ea9240a7fdd07d2708a293872665f48b.pack
jonsmirl@terra:/video/gcc/.git/objects/pack$

Delta lengths have virtually no impact. The bigger pack file causes
more IO which offsets the increased delta processing time.

One of my rules is that smaller is almost always better. Smaller eliminates
IO and helps with the CPU cache. It's like the kernel being optimized
for size instead of speed and ending up faster.

time git blame -C gcc/regclass.c > /dev/null
real    1m19.289s
user    1m17.853s
sys     0m0.952s
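
For a quick sense of scale, the two "real" timings for the blame run 
on this machine (1m21.967s with the aggressive pack, 1m19.289s with 
the default pack) can be compared with a couple of lines of Python. 
The variable names are made up, and a single run of each is of course 
not a rigorous benchmark:

```python
aggressive = 60 + 21.967  # seconds, very-packed repository
default = 60 + 19.289     # seconds, default pack on the same machine

# Relative cost of the deeper delta chains, measured against the default pack.
slowdown = (aggressive - default) / default
print(f"{slowdown:.1%}")  # prints 3.4%
```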



>
>                 Linus
>


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: Git and GCC
From: Jon Smirl @ 2007-12-07  7:10 UTC
  To: Jeff King; +Cc: David Miller, nico, dberlin, harvey.harrison, ismail, gcc, git
In-Reply-To: <20071207063848.GA13101@coredump.intra.peff.net>

On 12/7/07, Jeff King <peff@peff.net> wrote:
> On Thu, Dec 06, 2007 at 07:31:21PM -0800, David Miller wrote:
>
> > > So it is about 5% bigger. What is really disappointing is that we saved
> > > only about 20% of the time. I didn't sit around watching the stages, but
> > > my guess is that we spent a long time in the single threaded "writing
> > > objects" stage with a thrashing delta cache.
> >
> > If someone can give me a good way to run this test case I can
> > have my 64-cpu Niagara-2 box crunch on this and see how fast
> > it goes and how much larger the resulting pack file is.
>
> That would be fun to see. The procedure I am using is this:
>
> # compile recent git master with threaded delta
> cd git
> echo THREADED_DELTA_SEARCH = 1 >>config.mak
> make install
>
> # get the gcc pack
> mkdir gcc && cd gcc
> git --bare init
> git config remote.gcc.url git://git.infradead.org/gcc.git
> git config remote.gcc.fetch \
>   '+refs/remotes/gcc.gnu.org/*:refs/remotes/gcc.gnu.org/*'
> git remote update
>
> # make a copy, so we can run further tests from a known point
> cd ..
> cp -a gcc test
>
> # and test multithreaded large depth/window repacking
> cd test
> git config pack.threads 4

Use 64 threads with 64 CPUs; if they are multicore you want even more.
You also need to adjust chunk_size as mentioned in the other mail.


> time git repack -a -d -f --window=250 --depth=250
>
> -Peff
>


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: [PATCH/RFC (take 3)] autoconf: Add test for OLD_ICONV (squelching compiler warning)
From: Junio C Hamano @ 2007-12-07  7:26 UTC
  To: Jakub Narebski
  Cc: git, Linus Torvalds, Blake Ramsdell, Wincent Colaiuta,
	Pascal Obry, Ramsay Jones, Arjen Laarhoven, Brian Gernhardt
In-Reply-To: <1196990840-1168-1-git-send-email-jnareb@gmail.com>

Jakub Narebski <jnareb@gmail.com> writes:

> On Fri, 7 Dec 2007, Blake Ramsdell wrote:
>> On Dec 6, 2007 4:41 PM, Blake Ramsdell <blaker@gmail.com> wrote:
>>> On Dec 6, 2007 4:30 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>>> Umm. Why not just make the test be whether the following compiles cleanly?
>>>>
>>>>         #include <iconv.h>
>>>>
>>>>         extern size_t iconv(iconv_t cd,
>>>>           char **inbuf, size_t *inbytesleft,
>>>>           char **outbuf, size_t *outbytesleft);
>>>>
>>>> because if the compiler has seen a "const char **inbuf", then it should
>>>> error out with a "conflicting types for 'iconv'" style message.
>>>
>>> Yeah, this is what I did:
>> 
>> My apologies. Your suggestion is completely different, and should work
>> without -Werror. Let me try that.
>
> Is something like the patch below what you wanted to try?

This looks sensible.  Will apply.
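
The redeclaration test quoted above can be tried by hand outside autoconf. The following is a rough sketch, not the patch under discussion: iconv_proto is a hypothetical helper name, CC is assumed to name a working C compiler (defaulting to cc), and mktemp -d is assumed to be available. It first checks that <iconv.h> compiles at all, so that a missing compiler or header is not mistaken for a const prototype.

```shell
#!/bin/sh
# Sketch only: classify the iconv() prototype by redeclaring it.
# iconv_proto is a hypothetical helper; CC defaults to "cc".
iconv_proto() {
    cc=${CC:-cc}
    dir=$(mktemp -d) || return 1

    # Baseline: does <iconv.h> compile at all with this compiler?
    printf '#include <iconv.h>\nint main(void) { return 0; }\n' >"$dir/base.c"

    # Probe: redeclare iconv() with a non-const inbuf.
    cat >"$dir/probe.c" <<'EOF'
#include <iconv.h>

extern size_t iconv(iconv_t cd,
                    char **inbuf, size_t *inbytesleft,
                    char **outbuf, size_t *outbytesleft);

int main(void) { return 0; }
EOF

    if ! $cc -c "$dir/base.c" -o "$dir/base.o" 2>/dev/null; then
        result=unknown    # no usable compiler, or no <iconv.h>
    elif $cc -c "$dir/probe.c" -o "$dir/probe.o" 2>/dev/null; then
        result=new        # header agrees with "char **inbuf"
    else
        result=old        # conflicting types: header has "const char **inbuf"
    fi

    rm -rf "$dir"
    echo "$result"
}

iconv_proto
```

A result of "old" corresponds to the case where OLD_ICONV would be wanted. Because the conflict is a hard error rather than a warning, no -Werror is needed.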


* Re: Git and GCC
From: Jeff King @ 2007-12-07  7:27 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Jon Smirl, Daniel Berlin, Harvey Harrison, David Miller, ismail,
	gcc, git
In-Reply-To: <20071207065047.GB13101@coredump.intra.peff.net>

On Fri, Dec 07, 2007 at 01:50:47AM -0500, Jeff King wrote:

> Yes, but balanced by one thread running out of data way earlier than the
> other, and completing the task with only one CPU. I am doing a 4-thread
> test on a quad-CPU right now, and I will also try it with threads=1 and
> threads=6 for comparison.

Hmm. As this has been running, I read the rest of the thread, and it
looks like Jon Smirl has already posted the interesting numbers. So
never mind, unless there is something particular you would like to see.

-Peff


* Re: Git and GCC
From: Jeff King @ 2007-12-07  7:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nicolas Pitre, Jon Smirl, Daniel Berlin, Harvey Harrison,
	David Miller, ismail, gcc, git
In-Reply-To: <alpine.LFD.0.9999.0712061030560.13796@woody.linux-foundation.org>

On Thu, Dec 06, 2007 at 10:35:22AM -0800, Linus Torvalds wrote:

> > What is really disappointing is that we saved only about 20% of the 
> > time. I didn't sit around watching the stages, but my guess is that we 
> > spent a long time in the single threaded "writing objects" stage with a 
> > thrashing delta cache.
> 
> I don't think you spent all that much time writing the objects. That part 
> isn't very intensive, it's mostly about the IO.

It can get nasty with super-long deltas thrashing the cache, I think.
But in this case, I think it ended up being just a poor division of
labor caused by the chunk_size parameter using the quite large window
size (see elsewhere in the thread for discussion).

> I suspect you may simply be dominated by memory-throughput issues. The 
> delta matching doesn't cache all that well, and using two or more cores 
> isn't going to help all that much if they are largely waiting for memory 
> (and quite possibly also perhaps fighting each other for a shared cache? 
> Is this a Core 2 with the shared L2?)

I think the chunk_size more or less explains it. I have had reasonable
success keeping both CPUs busy on similar tasks in the past (but with
smaller window sizes).

For reference, it was a Core 2 Duo; do they all share L2, or is there
something I can look for in /proc/cpuinfo?

-Peff
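
On the /proc/cpuinfo question: as far as I know, that file reports a cache size per logical CPU but does not say which CPUs share it. On Linux kernels that export cache topology through sysfs, the sharing can be read from there instead. A minimal sketch, assuming the cache/indexN/level and shared_cpu_list files exist on the running kernel (older kernels may only provide shared_cpu_map); l2_sharing is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: list which CPUs share each level-2 cache, using Linux sysfs.
# l2_sharing is a hypothetical helper; the optional argument overrides the
# sysfs root, which is mainly useful for testing.
l2_sharing() {
    root=${1:-/sys/devices/system/cpu}
    for idx in "$root"/cpu[0-9]*/cache/index*; do
        # Skip entries that are missing or unreadable on this kernel.
        [ -r "$idx/level" ] && [ -r "$idx/shared_cpu_list" ] || continue
        [ "$(cat "$idx/level")" = 2 ] || continue
        cpu=${idx%/cache/*}          # strip the trailing /cache/indexN
        echo "${cpu##*/}: L2 shared with CPUs $(cat "$idx/shared_cpu_list")"
    done
}

l2_sharing
```

If each CPU's line lists both cores, the L2 is shared between them; if each line lists only the CPU itself, it is not.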


* Re: After-the-fact submodule detection or creation
From: Alex Riesen @ 2007-12-07  7:37 UTC (permalink / raw)
  To: Michael Poole; +Cc: git
In-Reply-To: <87ir3bp5sf.fsf@graviton.dyn.troilus.org>

Michael Poole, Fri, Dec 07, 2007 04:01:04 +0100:
> It seems like using the current submodule code would mean that this
> kind of import would need two passes over the foreign repository,
> rather than one if the branch could be created after the parent tree
> is initially imported.  I can live with that -- it is a rather unusual
> case -- but maybe there is a better way.)

Import the core module in a branch all by itself, and merge it in
every support branch?


    Supp1: o-o-o-----o-o-o-o-o-o-o
                    /
    Core:  o-o-o-o-o
                    \
    Supp2: o-o-------o-o-o-o


