Git development
* git-svn branch naming question
From: Miklos Vajna @ 2007-12-08  1:04 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 852 bytes --]

hi,

i'm using git-svn for projects where i don't just want to commit to
trunk but to other branches, too.

for example:

git-svn clone -s svn+ssh://vmiklos@svn.gnome.org/svn/ooo-build ooo-build

then i have a local 'master' branch and all the other branches are local
branches.

so, when i want to work in the ooo-build-2-3 branch, i do a:

git checkout -b ooo-build-2-3 ooo-build-2-3

but when i do a git svn rebase, i get:

warning: refname 'ooo-build-2-3' is ambiguous.

what am i doing wrong?

in fact i suspect that if i used some other branch name, like
simply '2-3', i could get rid of this warning, but what's the
problem with using the same name as the remote branch when working
in a branch locally?

probably i'm missing some parameter to git-svn clone so that it would prefix
the refs with something like 'origin'?

thanks,
- VMiklos

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: Git and GCC
From: Harvey Harrison @ 2007-12-08  0:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff King, Nicolas Pitre, Jon Smirl, Daniel Berlin, David Miller,
	ismail, gcc, git
In-Reply-To: <alpine.LFD.0.9999.0712061030560.13796@woody.linux-foundation.org>

Some interesting stats from the highly packed gcc repo.  The long chain
lengths very quickly tail off.  Over 60% of the objects have a chain
length of 20 or less.  If anyone wants the full list let me know.  I
also have included a few other interesting points, the git default
depth of 50, my initial guess of 100 and every 10% in the cumulative
distribution from 60-100%.

This shows the git default of 50 really isn't that bad, and after
about 100 it really starts to get sparse.  

Harvey

length:	objects	cumulative	percent	total
1:	103817	103817	10.20%	1017922
2:	67332	171149	16.81%
3:	57520	228669	22.46%
4:	52570	281239	27.63%
5:	43910	325149	31.94%
6:	37520	362669	35.63%
7:	35248	397917	39.09%
8:	29819	427736	42.02%
9:	27619	455355	44.73%
10:	22656	478011	46.96%
11:	21073	499084	49.03%
12:	18738	517822	50.87%
13:	16674	534496	52.51%
14:	14882	549378	53.97%
15:	14424	563802	55.39%
16:	12765	576567	56.64%
17:	11662	588229	57.79%
18:	11845	600074	58.95%
19:	11694	611768	60.10%
20:	9625	621393	61.05%
34:	5354	719356	70.67%
50:	3395	785342	77.15%
60:	2547	815072	80.07%
100:	1644	898284	88.25%
113:	1292	917046	90.09%
158:	959	967429	95.04%
200:	652	997653	98.01%
219:	491	1008132	99.04%
245:	179	1017717	99.98%
246:	111	1017828	99.99%
247:	61	1017889	100.00%
248:	27	1017916	100.00%
249:	6	1017922	100.00%
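
The cumulative columns in the table above can be recomputed from the per-length counts. A minimal sketch, assuming awk is available; the function name `cumulative` is made up for illustration, and the total of 1017922 objects is taken from the last row of the table:

```shell
# cumulative TOTAL: read "length count" pairs on stdin and print
# length, count, running count and running percentage of TOTAL.
cumulative() {
	awk -v total="$1" '{
		sum += $2
		printf "%s\t%d\t%d\t%.2f%%\n", $1, $2, sum, 100 * sum / total
	}'
}

# First three rows of the table, fed in by hand
printf '%s\n' '1: 103817' '2: 67332' '3: 57520' | cumulative 1017922
```

For these three rows the sketch reproduces the 10.20%, 16.81% and 22.46% figures listed above.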

^ permalink raw reply

* Re: Something is broken in repack
From: Linus Torvalds @ 2007-12-08  0:37 UTC (permalink / raw)
  To: Jon Smirl, Nicolas Pitre; +Cc: Git Mailing List
In-Reply-To: <9e4733910712071505y6834f040k37261d65a2d445c4@mail.gmail.com>



On Fri, 7 Dec 2007, Jon Smirl wrote:
>
> Using this config:
> [pack]
>         threads = 4
>         deltacachesize = 256M

I think deltacachesize is broken.

The code in try_delta() that replaces a delta cache entry with another one 
seems very buggy wrt that whole "delta_cache_size" update. It does

	delta_cache_size -= trg_entry->delta_size;

to account for the old delta going away, but it does this *after* having 
already replaced trg_entry->delta_size with the new delta entry.

I suspect there are other issues going on too, but that's the one that I 
noticed from a quick look-through.
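
The ordering problem described above can be illustrated with a toy model. This is only a sketch of the arithmetic, not the pack-objects code: the function names are made up, and the step that adds the new delta's size back to the total is an assumption about what happens once the new delta is cached.

```shell
# replace_buggy CACHE OLD NEW: the size field is overwritten first,
# so the later subtraction removes NEW instead of OLD.
replace_buggy() {
	cache=$1 entry_size=$2 new=$3
	entry_size=$new                    # field already holds the new size
	cache=$(( cache - entry_size ))    # subtracts NEW, not OLD
	cache=$(( cache + new ))           # assumed: new delta is then accounted
	echo "$cache"
}

# replace_fixed CACHE OLD NEW: subtract the old size before overwriting it.
replace_fixed() {
	cache=$1 entry_size=$2 new=$3
	cache=$(( cache - entry_size ))    # old delta goes away
	entry_size=$new
	cache=$(( cache + new ))           # assumed: new delta is then accounted
	echo "$cache"
}

replace_buggy 100 40 10   # prints 100
replace_fixed 100 40 10   # prints 70
```

In this model the buggy order never gives back the 40 bytes of the old delta, so the running total drifts upward with every replacement.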

Nico? I think this one is yours..

		Linus

^ permalink raw reply

* Re: Git and GCC
From: Daniel Berlin @ 2007-12-07 23:33 UTC (permalink / raw)
  To: Giovanni Bajo
  Cc: Jakub Narebski, Linus Torvalds, David Miller, jonsmirl, peff,
	nico, harvey.harrison, ismail, gcc, git
In-Reply-To: <1197069298.6118.1.camel@ozzu>

On 12/7/07, Giovanni Bajo <rasky@develer.com> wrote:
> On Fri, 2007-12-07 at 14:14 -0800, Jakub Narebski wrote:
>
> > > >> Is SHA a significant portion of the compute during these repacks?
> > > >> I should run oprofile...
> > > > SHA1 is almost totally insignificant on x86. It hardly shows up. But
> > > > we have a good optimized version there.
> > > > zlib tends to be a lot more noticeable (especially the
> > > > *uncompression*: it may be faster than compression, but it's done _so_
> > > > much more that it totally dominates).
> > >
> > > Have you considered alternatives, like:
> > > http://www.oberhumer.com/opensource/ucl/
> >
> > <quote>
> >   As compared to LZO, the UCL algorithms achieve a better compression
> >   ratio but *decompression* is a little bit slower. See below for some
> >   rough timings.
> > </quote>
> >
> > It is uncompression speed that is more important, because it is used
> > much more often.
>
> I know, but the point is not what is the fastest, but whether it's fast
> enough to get off the profiles. I think UCL is fast enough since it's
> still times faster than zlib. Anyway, LZO is GPL too, so why not
> consider it too? They are good libraries.


At worst, you could also use fastlz (www.fastlz.org), which is faster
than all of these by a factor of 4 (and compression wise, is actually
sometimes better, sometimes worse, than LZO).

^ permalink raw reply

* Re: Git and GCC
From: Giovanni Bajo @ 2007-12-07 23:14 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Linus Torvalds, David Miller, jonsmirl, peff, nico, dberlin,
	harvey.harrison, ismail, gcc, git
In-Reply-To: <m3hciutaoq.fsf@roke.D-201>

On Fri, 2007-12-07 at 14:14 -0800, Jakub Narebski wrote:

> > >> Is SHA a significant portion of the compute during these repacks?
> > >> I should run oprofile...
> > > SHA1 is almost totally insignificant on x86. It hardly shows up. But
> > > we have a good optimized version there.
> > > zlib tends to be a lot more noticeable (especially the
> > > *uncompression*: it may be faster than compression, but it's done _so_
> > > much more that it totally dominates).
> > 
> > Have you considered alternatives, like:
> > http://www.oberhumer.com/opensource/ucl/
> 
> <quote>
>   As compared to LZO, the UCL algorithms achieve a better compression
>   ratio but *decompression* is a little bit slower. See below for some
>   rough timings.
> </quote>
> 
> It is uncompression speed that is more important, because it is used
> much more often.

I know, but the point is not what is the fastest, but whether it's fast
enough to get off the profiles. I think UCL is fast enough since it's
still times faster than zlib. Anyway, LZO is GPL too, so why not
consider it too? They are good libraries.
-- 
Giovanni Bajo

^ permalink raw reply

* Something is broken in repack
From: Jon Smirl @ 2007-12-07 23:05 UTC (permalink / raw)
  To: Git Mailing List

Using this config:
[pack]
        threads = 4
        deltacachesize = 256M
        deltacachelimit = 0

And the 330MB gcc pack for input
 git repack -a -d -f  --depth=250 --window=250

complete  seconds  RAM
10%       47       1GB
20%       29       1GB
30%       24       1GB
40%       18       1GB
50%       110      1.2GB
60%       85       1.4GB
70%       195      1.5GB
80%       186      2.5GB
90%       489      3.8GB
95%       800      4.8GB
I killed it because it started swapping

The mmaps are only about 400MB in this case.
At the end the git process had 4.4GB of physical RAM allocated.

Starting from a highly compressed pack greatly aggravates the problem.
Starting with a 2GB pack of the same data my process size only grew to
3GB with 2GB of mmaps.

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* Re: Git and GCC
From: Luke Lu @ 2007-12-07 23:04 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Giovanni Bajo, Linus Torvalds, David Miller, jonsmirl, peff, nico,
	dberlin, harvey.harrison, ismail, gcc, git
In-Reply-To: <m3hciutaoq.fsf@roke.D-201>

On Dec 7, 2007, at 2:14 PM, Jakub Narebski wrote:
> Giovanni Bajo <rasky@develer.com> writes:
>> On 12/7/2007 6:23 PM, Linus Torvalds wrote:
>>>> Is SHA a significant portion of the compute during these repacks?
>>>> I should run oprofile...
>>> SHA1 is almost totally insignificant on x86. It hardly shows up. But
>>> we have a good optimized version there.
>>> zlib tends to be a lot more noticeable (especially the
>>> *uncompression*: it may be faster than compression, but it's done  
>>> _so_
>>> much more that it totally dominates).
>>
>> Have you considered alternatives, like:
>> http://www.oberhumer.com/opensource/ucl/
>
> <quote>
>   As compared to LZO, the UCL algorithms achieve a better compression
>   ratio but *decompression* is a little bit slower. See below for some
>   rough timings.
> </quote>
>
> It is uncompression speed that is more important, because it is used
> much more often.

So why didn't we consider lzo then? It's much faster than zlib.

__Luke

  

^ permalink raw reply

* Re: [RFC/PATCH] Add a --nosort option to pack-objects
From: Nicolas Pitre @ 2007-12-07 22:20 UTC (permalink / raw)
  To: Mike Hommey; +Cc: Junio C Hamano, git
In-Reply-To: <20071207214627.GB13170@glandium.org>

On Fri, 7 Dec 2007, Mike Hommey wrote:

> As you can see from my other message, I'm *actually* not sure this is
> really material for git as a VCS. I will add documentation unrelated to
> --nosort to pack-objects anyways.

Well, I have serious doubts about this patch in the first place.

I think it is simply unneeded.

If you want pack-objects not to change the sort order because you have 
some sorting of your own, externally implemented, then you simply have 
to run git-pack-objects feeding it the list of object SHA1s along with a 
tag of your own which will effectively impose the sorting you want, 
based on that tag.

Objects with the same tag will still be sorted amongst themselves which 
is still a good thing.

For example, you may have something like:

	git rev-list --all --objects |
	sed -e 's|foo/logs/.*|LOGS|' |
	git pack-objects ...

This will effectively cluster all foo/logs/* files together for delta 
compression regardless of their actual name.  Maybe that's what you 
really want?
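
To see what the sed step in the pipeline above does to the name column, it can be run on a few hand-written lines. A small sketch: the object names and paths below are made up and only stand in for real `git rev-list --all --objects` output, and `tag_logs` is a hypothetical wrapper name.

```shell
# tag_logs: replace every path under foo/logs/ with the single tag LOGS,
# leaving the object name in the first column untouched.
tag_logs() {
	sed -e 's|foo/logs/.*|LOGS|'
}

# Made-up lines in the "<object name> <path>" layout
printf '%s\n' \
	'1111111111111111111111111111111111111111 foo/logs/build-a.log' \
	'2222222222222222222222222222222222222222 foo/logs/build-b.log' \
	'3333333333333333333333333333333333333333 src/main.c' |
tag_logs
```

The two log entries end up with the same name, LOGS, while src/main.c passes through unchanged.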


Nicolas

^ permalink raw reply

* Re: Git and GCC
From: Jakub Narebski @ 2007-12-07 22:14 UTC (permalink / raw)
  To: Giovanni Bajo
  Cc: Linus Torvalds, David Miller, jonsmirl, peff, nico, dberlin,
	harvey.harrison, ismail, gcc, git
In-Reply-To: <4759AC8E.3070102@develer.com>

Giovanni Bajo <rasky@develer.com> writes:

> On 12/7/2007 6:23 PM, Linus Torvalds wrote:
> 
> >> Is SHA a significant portion of the compute during these repacks?
> >> I should run oprofile...
> > SHA1 is almost totally insignificant on x86. It hardly shows up. But
> > we have a good optimized version there.
> > zlib tends to be a lot more noticeable (especially the
> > *uncompression*: it may be faster than compression, but it's done _so_
> > much more that it totally dominates).
> 
> Have you considered alternatives, like:
> http://www.oberhumer.com/opensource/ucl/

<quote>
  As compared to LZO, the UCL algorithms achieve a better compression
  ratio but *decompression* is a little bit slower. See below for some
  rough timings.
</quote>

It is uncompression speed that is more important, because it is used
much more often.

-- 
Jakub Narebski
ShadeHawk on #git

^ permalink raw reply

* Re: git-bisect feature suggestion: "git-bisect diff"
From: Jeff King @ 2007-12-07 22:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ingo Molnar, git
In-Reply-To: <7vprxiyxfj.fsf@gitster.siamese.dyndns.org>

On Fri, Dec 07, 2007 at 02:03:44PM -0800, Junio C Hamano wrote:

> > Sure, but regular aliases already do that. The point of making it a
> > "builtin" alias is that we can depend on it being there. But who is
> > depending?
> 
> Nobody is depending.
> 
> And I think the reason nobody depends on it is because there is no
> compelling reason to.  Perhaps the behaviour is not useful enough.  It
> surely is the case for "bisect view".

Right, which leads to my (perhaps subtle) point that the builtin alias
hack is just what you said elsewhere: a cute hack. IOW, I am slightly
NAKing inclusion of it in master (OTOH, I really don't see what it could
_hurt_, so maybe somebody could find a use for it that we didn't think
of).

-Peff

^ permalink raw reply

* Re: git guidance
From: Luke Lu @ 2007-12-07 22:07 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Al Boldi, Jakub Narebski, Andreas Ericsson, Johannes Schindelin,
	Phillip Susi, Linus Torvalds, Jing Xue, linux-kernel, git
In-Reply-To: <11272.1197056185@turing-police.cc.vt.edu>

On Dec 7, 2007, at 11:36 AM, Valdis.Kletnieks@vt.edu wrote:
> On Fri, 07 Dec 2007 22:04:48 +0300, Al Boldi said:
>
>> Because WORKFLOW C is transparent, it won't affect other  
>> workflows.  So you
>> could still use your normal WORKFLOW B in addition to WORKFLOW C,  
>> gaining an
>> additional level of version control detail at no extra cost other  
>> than the
>> git-engine scratch repository overhead.
>>
>> BTW, is git efficient enough to handle WORKFLOW C?
>
> Imagine the number of commits a 'make clean; make' will do in a  
> kernel tree, as
> it commits all those .o files... :)

My guess is that Al is not really a developer (product management/
marketing?); what he has in mind is probably not an SCM but a backup
system a la Mac's Time Machine or NetApp's snapshots that also
support disconnected commits. I think that git could be a suitable
engine for such systems, after a few tweaks to avoid compressing
already compressed blobs like jpeg, mp3 and mpeg etc.

__Luke

^ permalink raw reply

* Re: [RFC/PATCH] Add a --nosort option to pack-objects
From: Mike Hommey @ 2007-12-07 21:44 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git, Junio C Hamano
In-Reply-To: <alpine.LFD.0.99999.0712071622150.555@xanadu.home>

On Fri, Dec 07, 2007 at 04:24:17PM -0500, Nicolas Pitre wrote:
> On Fri, 7 Dec 2007, Mike Hommey wrote:
> 
> > While most of the time the heuristics used by pack-objects to sort the
> > given object list are satisfying enough, there are cases where it can be
> > useful for the user to sort the list with heuristics that would be better
> > suited.
> 
> Could you please elaborate on those cases where the current heuristic 
> would be unsatisfactory?

I imagine it could be useful when importing a huge tree in the first commit,
when some data in the tree is redundant with (or similar to) others in the
same tree. I guess there could be some other VCS use-cases.

The real case where I've been using this is that I use git to store my debian
build logs in an efficient manner, and having a custom-sorted list of objects
ends up being much faster and less memory consuming than using a huge
window (and 1GB of logs became less than 10MB).

Mike

^ permalink raw reply

* Re: git-bisect feature suggestion: "git-bisect diff"
From: Junio C Hamano @ 2007-12-07 22:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Ingo Molnar, git
In-Reply-To: <20071207215514.GA11784@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Fri, Dec 07, 2007 at 01:44:12PM -0800, Junio C Hamano wrote:
>
>> Well, I think "git view" should not be just "predefined alias that the
>> user can override wholesale", which is what you currently have.  I think
>> it can just be an example in "git config" manpage (i.e. "If you want to,
>> you can alias 'view' to 'gitk' or 'gitview'") and I do not think we need
>> core-side support for that.
>
> Sure, but regular aliases already do that. The point of making it a
> "builtin" alias is that we can depend on it being there. But who is
> depending?

Nobody is depending.

And I think the reason nobody depends on it is because there is no
compelling reason to.  Perhaps the behaviour is not useful enough.  It
surely is the case for "bisect view".

^ permalink raw reply

* Re: git guidance
From: Björn Steinbrink @ 2007-12-07 22:00 UTC (permalink / raw)
  To: Al Boldi
  Cc: Andreas Ericsson, Johannes Schindelin, Phillip Susi,
	Linus Torvalds, Jing Xue, linux-kernel, git
In-Reply-To: <200712071353.11654.a1426z@gawab.com>

[-- Attachment #1: Type: text/plain, Size: 3930 bytes --]

On 2007.12.07 13:53:11 +0300, Al Boldi wrote:
> Andreas Ericsson wrote:
> > So, to get to the bottom of this, which of the following workflows is it
> > you want git to support?
> >
> > ### WORKFLOW A ###
> > edit, edit, edit
> > edit, edit, edit
> > edit, edit, edit
> > Oops I made a mistake and need to hop back to "current - 12".
> > edit, edit, edit
> > edit, edit, edit
> > publish everything, similar to just tarring up your workdir and sending
> > out ### END WORKFLOW A ###
> >
> > ### WORKFLOW B ###
> > edit, edit, edit
> > ok this looks good, I want to save a checkpoint here
> > edit, edit, edit
> > looks good again. next checkpoint
> > edit, edit, edit
> > oh crap, back to checkpoint 2
> > edit, edit, edit
> > ooh, that's better. save a checkpoint and publish those checkpoints
> > ### END WORKFLOW B ###
> 
> ### WORKFLOW C ###
> for every save on a gitfs mounted dir, do an implied checkpoint, commit, or 
> publish (should be adjustable), on its privately created on-the-fly 
> repository.
> ### END WORKFLOW C ###
> 
> For example:
> 
>   echo "// last comment on this file" >> /gitfs.mounted/file
> 
> should do an implied checkpoint, and make these checkpoints immediately 
> visible under some checkpoint branch of the gitfs mounted dir.
> 
> Note, this way the developer gets version control without even noticing, and 
> works completely transparent to any kind of application.

Ouch... That looks worse than "plain" per-file versioning. Not only do
you by definition get "broken" commits if there's a change that affects
two dependent files, you also get an insane amount of commits just for
testing stuff, or fixing bugs.

And unless you use some kind of union-fs on top (or keep ignored files
in a special unversioned area in your gitfs, which seems somewhat ugly),
you'll probably also have to track lots of files in the working
directory that are generated, unless you want to re-generate them after
each reboot. And that leads to even more absolutely useless revisions.

Just thinking of my vim .swp files (which I definitely don't want to
lose on a crash/power outage/pkill -9 .<ENTER> dammit) makes me scream
because of the gazillion commits they will produce (and no, I don't
want them in some special out-of-tree directory).

Plus, I have vim set up to _replace_ files on write, so that I can more
easily use hard-linked copies without changing all copies at once _unless_
I explicitly want to, meaning that I'd get full remove/add commits,
which are absolutely useless. And trying to detect such patterns
(rename, then write the changed file with the old name and then delete
the renamed file) is probably not worth the trouble, because you
coincidentally might _want_ to have just these three steps recorded when
you happen to perform them manually. And if you go for heuristics,
you'll complain each time you get a false positive/negative.


That said, out of pure curiosity I came up with the attached script
which just uses inotifywait to watch a directory and issue git commands
on certain events. It is extremely stupid, but seems to work. And at
least it hasn't got the drawbacks of a real gitfs regarding the need to
have a "separate" non-versioned storage area for the working directory,
because it simply uses the existing working directory wherever that
might be stored. It doesn't use GIT_DIR/WORK_DIR yet, but hey, should be
easy to add...

Feel free to mess with that thing, hey, maybe you even like it and
extend it to match your proposed workflow even more. I for sure won't
use or even extend it, so you're likely on your own there.

Side-note: Writing that script probably took less time than writing this
email and probably less time than was wasted on this topic. Makes me
want to use today's preferred "Code talks, b...s... walks" statement,
but I'll refrain from that... Just because I lack the credibility to say
that, and the script attached is quite crappy ;-)

Björn

[-- Attachment #2: git-watch --]
[-- Type: text/plain, Size: 814 bytes --]

#!/bin/bash
# Quote the pattern so the shell passes the backslashes through to inotifywait
inotifywait -m -r --exclude '^\./\.git/.*' -e close_write -e move -e create -e delete . 2>/dev/null |
while read -r FILE_PATH EVENT FILE_NAME
do
	FILE_NAME="$FILE_PATH$FILE_NAME"
	FILE_NAME=${FILE_NAME#./}

	# git doesn't care about directories
	if [ -d "$FILE_NAME" ]
	then
		continue
	fi

	case "$EVENT" in
		*CLOSE_WRITE*)
		ACTION=change
		;;
		*MOVED_TO*)
		ACTION=create
		;;
		*MODIFY*)
		ACTION=change
		;;
		*DELETE*)
		ACTION=delete
		;;
		*MOVED_FROM*)
		ACTION=delete
		;;
		*CREATE*)
		ACTION=create
		;;
		*)
		continue
		;;
	esac

	case $ACTION in
		create)
		git add "$FILE_NAME"
		git commit -m "$FILE_NAME created"
		;;
		delete)
		git rm --cached "$FILE_NAME"
		git commit -m "$FILE_NAME removed"
		;;
		change)
		git add "$FILE_NAME"
		git commit -m "$FILE_NAME changed"
		;;
	esac
done
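
The event classification in the script above can be pulled into a small function, which makes it possible to try the mapping without running inotifywait. A sketch only: `event_to_action` is a made-up name, and the patterns are the same ones the script matches.

```shell
# event_to_action EVENT: map an inotifywait event list such as
# "CLOSE_WRITE,CLOSE" to one of the action names used in the script.
# Returns non-zero for events the script ignores.
event_to_action() {
	case "$1" in
		*CLOSE_WRITE*|*MODIFY*) echo change ;;
		*MOVED_TO*|*CREATE*)    echo create ;;
		*MOVED_FROM*|*DELETE*)  echo delete ;;
		*) return 1 ;;
	esac
}

event_to_action CLOSE_WRITE,CLOSE   # prints change
event_to_action MOVED_FROM          # prints delete
```

The loop body could then call this once and branch on the result, instead of keeping two separate case statements in sync.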

^ permalink raw reply

* Re: git-bisect feature suggestion: "git-bisect diff"
From: Jeff King @ 2007-12-07 21:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ingo Molnar, git
In-Reply-To: <7vtzmuyyc3.fsf@gitster.siamese.dyndns.org>

On Fri, Dec 07, 2007 at 01:44:12PM -0800, Junio C Hamano wrote:

> Well, I think "git view" should not be just "predefined alias that the
> user can override wholesale", which is what you currently have.  I think
> it can just be an example in "git config" manpage (i.e. "If you want to,
> you can alias 'view' to 'gitk' or 'gitview'") and I do not think we need
> core-side support for that.

Sure, but regular aliases already do that. The point of making it a
"builtin" alias is that we can depend on it being there. But who is
depending?

-Peff

^ permalink raw reply

* Re: RAM consumption when working with the gcc repo
From: Jon Smirl @ 2007-12-07 21:50 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: david, Git Mailing List
In-Reply-To: <alpine.LFD.0.99999.0712071529580.555@xanadu.home>

This is for a 3.3GB process with the 2GB pack as input
Looking at my process map, why is the pack file in the map four times?

2ba1f703b000-2ba23703b000 r--p 00000000 09:01 33079321
  /video/gcc/.git/objects/pack/pack-bd163555ea9240a7fdd07d2708a293872665f48b.pack
2ba23703b000-2ba23703c000 rw-p 2ba23703b000 00:00 0
2ba237c86000-2ba239352000 r--p 80000000 09:01 33079321
  /video/gcc/.git/objects/pack/pack-bd163555ea9240a7fdd07d2708a293872665f48b.pack
2ba2394b1000-2ba2794b1000 r--p 40000000 09:01 33079321
  /video/gcc/.git/objects/pack/pack-bd163555ea9240a7fdd07d2708a293872665f48b.pack
2ba2794b1000-2ba27a4b2000 rw-p 2ba2794b1000 00:00 0
2ba27bcb2000-2ba281c29000 rw-p 2ba23703c000 00:00 0
2ba281c29000-2ba2a32f5000 r--p 60000000 09:01 33079321
  /video/gcc/.git/objects/pack/pack-bd163555ea9240a7fdd07d2708a293872665f48b.pack
7fffb75e2000-7fffb75f7000 rw-p 7ffffffea000 00:00 0                      [stack]
7fffb75fe000-7fffb7600000 r-xp 7fffb75fe000 00:00 0                      [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0
  [vsyscall]

Here's the heap:

00400000-004b9000 r-xp 00000000 08:16 296588
  /usr/local/bin/git
006b9000-006bd000 rw-p 000b9000 08:16 296588
  /usr/local/bin/git
006bd000-0c17f000 rw-p 006bd000 00:00 0                                  [heap]
40000000-40001000 ---p 40000000 00:00 0
40001000-40801000 rw-p 40001000 00:00 0
40801000-40802000 ---p 40801000 00:00 0
40802000-41002000 rw-p 40802000 00:00 0
41002000-41003000 ---p 41002000 00:00 0
41003000-41803000 rw-p 41003000 00:00 0
41803000-41804000 ---p 41803000 00:00 0
41804000-42004000 rw-p 41804000 00:00 0
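
The sizes of the four pack mappings in the first listing can be read off by subtracting the start address from the end address of each range. A small sketch, assuming a shell with 64-bit arithmetic; `map_size` is a made-up helper name:

```shell
# map_size START-END: print the size in MiB of one address range
# as it appears in the first column of /proc/<pid>/maps.
map_size() {
	start=${1%-*}
	end=${1#*-}
	echo $(( (0x$end - 0x$start) / 1048576 ))
}

map_size 2ba1f703b000-2ba23703b000   # mapping at file offset 00000000
map_size 2ba2394b1000-2ba2794b1000   # mapping at file offset 40000000
map_size 2ba281c29000-2ba2a32f5000   # mapping at file offset 60000000
map_size 2ba237c86000-2ba239352000   # mapping at file offset 80000000
```

This prints 1024, 1024, 534 and 22 (MiB). Each mapping starts at a different file offset (the third column of the maps output), so the four entries look like separate windows onto the same pack rather than four full copies; that reading is an interpretation of the listing, not something confirmed in this thread.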



-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* Re: [RFC/PATCH] Add a --nosort option to pack-objects
From: Mike Hommey @ 2007-12-07 21:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v4peu19kr.fsf@gitster.siamese.dyndns.org>

On Fri, Dec 07, 2007 at 01:25:24PM -0800, Junio C Hamano wrote:
> I need to rant here a bit.
> 
> Sometimes people say "Here is my patch.  If this is accepted, I'll add
> documentation and tests".  My reaction is, "Don't you, as the person who
> proposes that change, believe in your patch deeply enough yourself to be
> willing to perfect it, to make it suitable for consumption by the
> general public, whether it is included in my tree or not?  A change that
> even you do not believe in deeply enough to perfect would probably not
> benefit the general public, so thanks but no thanks, I'll pass."

As you can see from my other message, I'm *actually* not sure this is
really material for git as a VCS. I will add documentation unrelated to
--nosort to pack-objects anyways.

Mike

^ permalink raw reply

* Re: git-bisect feature suggestion: "git-bisect diff"
From: Junio C Hamano @ 2007-12-07 21:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Ingo Molnar, git
In-Reply-To: <20071207213541.GA11723@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Fri, Dec 07, 2007 at 04:34:14PM -0500, Jeff King wrote:
>
>> On Fri, Dec 07, 2007 at 02:25:34AM -0800, Junio C Hamano wrote:
>> 
>> > git-bisect visualize: work in non-windowed environments better
>> 
>> Isn't this more or less the use case for the "git view" alias?
>
> Which isn't to say that I don't think your solution is nicer; it is. But
> if we don't use it here, then perhaps "git view" really is a solution in
> search of a problem.

Well, I think "git view" should not be just "predefined alias that the
user can override wholesale", which is what you currently have.  I think
it can just be an example in "git config" manpage (i.e. "If you want to,
you can alias 'view' to 'gitk' or 'gitview'") and I do not think we need
core-side support for that.

If it becomes cleverer, that's a different story.  Noticing if the user
is in a windowing environment or not, and acting differently, would make
it a single command that acts sensibly and in a context-sensitive way.

^ permalink raw reply

* Re: RAM consumption when working with the gcc repo
From: Jon Smirl @ 2007-12-07 21:43 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List
In-Reply-To: <20071207213928.GA11613@coredump.intra.peff.net>

On 12/7/07, Jeff King <peff@peff.net> wrote:
> On Fri, Dec 07, 2007 at 03:07:05PM -0500, Jon Smirl wrote:
>
> > I noticed two things when doing a repack of the gcc repo. First is
> > that the git process is getting to be way too big. Turning off the
> > delta caches had minimal impact. Why does the process still grow to
> > 4.8GB?
> >
> > Putting this in perspective, this is a 4.8GB process constructing a
> > 330MB file. Something isn't right. Memory leak or inefficient data
> > structure?
>
> Keep in mind that you are trying many different deltas, which are being
> held in memory, to find the right one and generate the 330MB file. And
> when you multiply that times N threads going at once, _each one_ is
> using a bunch of memory.
>
> As Nico suggested, you could probably drop the memory usage by reducing
> the size of the delta cache.

Delta cache is disabled --

pack.deltacachelimit = 0

Unless this option is broken?

>
> -Peff
>


-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* Re: RAM consumption when working with the gcc repo
From: Jeff King @ 2007-12-07 21:40 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Git Mailing List
In-Reply-To: <20071207213928.GA11613@coredump.intra.peff.net>

On Fri, Dec 07, 2007 at 04:39:28PM -0500, Jeff King wrote:

> Keep in mind that you are trying many different deltas, which are being
> held in memory, to find the right one and generate the 330MB file. And
> when you multiply that times N threads going at once, _each one_ is
> using a bunch of memory.
> 
> As Nico suggested, you could probably drop the memory usage by reducing
> the size of the delta cache.

Sorry, I clearly need to start reading to the ends of threads before
getting involved; I think Nico has already explained this with actual
numbers later on.

-Peff

^ permalink raw reply

* Re: RAM consumption when working with the gcc repo
From: Jon Smirl @ 2007-12-07 21:39 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: david, Git Mailing List
In-Reply-To: <alpine.LFD.0.99999.0712071529580.555@xanadu.home>

Here's a big clue.

When I repack the 300MB file the process grows to 4.8GB
When I repack the 2,000MB file the process grows to 3.3GB

In both cases the last 10% of the repack is taking as much time as the
first 90%.

At the end I am packing 60 objects/sec. In the beginning I was packing
1,000s of objects per second.

I'm not swapping:

jonsmirl@terra:/video/gcc/.git/objects/pack/foo$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0   1416  25668   3904 2756404    0    0    62    45  115  398  6  0 93  1
 3  0   1416  26880   3900 2754852    0    0     0     0  414 2453 26  1 73  0
 2  0   1416  26880   3900 2754852    0    0     0     0  472 3518 26  1 73  0
 4  0   1416  26912   3900 2754768    0    0     0     0  394 1642 26  1 74  0
 2  0   1416  26912   3900 2754768    0    0     0     0  401 1364 25  0 75  0
 2  0   1416  26896   3900 2754768    0    0     0     0  456 1922 25  1 75  0



-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* Re: RAM consumption when working with the gcc repo
From: Jeff King @ 2007-12-07 21:39 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Git Mailing List
In-Reply-To: <9e4733910712071207p750c14f4h7abc5d637da3a478@mail.gmail.com>

On Fri, Dec 07, 2007 at 03:07:05PM -0500, Jon Smirl wrote:

> I noticed two things when doing a repack of the gcc repo. First is
> that the git process is getting to be way too big. Turning off the
> delta caches had minimal impact. Why does the process still grow to
> 4.8GB?
> 
> Putting this in perspective, this is a 4.8GB process constructing a
> 330MB file. Something isn't right. Memory leak or inefficient data
> structure?

Keep in mind that you are trying many different deltas, which are being
held in memory, to find the right one and generate the 330MB file. And
when you multiply that times N threads going at once, _each one_ is
using a bunch of memory.

As Nico suggested, you could probably drop the memory usage by reducing
the size of the delta cache.

-Peff

^ permalink raw reply

* Re: [RFC/PATCH] Add a --nosort option to pack-objects
From: Junio C Hamano @ 2007-12-07 21:37 UTC (permalink / raw)
  To: Mike Hommey; +Cc: git
In-Reply-To: <7v4peu19kr.fsf@gitster.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> Mike Hommey <mh@glandium.org> writes:
>
>> The --nosort option disables the internal sorting used by pack-objects,
>> and runs the sliding window along the object list literally as given on
>> stdin.
>
> I think this is a good way to give people an easier way to experiment.

Actually, I take half of this back.

We need a pair of two good sort orders.  The order of objects fed to
pack-objects determines the final layout of the resulting pack, and
using something like the "recency order" we currently have is to
optimize the layout in the resulting pack for typical access patterns.

By using your --nosort, you may be able to influence the deltification
process, but the order you use will most likely not match the access
pattern of the resulting pack.  So it will be an easy, quick-and-dirty
way to _experiment_ how the deltification sort order affects the final
pack size, but I suspect that the resulting "small" pack won't be useful
in real life.

^ permalink raw reply

* Re: What's cooking in git.git (topics)
From: Miklos Vajna @ 2007-12-07 21:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vejdy4yuw.fsf@gitster.siamese.dyndns.org>

[-- Attachment #1: Type: text/plain, Size: 336 bytes --]

On Fri, Dec 07, 2007 at 01:51:03AM -0800, Junio C Hamano <gitster@pobox.com> wrote:
> * jc/git-log-doc (Thu Nov 1 15:57:40 2007 +0100) 1 commit
> 
> Rewrote Miklos's patch rather extensively.  Need to be in v1.5.4.

sorry, i totally forgot about this patch even though you asked me to test it.
it looks ok to me

thanks,
- VMiklos

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: After-the-fact submodule detection or creation
From: Michael Poole @ 2007-12-07 21:35 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git
In-Reply-To: <20071207073728.GA2847@steel.home>

Alex Riesen writes:

> Michael Poole, Fri, Dec 07, 2007 04:01:04 +0100:
>> It seems like using the current submodule code would mean that this
>> kind of import would need two passes over the foreign repository,
>> rather than one if the branch could be created after the parent tree
>> is initially imported.  I can live with that -- it is a rather unusual
>> case -- but maybe there is a better way.)
>
> Import the core module in a branch all by itself, and merge it in
> every support branch?
>
>
>     Supp1: o-o-o-----o-o-o-o-o-o-o
> 		    /
>     Core:  o-o-o-o-o
> 		    \
>     Supp2: o-o-------o-o-o-o

Yes, that's the obvious way to do it with submodules.  Teaching
git-svn to use that is the hard part.

Since the core code was first branched independently at r734 in the
existing repository, the import (either automated or manual) would
need to go through once to identify what subdirectories are actually
submodules in git terminology, and make a second pass to actually
perform the imports.  If the submodule creation could happen after the
fact, it would only need one pass.

Maybe the right question to ask is whether having a partial-tree
branch can be reasonably handled by git (in particular, detecting a
rename of the core subtree to the top-level tree in the new branch's
first commit).  If git understands that operation, then what I would
like to do would be reasonably straightforward.  If it does not make
sense, then I'll think about how to teach git-svn that certain
subdirectories should be promoted to submodules.

Michael Poole

^ permalink raw reply

