git.vger.kernel.org archive mirror
* kde.git is now online
@ 2007-04-05 17:03 H. Peter Anvin
  2007-04-05 17:30 ` Linus Torvalds
  2007-04-05 21:26 ` Junio C Hamano
  0 siblings, 2 replies; 11+ messages in thread
From: H. Peter Anvin @ 2007-04-05 17:03 UTC (permalink / raw)
  To: Git Mailing List, Chris Lee

I received the DVD from Chris Lee with a test conversion of KDE's 
Subversion repository to git.

I have uploaded it to:

http://userweb.kernel.org/~hpa/kdegit/

It's available both as a tarball and as an expanded tree.

	-hpa

P.S. I still want Kcharselect to display the Unicode names of the 
characters.


* Re: kde.git is now online
  2007-04-05 17:03 kde.git is now online H. Peter Anvin
@ 2007-04-05 17:30 ` Linus Torvalds
  2007-04-05 17:38   ` Nicolas Pitre
  2007-04-05 18:03   ` Chris Lee
  2007-04-05 21:26 ` Junio C Hamano
  1 sibling, 2 replies; 11+ messages in thread
From: Linus Torvalds @ 2007-04-05 17:30 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Git Mailing List, Chris Lee



On Thu, 5 Apr 2007, H. Peter Anvin wrote:
> 
> http://userweb.kernel.org/~hpa/kdegit/
> 
> It's available both as a tarball and as an expanded tree.

Thanks. Am downloading it right now ("0% 146.99kB/s" - it will take quite 
some time ;)

		Linus


* Re: kde.git is now online
  2007-04-05 17:30 ` Linus Torvalds
@ 2007-04-05 17:38   ` Nicolas Pitre
  2007-04-05 19:45     ` Nicolas Pitre
  2007-04-05 18:03   ` Chris Lee
  1 sibling, 1 reply; 11+ messages in thread
From: Nicolas Pitre @ 2007-04-05 17:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, Git Mailing List, Chris Lee

On Thu, 5 Apr 2007, Linus Torvalds wrote:

> 
> 
> On Thu, 5 Apr 2007, H. Peter Anvin wrote:
> > 
> > http://userweb.kernel.org/~hpa/kdegit/
> > 
> > It's available both as a tarball and as an expanded tree.
> 
> Thanks. Am downloading it right now ("0% 146.99kB/s" - it will take quite 
> some time ;)

I'm downloading it too (794.3 KB/s).


Nicolas


* Re: kde.git is now online
  2007-04-05 17:30 ` Linus Torvalds
  2007-04-05 17:38   ` Nicolas Pitre
@ 2007-04-05 18:03   ` Chris Lee
  1 sibling, 0 replies; 11+ messages in thread
From: Chris Lee @ 2007-04-05 18:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, Git Mailing List

On 4/5/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Thu, 5 Apr 2007, H. Peter Anvin wrote:
> >
> > http://userweb.kernel.org/~hpa/kdegit/
> >
> > It's available both as a tarball and as an expanded tree.
>
> Thanks. Am downloading it right now ("0% 146.99kB/s" - it will take quite
> some time ;)

Imagine how much longer it'd take if it were being served up from my
home connection. :)

Many thanks to hpa for putting this up!

-clee


* Re: kde.git is now online
  2007-04-05 17:38   ` Nicolas Pitre
@ 2007-04-05 19:45     ` Nicolas Pitre
  2007-04-05 20:51       ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Nicolas Pitre @ 2007-04-05 19:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, Git Mailing List, Chris Lee

On Thu, 5 Apr 2007, Nicolas Pitre wrote:

> On Thu, 5 Apr 2007, Linus Torvalds wrote:
> 
> > 
> > 
> > On Thu, 5 Apr 2007, H. Peter Anvin wrote:
> > > 
> > > http://userweb.kernel.org/~hpa/kdegit/
> > > 
> > > It's available both as a tarball and as an expanded tree.
> > 
> > Thanks. Am downloading it right now ("0% 146.99kB/s" - it will take quite 
> > some time ;)
> 
> I'm downloading it too (794.3 KB/s).

OK this is a really nice test repo. I have only 1 GB of RAM, and 
although basic operations appear to work just fine, this data set shows 
its weight in some ways.

For example I think there might be ways to improve the pack mmap 
windowing, or git-fsck's IO patterns.  According to top, git-fsck --full 
spends 96% of its time waiting for IO completion and only 4% actually 
doing work.  At that rate, fsck --full is rather unusable on this repo.  
Without --full, fsck completes in less than 2 seconds.
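
[Editor's sketch, not part of the original message.] For readers unfamiliar with the term, "mmap windowing" means mapping a large file a piece at a time instead of all at once. The Python sketch below only illustrates that general idea; it is not git's implementation, and the window size, the function names and the temporary test file are all assumptions made for the example.

```python
import mmap
import os
import tempfile

# Arbitrary illustrative window size. It is a multiple of
# mmap.ALLOCATIONGRANULARITY so that every window offset is validly aligned.
WINDOW = 4 * mmap.ALLOCATIONGRANULARITY


def read_through_windows(path, window=WINDOW):
    """Yield the file's bytes one mapped window at a time."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            length = min(window, size - offset)
            # Map only [offset, offset + length), never the whole file.
            with mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ,
                           offset=offset) as view:
                yield view[:]
            offset += length


def demo():
    """Write a throwaway file spanning four windows and read it back."""
    data = os.urandom(3 * WINDOW + 123)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        chunks = list(read_through_windows(tmp.name))
    finally:
        os.unlink(tmp.name)
    return data, chunks
```

Only one window is mapped at any moment, so the address space needed stays bounded no matter how large the file is. The price is that the access pattern decides how often windows must be remapped.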


Nicolas


* Re: kde.git is now online
  2007-04-05 19:45     ` Nicolas Pitre
@ 2007-04-05 20:51       ` Linus Torvalds
  2007-04-05 22:00         ` Nicolas Pitre
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2007-04-05 20:51 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: H. Peter Anvin, Git Mailing List, Chris Lee



On Thu, 5 Apr 2007, Nicolas Pitre wrote:
> 
> For example I think there might be ways to improve the pack mmap 
> windowing, or git-fsck's IO patterns.  For example, git-fsck --full 
> spends 96% of the time waiting for IO completion and only 4% actually 
> performing some work according to top.  At that rate that makes fsck 
> --full rather unusable on this repo.  Without --full then fsck completes 
> in less than 2 seconds.

Without "--full", it doesn't actually really do anything much, since it 
will basically ignore objects that are in the pack.

With --full, there are certainly things that we could improve upon. We 
currently tend to walk things a few times for pack contents: 
 - first we do the SHA1 of the full pack
 - then we go back, and unpack and fsck each entry in the pack.

So if the pack-file is too big to fit in memory, we'll basically always 
read it at least twice (and that's ignoring the fact that delta lookup 
will obviously seek back and forth, which makes access patterns worse).
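
[Editor's sketch, not part of the original message.] The two passes described above can be mimicked on a toy container to show why the file is read roughly twice. The layout used here (length-prefixed zlib entries followed by a SHA-1 trailer) is an assumption invented for the example and is not git's pack format; all names are hypothetical.

```python
import hashlib
import io
import struct
import zlib


def build_toy_pack(payloads):
    """Toy container: [4-byte length][zlib data]... + 20-byte SHA-1 trailer."""
    body = b"".join(
        struct.pack(">I", len(z)) + z
        for z in (zlib.compress(p) for p in payloads)
    )
    return body + hashlib.sha1(body).digest()


class CountingReader(io.BytesIO):
    """In-memory file that records how many bytes were read from it."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def check_toy_pack(f, total_size):
    body_size = total_size - 20
    # Pass 1: checksum the whole file and compare with the trailer.
    f.seek(0)
    digest = hashlib.sha1(f.read(body_size)).digest()
    if digest != f.read(20):
        raise ValueError("trailer checksum mismatch")
    # Pass 2: go back to the start and unpack every entry.
    f.seek(0)
    objects = []
    while f.tell() < body_size:
        (length,) = struct.unpack(">I", f.read(4))
        objects.append(zlib.decompress(f.read(length)))
    return objects
```

After `check_toy_pack` returns, `bytes_read` is just under twice the file size. When the file does not fit in memory, the second pass cannot be served from cache and goes back to disk.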

On the other hand, there's a perfectly good reason why we don't actually 
fsck pack-files by default. They're "stable storage". You don't normally 
need to. So I'd not worry too much about fsck performance. I suspect 
you'll find that with 1GB of RAM you'll have other performance problems 
that are more pressing ("git clone" comes to mind ;)

Me, I'm just 53% done with the download, so I probably won't be looking at 
this today ;)

			Linus


* Re: kde.git is now online
  2007-04-05 17:03 kde.git is now online H. Peter Anvin
  2007-04-05 17:30 ` Linus Torvalds
@ 2007-04-05 21:26 ` Junio C Hamano
  2007-04-06 11:32   ` Geert Bosch
  1 sibling, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2007-04-05 21:26 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Git Mailing List, Chris Lee

Thanks.  Slurping now.


* Re: kde.git is now online
  2007-04-05 20:51       ` Linus Torvalds
@ 2007-04-05 22:00         ` Nicolas Pitre
  2007-04-06  1:24           ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Nicolas Pitre @ 2007-04-05 22:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, Git Mailing List, Chris Lee

On Thu, 5 Apr 2007, Linus Torvalds wrote:

> Without "--full", it doesn't actually really do anything much, since it 
> will basically ignore objects that are in the pack.
> 
> With --full, there are certainly things that we could improve upon. We 
> currently tend to walk things a few times for pack contents: 
>  - first we do the SHA1 of the full pack
>  - then we go back, and unpack and fsck each entry in the pack.
> 
> So if the pack-file is too big to fit in memory, we'll basically always 
> read it at least twice (and that's ignoring the fact that delta lookup 
> will obviously seek back and forth, which makes access patterns worse).
> 
> On the other hand, there's a perfectly good reason why we don't actually 
> fsck pack-files by default. They're "stable storage". You don't normally 
> need to. So I'd not worry too much about fsck performance.

Well.... still it certainly can be helped a bit.  I wouldn't mind it 
spending half an hour of CPU if it needs to.  But I just interrupted it
with ^C with the following result so far:

real    75m44.374s
user    2m5.318s
sys     0m54.059s

(I should have used /usr/bin/time to see the number of page faults).
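
[Editor's sketch, not part of the original message.] The timings above can be turned into a CPU utilisation figure with a little arithmetic. The Python below uses only the three numbers quoted; the helper name is made up for the example.

```python
def seconds(minutes, secs):
    """Convert a 'XmY.YYYs' reading from time(1) into seconds."""
    return minutes * 60 + secs


real = seconds(75, 44.374)     # wall-clock time
user = seconds(2, 5.318)       # CPU time in user space
sys_time = seconds(0, 54.059)  # CPU time in the kernel

# Fraction of wall-clock time the process was actually on the CPU.
cpu_fraction = (user + sys_time) / real
print(f"CPU busy: {cpu_fraction:.1%}")  # prints "CPU busy: 3.9%"
```

That is about 3 minutes of CPU in nearly 76 minutes of wall-clock time, which agrees with the 4% busy / 96% waiting split that top reported earlier in the thread.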

> I suspect you'll find that with 1GB of RAM you'll have other 
> performance problems that are more pressing ("git clone" comes to mind 
> ;)

Well... same issue actually.  git-pack-objects spent about 40 secs 
firmly at 100% CPU usage counting objects.

Then it got stuck on:

	remote: Done counting 4111366 objects.

again spending 3% CPU and the rest waiting for IO with the disk 
definitely thrashing.  It didn't allocate more than 47% of memory during 
that phase which lasted a few minutes.

Then, the "Indexing 4111366 objects." message appeared and CPU usage 
went up to 6% CPU with 67% memory for pack-objects and 30% CPU and 7% 
memory for index-pack while the rest was spent waiting for IO.  This 
also took maybe two minutes.

And now it reached the "Resolving 3305158 deltas." phase with only 
index-pack on the radar with approx 10% CPU and 19% memory, and the rest 
of the time waiting for IO again.

It has probably been half an hour now and the thing is at:

	  21% (710502/3305158) done

So it will work and eventually complete.  And the good news is that the 
worst part performance-wise is on the client side.  But it looks like 
we're definitely thrashing the kernel buffer cache.


Nicolas


* Re: kde.git is now online
  2007-04-05 22:00         ` Nicolas Pitre
@ 2007-04-06  1:24           ` Linus Torvalds
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2007-04-06  1:24 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: H. Peter Anvin, Git Mailing List, Chris Lee



On Thu, 5 Apr 2007, Nicolas Pitre wrote:
> 
> Well.... still it certainly can be helped a bit.  I wouldn't mind it 
> spending half an hour of CPU if it needs to.  But I just interrupted it
> with ^C with the following result so far:
> 
> real    75m44.374s
> user    2m5.318s
> sys     0m54.059s

Well, the thing is, this is "normal", and doesn't really have a lot to do 
with git.

If the actual working set is larger than available memory, ~5% CPU time is 
actually pretty good. 

The only way to improve on it is to try to make the working set smaller. 
Sadly, that's often a really difficult thing to do ;(

> > I suspect you'll find that with 1GB of RAM you'll have other 
> > performance problems that are more pressing ("git clone" comes to mind 
> > ;)
> 
> Well... same issue actually.  git-pack-objects spent about 40 secs 
> firmly at 100% CPU usage counting objects.
> 
> Then it got stuck on:
> 
> 	remote: Done counting 4111366 objects.
> 
> again spending 3% CPU and the rest waiting for IO with the disk 
> definitely thrashing.

Well, I seriously doubt it's the "same issue" except in the sense that 
yes, if you work with all objects, you are going to have a big working 
set.

Note that "working set" is different from "memory footprint". If you have 
good locality, the working set can be a *lot* smaller than the memory 
footprint, and that tends to be the best/only way to improve the working 
set: trying to not jump back-and-forth between different things.

One example of that kind of shrinkage of the working set was Junio's commit 
57584d9eddc3482c5db0308203b9df50dc62109c to "git blame": by comparing the 
*pointers* rather than what they pointed to, you avoid having to follow 
the pointer all the way down.
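
[Editor's sketch, not part of the original message.] The general trick of comparing pointers instead of contents can be shown in a few lines of Python. This is not the code from the commit mentioned above; the `Blob` class, the `intern_blob` helper and the byte counter are hypothetical and exist only to make the cost visible.

```python
class Blob:
    """Hypothetical stand-in for an object whose content is costly to compare."""

    bytes_compared = 0  # how much content the deep comparisons have walked

    def __init__(self, data):
        self.data = data

    def same_content(self, other):
        # Deep comparison: has to look at the data behind both objects.
        Blob.bytes_compared += len(self.data)
        return self.data == other.data


_interned = {}


def intern_blob(data):
    """Return one shared Blob per distinct content (pays the cost once)."""
    return _interned.setdefault(data, Blob(data))


a = intern_blob(b"x" * 1_000_000)
b = intern_blob(b"x" * 1_000_000)

# Identity check: compares two references and never touches the content.
same_by_pointer = a is b
```

Once equal contents are guaranteed to share one object, the identity check answers the question without bringing any of the underlying data into memory. That is the working-set saving being described.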

However, doing that in general tends to be very difficult. We use hashes 
extensively (not just the obvious SHA1 hashes, but the object lookup 
itself is based on hash tables etc), and while they are nice and fast O(1) 
when you have enough memory, they do tend to spread things out so that you 
are using your memory potentially very sparsely, which is the last thing 
you want to do if you are paging.
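
[Editor's sketch, not part of the original message.] A rough way to see how hashing spreads accesses out is to count the distinct pages touched when the same number of entries is reached sequentially versus through a hash. The page size, slot size, table size and key names below are assumptions chosen for the illustration, not git's data structures.

```python
import hashlib

PAGE_SIZE = 4096       # assumed page size in bytes
SLOT_SIZE = 16         # assumed size of one table slot in bytes
TABLE_SLOTS = 1 << 20  # a 16 MB table, i.e. 4096 pages


def pages_touched(slot_indexes):
    """Number of distinct pages that hold the given slots."""
    return len({(i * SLOT_SIZE) // PAGE_SIZE for i in slot_indexes})


def hashed_slot(key):
    """Place a key in the table by hashing it, as a hash table would."""
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:8], "big") % TABLE_SLOTS


keys = [f"object-{n}".encode() for n in range(1000)]

# 1000 adjacent slots occupy 16000 bytes, which fits in 4 pages.
sequential = pages_touched(range(len(keys)))

# The same 1000 entries placed by hash land on hundreds of different pages.
hashed = pages_touched(hashed_slot(k) for k in keys)
```

Both layouts hold the same amount of useful data, but the hashed one needs far more pages resident to reach it. That is the difference between memory footprint and working set.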

Side note: I finally got the thing downloaded, and so I did a

	git checkout -f

and the trace is pretty horrid. It looks something like this:

	...
	lstat("kdeaccessibility/IconThemes/mono/scalable/apps/kimagemapeditor.svgz", 0x7fff6f8d29f0) = -1 ENOENT (No such file or directory)
	mkdir("kdeaccessibility", 0777)         = -1 EEXIST (File exists)
	unlink("kdeaccessibility")              = -1 EISDIR (Is a directory)
	stat("kdeaccessibility", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	mkdir("kdeaccessibility/IconThemes", 0777) = -1 EEXIST (File exists)
	unlink("kdeaccessibility/IconThemes")   = -1 EISDIR (Is a directory)
	stat("kdeaccessibility/IconThemes", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	mkdir("kdeaccessibility/IconThemes/mono", 0777) = -1 EEXIST (File exists)
	unlink("kdeaccessibility/IconThemes/mono") = -1 EISDIR (Is a directory)
	stat("kdeaccessibility/IconThemes/mono", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	mkdir("kdeaccessibility/IconThemes/mono/scalable", 0777) = -1 EEXIST (File exists)
	unlink("kdeaccessibility/IconThemes/mono/scalable") = -1 EISDIR (Is a directory)
	stat("kdeaccessibility/IconThemes/mono/scalable", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
	mkdir("kdeaccessibility/IconThemes/mono/scalable/apps", 0777) = -1 EEXIST (File exists)
	unlink("kdeaccessibility/IconThemes/mono/scalable/apps") = -1 EISDIR (Is a directory)
	stat("kdeaccessibility/IconThemes/mono/scalable/apps", {st_mode=S_IFDIR|0775, st_size=12288, ...}) = 0
	open("kdeaccessibility/IconThemes/mono/scalable/apps/kimagemapeditor.svgz", O_WRONLY|O_CREAT|O_EXCL, 0666) = 5
	write(5, "\37\213\10\10\205\3\263A\0\3kimagemapeditor.svg\0\344Z"..., 10112) = 10112
	close(5)                                = 0
	lstat("kdeaccessibility/IconThemes/mono/scalable/apps/kimagemapeditor.svgz", {st_mode=S_IFREG|0664, st_size=10112, ...}) = 0
	...

and that repeats for every single file. There's 233,902 of them. Oops.
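
[Editor's sketch, not part of the original message.] In the trace, each of the five leading directories costs three syscalls (mkdir, unlink, stat) before the file itself is opened, and that is repeated for every file. The Python below only counts those calls, with and without remembering directories already seen. It is a hypothetical model, not git's checkout code, and the second file name in the usage example is invented.

```python
SYSCALLS_PER_DIR = 3  # mkdir + unlink + stat, as seen in the trace


def leading_dirs(path):
    """'a/b/c.txt' -> ['a', 'a/b']"""
    parts = path.split("/")[:-1]
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def dir_syscalls_naive(paths):
    """Re-check every leading directory for every file."""
    return sum(SYSCALLS_PER_DIR * len(leading_dirs(p)) for p in paths)


def dir_syscalls_cached(paths):
    """Check each directory once and remember that it exists."""
    known = set()
    total = 0
    for p in paths:
        for d in leading_dirs(p):
            if d not in known:
                total += SYSCALLS_PER_DIR
                known.add(d)
    return total
```

For the traced path plus one hypothetical sibling such as `kdeaccessibility/IconThemes/mono/scalable/apps/example.svgz`, the naive count is 30 directory syscalls and the cached count is 15. A real cache would also have to stay correct if a directory is replaced during the checkout, which this model ignores.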

On the other hand, we do certain things pretty well.  A "git diff", with
enough memory, takes 0.65s.  That's just over *half*a*second* for 233
*thousand* files.  I'd want to have tons of memory to work with this
repository, but if I did, I'd still think git is the best thing since
sliced bread. 

And doing ops like "git blame" on some random file I looked at was
actually instantaneous.  I probably happened to pick a new file just by
luck, but still..  Most things definitely work pretty damn well. 

(Update: I did a

	git log --raw -r |
		grep '^:100644.*M' |
		cut -f2 |
		sort |
		uniq -c |
		sort -n

to see the file that was updated the most, to get some kind of
worst-case for "git blame".  The list looks like:

   ...
   1091 koffice/kword/kwview.cc
   1099 kdelibs/khtml/khtml_part.cpp
   1116 koffice/kpresenter/kpresenter_view.cc
   1171 kdevelop/ChangeLog
   1667 kde-common/accounts

and while "git blame" is slow on them, it's not *painfully* so.  It took
13s to get the kdevelop/ChangeLog blame, and 31s (probably because the
diffs are much more interesting) to get the kpresenter_view.cc blame. 
Too slow, but still usable, and "git gui" again made it more interesting
to wait for it.)
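
[Editor's sketch, not part of the original message.] The shell pipeline in this update can be mirrored in Python for readers who prefer to post-process the `git log --raw -r` output in a script. The function name is hypothetical, and any input lines you feed it for testing are your own sample data, not output from the KDE repository. One deliberate difference: this version reads the status letter itself, whereas the `grep` pattern also matches lines whose path merely contains an "M".

```python
from collections import Counter


def count_modifications(raw_lines):
    """Count 'M' (modified) entries per path for files whose old mode is 100644.

    Expects lines shaped like the raw diff output of `git log --raw`:
        :100644 100644 <old-sha> <new-sha> M<TAB>path
    Returns (path, count) pairs sorted by ascending count, like `sort -n`.
    """
    counts = Counter()
    for line in raw_lines:
        if not line.startswith(":100644"):
            continue  # commit headers, messages, other modes
        meta, sep, path = line.rstrip("\n").partition("\t")
        if sep and meta.split()[-1] == "M":
            counts[path] += 1
    return sorted(counts.items(), key=lambda item: item[1])
```

The most frequently modified file ends up last in the returned list, matching the order of the listing above.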

That said, the more I look at this, the more I think that this is *the*
perfect example of why you shouldn't put everything in one big
repository.  Git should be able to handle it, but nobody should really
do things like that. It's just stupid.

I will think hard about submodules.

			Linus


* Re: kde.git is now online
  2007-04-05 21:26 ` Junio C Hamano
@ 2007-04-06 11:32   ` Geert Bosch
  2007-04-06 12:59     ` Nicolas Pitre
  0 siblings, 1 reply; 11+ messages in thread
From: Geert Bosch @ 2007-04-06 11:32 UTC (permalink / raw)
  To: Git Mailing List

On my Mac OS X system, cloning this fails with:

potomac:~/kde%git clone http://userweb.kernel.org/~hpa/kdegit/kde.git
Initialized empty Git repository in /Users/bosch/kde/kde/.git/
Getting alternates list for http://userweb.kernel.org/~hpa/kdegit/kde.git/
Getting pack list for http://userweb.kernel.org/~hpa/kdegit/kde.git/
Getting index for pack c3df59bc67f69b3861ebef8de308156f1c5fe017
Getting pack c3df59bc67f69b3861ebef8de308156f1c5fe017
which contains ca908d2d51f154aab9f5727c1e57fb23a2942485
fatal: packfile /Users/bosch/kde/kde/.git/objects/pack/pack-c3df59bc67f69b3861ebef8de308156f1c5fe017.pack cannot be mapped.

Even worse, all files seem to have been deleted, so I have to
download this again. I guess I shouldn't have used git clone...

   -Geert


* Re: kde.git is now online
  2007-04-06 11:32   ` Geert Bosch
@ 2007-04-06 12:59     ` Nicolas Pitre
  0 siblings, 0 replies; 11+ messages in thread
From: Nicolas Pitre @ 2007-04-06 12:59 UTC (permalink / raw)
  To: Geert Bosch; +Cc: Git Mailing List

On Fri, 6 Apr 2007, Geert Bosch wrote:

> On my Mac OS X system, cloning this fails with:
> 
> potomac:~/kde%git clone http://userweb.kernel.org/~hpa/kdegit/kde.git
> Initialized empty Git repository in /Users/bosch/kde/kde/.git/
> Getting alternates list for http://userweb.kernel.org/~hpa/kdegit/kde.git/
> Getting pack list for http://userweb.kernel.org/~hpa/kdegit/kde.git/
> Getting index for pack c3df59bc67f69b3861ebef8de308156f1c5fe017
> Getting pack c3df59bc67f69b3861ebef8de308156f1c5fe017
> which contains ca908d2d51f154aab9f5727c1e57fb23a2942485
> fatal: packfile
> /Users/bosch/kde/kde/.git/objects/pack/pack-c3df59bc67f69b3861ebef8de308156f1c5fe017.pack
> cannot be mapped.
> 
> Even worse, all files seem to have been deleted, so I have to
> download this again. I guess, I shouldn't have used git clone...

Given that this repo is pushing it to the limits, you'd better download 
the .tar.bz2 archive and play with it locally first.


Nicolas
