From: Nicolas Sebrecht <nicolas.s-dev@laposte.net>
To: "Robin H. Johnson" <robbat2@gentoo.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Performance issue: initial git clone causes massive repack
Date: Sun, 5 Apr 2009 21:02:13 +0200
Message-ID: <20090405190213.GA12929@vidovic>
In-Reply-To: <20090405070412.GB869@curie-int>
On Sun, Apr 05, 2009 at 12:04:12AM -0700, Robin H. Johnson wrote:
> Before I answer the rest of your post, I'd like to note that the matter
> of which choice between single-repo, repo-per-package, repo-per-category
> has been flogged to death within Gentoo.
>
> I did not come to the Git mailing list to rehash those choices. I came
> here to find a solution to the performance problem.
I understand. I know two ways to resolve this:
- by resolving the performance problem itself,
- by changing the workflow to something better suited to
the facts.
My point is that going from a centralized to a decentralized SCM
involves breaking sharply with how developers and maintainers work. What
you're currently suggesting is a way to work with Git in a centralized
way. This sucks. To get things right with Git I would avoid shared
and global repositories. Gnome is doing it this way:
http://gitorious.org/projects/gnome-svn-hooks/repos/mainline/trees/master
> The GSoC 2009 ideas contain a potential project for caching the
> generated packs, which, while having value in itself, could be partially
> avoided by sending suitable pre-built packs (if they exist) without any
> repacking.
Right. One option is to wait and see whether the GSoC project produces
something.
> Also, I should note that working on the tree isn't the only reason to
> have the tree checked out. While the great majority of Gentoo users have
> their trees purely from rsync, there is nothing stopping you from using
> a tree from CVS (anonCVS for the users, master CVS server for the
> developers).
>
> A quick bit of stats run show that while some developers only touch a
> few packages, there are at least 200 developers that have done a major
> change to 100 or more packages.
That's a point that has to be reconsidered. Not the fact that at least
200 developers work on over 100 packages (this is really not an issue)¹
but the fact that they do so directly on the main repo/server. The
right way to do this is for each developer to send their work to the
maintainer². The main point here is better code review.
1. Some or all of the per-category repos can be tracked with a simple script.
2. Maintainers could be, or not be, the same developers as today.
Adding a layer of maintainers in charge of EAPI review (for example)
above the package maintainers could help fix a lot of portage issues
and would keep "simple developers" from doing crap on the main repo(s)
that users download.
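As a rough illustration of footnote 1, here is a minimal sketch of such a tracking script. Everything named in it is an assumption made for the example: the BASE_URL, the TREE location and the category names are placeholders, not real Gentoo infrastructure, and `git -C` needs Git 1.8.5 or later. Setting DRY_RUN prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical values: neither the URL nor the local path exists as such.
BASE_URL="${BASE_URL:-git://example.org/gentoo}"
TREE="${TREE:-$HOME/portage-git}"

# run CMD...: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# sync_categories CATEGORY...: clone each category repo that is missing
# under $TREE and fast-forward the ones that are already there.
sync_categories() {
    for category in "$@"; do
        if [ -d "$TREE/$category/.git" ]; then
            run git -C "$TREE/$category" pull --ff-only
        else
            run git clone "$BASE_URL/$category.git" "$TREE/$category"
        fi
    done
}
```

Calling `sync_categories dev-perl app-editors` would then clone or update only those two categories, so a developer tracks just the parts of the tree they care about.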
> And per-package numbers, because we DID do an experimental conversion,
> last year, although the packs might not have been optimal:
> - ~410MiB of content (w/ 4kb inodes)
> - 4.7GiB of Git total overhead, with a breakdown:
> - 1.9GiB in inode waste
> - 2.8GiB in packs
Ok.
> > One repo per category could be a good compromise assuming one separate
> > branch per package, then.
> Other downsides to repo-per-category and repo-per-package:
Let's forget repo-per-package.
> - Raises difficulty in adding a new package/category.
> You cannot just do 'mkdir && vi ... && git add && git commit' anymore.
Right, but categories are not evolving that much.
> - The name of the directory for both of the category AND the package are not
> specified in the ebuild, as such, unless they are checked out to the right
> location, you will get breakage (definitely in the package name, and
> about 10% of the time with categories).
Of course. Quite frankly, it's recoverable without pain.
A repo-per-category local workflow would be:
$ git branch
  master
* next
  package_one
  package_two
  [...]
$ tree -a
|-- .git
|   |-- [...]
|   [...]
|-- package_one
|   |-- ChangeLog
|   |-- Manifest
|   |-- metadata.xml
|   |-- package_one-0.4.ebuild
|   `-- package_one-0.5.ebuild
|-- package_two
|   |-- ChangeLog
|   |-- Manifest
|   |-- files
|   |   |-- package_two.confd
|   |   `-- package_two.rc
|   |-- metadata.xml
|   `-- package_two-0.7-r3.ebuild
[...]
$ git checkout package_one
$ tree -a
|-- .git
|   |-- [...]
|   [...]
`-- package_one
    |-- ChangeLog
    |-- Manifest
    |-- metadata.xml
    |-- package_one-0.4.ebuild
    `-- package_one-0.5.ebuild
$ <hack, hack, hack>
$ git checkout next
$ git merge package_one
> - Does NOT present a good base for anybody wanting to branch the entire
> tree themselves.
Scriptable.
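To give an idea of what "scriptable" could mean here, the sketch below creates the same branch in every category repo found under a common directory. It assumes a layout with one clone per category directly under that directory; the function name branch_all and the DRY_RUN switch are made up for this example, and `git -C` needs Git 1.8.5 or later.

```shell
#!/bin/sh
# branch_all TREE BRANCH: create BRANCH in every category repo under TREE.
# With DRY_RUN set, only print the git commands that would be run.
branch_all() {
    tree=$1
    branch=$2
    for gitdir in "$tree"/*/.git; do
        [ -e "$gitdir" ] || continue    # skip when the glob matched nothing
        repo=${gitdir%/.git}            # strip the trailing /.git
        if [ -n "${DRY_RUN:-}" ]; then
            echo "git -C $repo checkout -b $branch"
        else
            git -C "$repo" checkout -b "$branch"
        fi
    done
}
```

Something like `branch_all ~/portage-git my-overlay` would then give a private branch across all categories; directories without a .git entry are skipped.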
> We're already on track to drop the CVS $Header$, and thereafter, some of the
> ebuilds are already on track to be smaller. Here's our prototype dev-perl/Sub-Name-0.04.
> ====
> # Copyright 1999-2009 Gentoo Foundation
> # Distributed under the terms of the GNU General Public License v2
> MODULE_AUTHOR=XMATH
> inherit perl-module
> DESCRIPTION="(re)name a sub"
> LICENSE="|| ( Artistic GPL-2 )"
> SLOT="0"
> KEYWORDS="~amd64 ~x86"
> IUSE=""
> SRC_TEST=do
> ====
>
> We can have all the CPAN packages from CPAN author XMATH, with changing
> only the DESCRIPTION string. KEYWORDS then just changes over the package
> lifespan.
Sounds good.
--
Nicolas Sebrecht