From: "Greg A. Woods" <woods@planix.com>
To: The Git Mailing List <git@vger.kernel.org>
Subject: Re: multiple working directories for long-running builds (was: "git merge" merges too much!)
Date: Thu, 03 Dec 2009 00:11:09 -0500
Message-ID: <m1NG3yC-000kmgC@most.weird.com>
In-Reply-To: <20091202001020.GF11235@dpotapov.dyndns.org>
At Wed, 2 Dec 2009 03:10:21 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote:
>
> My point was that I do not see why you believe "git archive" is more
> expensive than "git clone". According to Jeff Epler's numbers,
> "git archive" is 20% faster than "git clone"...
Really!?!?!? You don't see it? Why is this so hard to understand?
Sorry for my incredulity, but I thought this issue was obvious.
The slightly more expensive "git clone" happens only _ONCE_. After that
you just run "git pull" I think (plus maybe "git reset --hard"?), but in
any case it's a heck of a lot less I/O and CPU than "git archive".
And of course you skip even the one-time "git clone" operation if you
use the even faster and simpler git-new-workdir script.
"git archive" has to be run _EVERY_ time you need to update a working
directory and it currently has no choice but to toss every bit of the
whole working directory, up from the filesystem, across a pipe, and back
down to the filesystem. It literally couldn't be more expensive!
Sure, no matter how you do it, updating the working directory might not
always be the biggest part of the operation, but it's insane to use the
most expensive mechanism possible when there are far cheaper
alternatives.
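A minimal sketch of the two update paths being compared, using a throwaway repository. The directory names (upstream, build-clone, build-archive) and the commit identity are assumptions made for illustration; they do not come from this thread.

```shell
#!/bin/sh
# Sketch only: all paths and names below are hypothetical.
set -e
top=$(mktemp -d)

# A small stand-in for the real upstream repository.
git init -q "$top/upstream"
cd "$top/upstream"
echo 'int main(void) { return 0; }' > main.c
git add main.c
git -c user.name=Example -c user.email=example@example.com commit -q -m 'initial'
branch=$(git symbolic-ref --short HEAD)

# Approach 1: pay for "git clone" once.
git clone -q "$top/upstream" "$top/build-clone"

# Approach 2: export a complete snapshot through a pipe.
mkdir "$top/build-archive"
git archive "$branch" | tar -xf - -C "$top/build-archive"

# Upstream moves forward by one small change.
echo '/* tweak */' >> main.c
git -c user.name=Example -c user.email=example@example.com commit -q -a -m 'tweak'

# Approach 1 update: fetch the new objects and rewrite only changed files.
cd "$top/build-clone"
git fetch -q origin
git reset -q --hard "origin/$branch"

# Approach 2 update: throw the tree away and export every file again.
rm -rf "$top/build-archive"
mkdir "$top/build-archive"
git -C "$top/upstream" archive "$branch" | tar -xf - -C "$top/build-archive"
```

Both trees end up with the same content; the difference is that the second update rewrites every file no matter how small the change was.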
BTW, there cannot, and MUST NOT, be any integrity advantage to using
"git archive" over using multiple working directories. "git archive
branch" must, by definition, produce exactly the same result as if you
did "git checkout branch; rm -rf .git" or else it is buggy.
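One way to check that equivalence is sketched below. It assumes a throwaway repository and assumes that no export-ignore or export-subst attributes are set, since those deliberately change what "git archive" emits. The directory names from-archive and from-checkout are hypothetical.

```shell
#!/bin/sh
# Sketch only: repository contents and directory names are hypothetical.
set -e
top=$(mktemp -d)

git init -q "$top/repo"
cd "$top/repo"
mkdir src
echo 'hello' > README
echo 'int x;' > src/x.c
git add README src/x.c
git -c user.name=Example -c user.email=example@example.com commit -q -m 'initial'
branch=$(git symbolic-ref --short HEAD)

# Tree 1: the output of "git archive branch", unpacked.
mkdir "$top/from-archive"
git archive "$branch" | tar -xf - -C "$top/from-archive"

# Tree 2: a checkout of the same branch with its .git directory removed.
git clone -q --branch "$branch" "$top/repo" "$top/from-checkout"
rm -rf "$top/from-checkout/.git"

# diff -r exits non-zero if any file name or file content differs.
diff -r "$top/from-archive" "$top/from-checkout" && echo "trees are identical"
```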
Note also that the build directories created with git-new-workdir can be
treated as read-only, and perhaps even forced to be read-only by mount
options or maybe just by a corporate policy directive. (In all the
projects I'm working on, the source tree can be read-only -- product
files are always generated elsewhere.)
> Multiple copies of the same repo are never a problem (except for taking
> some disk space).
Exactly -- gigabytes of disk space per copy in the cases I'm concerned
about (i.e. where hard links are impossible). I've heard that at least
one very large project has an 8GB repository currently. Three of the
large projects I work on now are about a gigabyte per copy. That's just
what's under .git too, not including the whole working directory as
well. I can't even manage a "git clone" over HTTP of one of them
without increasing my default process limits, because it is so big and
uses up too much memory.
I guess one could skip the initial more-expensive "git clone" operation
by copying the repo using low-level bit moving commands, like "cp -r" or
whatever, and then tweak the result to make it appear as if it had been
cloned, but even that requires moving gigabytes of data unnecessarily
across what is likely to be a network connection of some sort.
Are you fighting against git-new-workdir, or the concept of multiple
working directories?
> > A major further advantage of multiple working directories is that this
> > eliminates one more point of failure -- i.e. you don't end up with
> > multiple copies of the repo that _should_ be effectively read-only for
> > everything but "push", and perhaps then only to one branch.
>
> I really do not understand why you say that some copies
> should be effectively read-only... You can start to work on some feature
> at one place (using one repo) and then continue in another place using
> another repo. (Obviously, you will need to fetch the changes from the
> first repo before you are able to continue, but that is just one
> command). In other words, I really do not understand what you are
> talking about here.
Developers, especially more junior ones, work on code, and they (are
supposed to) spend almost all of their intellectual energy on the issues
to do with creating and modifying code -- they are not expected to be
integration engineers, nor are they expected to be VCS and SCM experts.
The more steps you put in place for them to do, and the more places you
allow them to store changes, etc., etc., etc., the more mistakes
they will make.
Besides, in some scenarios build directories will be checked out from
integration branches which shouldn't have any direct commits made to
them, especially not to fix a problem in a build.
BTW, pkgsrc has well over 50,000 files, and FreeBSD ports has over 100,000.
Neither can really be split in any rational way.
--
Greg A. Woods
Planix, Inc.
<woods@planix.com> +1 416 218 0099 http://www.planix.com/