From: Dmitry Potapov <dpotapov@gmail.com>
To: The Git Mailing List <git@vger.kernel.org>
Subject: Re: multiple working directories for long-running builds (was: "git merge" merges too much!)
Date: Wed, 2 Dec 2009 03:10:21 +0300 [thread overview]
Message-ID: <20091202001020.GF11235@dpotapov.dyndns.org> (raw)
In-Reply-To: <m1NFbSE-000kn2C@most.weird.com>
On Tue, Dec 01, 2009 at 05:44:15PM -0500, Greg A. Woods wrote:
> At Wed, 2 Dec 2009 00:18:30 +0300, Dmitry Potapov <dpotapov@gmail.com> wrote:
> Subject: Re: multiple working directories for long-running builds (was: "git merge" merges too much!)
> >
> > AFAIK, "git archive" is cheaper than git clone.
>
> It depends on what you mean by "cheaper"
You said:
>>> "git archive" is likely even more
>>> impossible for some large projects to use than "git clone"
My point was that I do not see why you believe "git archive" is more
expensive than "git clone". Accordingly to Jeff Epler's numbers,
"git archive" is 20% faster than "git clone"...
> It's clearly going to require
> less disk space. However it's also clearly going to require more disk
> bandwidth, potentially a _LOT_ more disk bandwidth.
Well, it does, but as I said earlier it is not the case where I expect
things being instantaneous. If you do not a full build and test, then it
is going to take a lot of time anyway, and git-archive is not likely to
be a big overhead in relative terms.
> > I do not say it is fast
> > for huge project, but if you want to run a process such as clean build
> > and test that takes a long time anyway, it does not add much to the
> > total time.
>
> I think you need to try throwing around an archive of, say, 50,000 small
> files a few times simultaneously on your system to appreciate the issue.
>
> (i.e. consider the load on a storage subsystem, say a SAN or NAS, where
> with your suggestion there might be a dozen or more developers running
> "git archive" frequently enough that even three or four might be doing
> it at the same time, and this on top of all the i/o bandwidth required
> for the builds all of the other developers are also running at the same
> time.)
First of all, I do not see why it should be done frequently. The full
build and test may be run once or twice a day, and the full test and
build may take an hour. git-archive will take probably less than minute
for 50,000 small files (especially if you use tmpfs). In other words, it
is 1% overhead, but you get a clean build, which is fully tested. You
can sure that no garbage left in the worktree that could influence the
result.
>
> > > Disk bandwidth is almost always more expensive than disk space.
> >
> > Disk bandwidth is certainly more expensive than disk space, and the
> > whole point was to avoid a lot of disk bandwidth by using hot cache.
>
> Huh? Throwing around the archive has nothing to do with the build
> system in this case.
"git archive" to do full build and test, which is rarely done. Normally,
you just switch between branches, which means a few files are changed
and rebuild, and no archive is involved here.
>
> I'm just not willing to even consider using what would really be the
> most simplistic and most expensive form of updating a working directory
> as could ever be imagined. "Git archive" is truly unintelligent, as-is.
"git archive" is NOT for updating your working tree. You use "git
checkout" to switch between branches. "git checkout" is intelligent
enough to overwrite only those files that actually differ between
two versions.
> A major further advantage of multiple working directories is that this
> eliminates one more point of failure -- i.e. you don't end up with
> multiple copies of the repo that _should_ be effectively read-only for
> everything but "push", and perhaps then only to one branch.
Multiple copies of the same repo is never a problem (except taking some
disks space). I really do not understand why you say that some copies
should be effectively read-only... You can start to work on some feature
at one place (using one repo) and then continue in another place using
another repo. (Obviously, it will require to fetch changes from the
first repo, before you will be able to continue, but it is just one
command). In other words, I really do not understand what are you
talking about here.
>
> I know of at least three very real-world projects where there are tens
> of thousands of small files that really must be managed as one unit, and
> where running a build in that tree could take a whole day or two on even
> the fastest currently available dedicated build server. Eg. pkgsrc.
Tens of thousands files should not be a problem... For instance, the
Linux kernel has around 30 thousands and Git works very well in this
case. But I would consider to split if it has hundrends of thousands...
Dmitry
next prev parent reply other threads:[~2009-12-02 0:11 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-11-29 3:21 "git merge" merges too much! Greg A. Woods
2009-11-29 5:14 ` Jeff King
2009-11-30 18:12 ` Greg A. Woods
2009-11-30 19:22 ` Dmitry Potapov
2009-12-01 18:52 ` Greg A. Woods
2009-12-01 20:50 ` Dmitry Potapov
2009-12-01 21:58 ` Greg A. Woods
2009-12-02 0:22 ` Dmitry Potapov
2009-12-02 10:20 ` Nanako Shiraishi
2009-12-02 20:09 ` Jeff King
2009-12-03 1:21 ` Git documentation consistency (was: "git merge" merges too much!) Greg A. Woods
2009-12-03 1:34 ` Git documentation consistency Junio C Hamano
2009-12-03 7:22 ` Greg A. Woods
2009-12-03 7:45 ` Jeff King
2009-12-03 15:24 ` Uri Okrent
2009-12-03 16:22 ` Marko Kreen
2009-12-09 19:56 ` Greg A. Woods
2009-12-03 2:07 ` Git documentation consistency (was: "git merge" merges too much!) Jeff King
2009-11-29 5:15 ` "git merge" merges too much! Junio C Hamano
2009-11-30 18:40 ` Greg A. Woods
2009-11-30 20:50 ` Junio C Hamano
2009-11-30 21:17 ` Dmitry Potapov
2009-12-01 0:24 ` Greg A. Woods
2009-12-01 5:47 ` Dmitry Potapov
2009-12-01 17:59 ` multiple working directories for long-running builds (was: "git merge" merges too much!) Greg A. Woods
2009-12-01 18:51 ` Dmitry Potapov
2009-12-01 18:58 ` Greg A. Woods
2009-12-01 21:18 ` Dmitry Potapov
2009-12-01 22:25 ` Jeff Epler
2009-12-01 22:44 ` Greg A. Woods
2009-12-02 0:10 ` Dmitry Potapov [this message]
2009-12-03 5:11 ` Greg A. Woods
2009-12-02 2:09 ` multiple working directories for long-running builds Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20091202001020.GF11235@dpotapov.dyndns.org \
--to=dpotapov@gmail.com \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).