* hosting git on a nfs
@ 2008-11-12 9:29 Thomas Koch
2008-11-12 10:10 ` Julian Phillips
2008-11-12 17:36 ` David Brown
0 siblings, 2 replies; 44+ messages in thread
From: Thomas Koch @ 2008-11-12 9:29 UTC (permalink / raw)
To: git; +Cc: dabe
Hi,
finally I managed to convince a critical mass of developers (our chief
dev :-) in our company so that we are starting to migrate to GIT.
The final question is, whether GIT will life peacefully on our cluster
fileservers. The GIT repository dir (/var/cache/git) should be mounted
via NFS via PAN on top of DRBD (so I was told).
Are there any known problems with this setup? We're asking, because
there are problems with SVN on such a setup[1].
[1] http://subversion.tigris.org/faq.html#nfs
Best regards,
--
Thomas Koch, Software Developer
http://www.koch.ro
Young Media Concepts GmbH
Sonnenstr. 4
CH-8280 Kreuzlingen
Switzerland
Tel +41 (0)71 / 508 24 86
Fax +41 (0)71 / 560 53 89
Mobile +49 (0)170 / 753 89 16
Web www.ymc.ch
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-12 9:29 hosting git on a nfs Thomas Koch
@ 2008-11-12 10:10 ` Julian Phillips
2008-11-12 20:31 ` Brandon Casey
2008-11-12 17:36 ` David Brown
1 sibling, 1 reply; 44+ messages in thread
From: Julian Phillips @ 2008-11-12 10:10 UTC (permalink / raw)
To: Thomas Koch; +Cc: git, dabe
On Wed, 12 Nov 2008, Thomas Koch wrote:
> Hi,
>
> finally I managed to convince a critical mass of developers (our chief
> dev :-) in our company so that we are starting to migrate to GIT.
>
> The final question is, whether GIT will life peacefully on our cluster
> fileservers. The GIT repository dir (/var/cache/git) should be mounted
> via NFS via PAN on top of DRBD (so I was told).
>
> Are there any known problems with this setup? We're asking, because
> there are problems with SVN on such a setup[1].
>
> [1] http://subversion.tigris.org/faq.html#nfs
I've been running git on NFS for years (though it's only NFS exported
software RAID), and the only issue I've encountered is that it's not quite
as blisteringly fast as running git on a local disk.
--
Julian
---
There's something different about us -- different from people of Europe,
Africa, Asia ... a deep and abiding belief in the Easter Bunny.
-- G. Gordon Liddy
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-12 9:29 hosting git on a nfs Thomas Koch
2008-11-12 10:10 ` Julian Phillips
@ 2008-11-12 17:36 ` David Brown
2008-11-12 18:14 ` Linus Torvalds
1 sibling, 1 reply; 44+ messages in thread
From: David Brown @ 2008-11-12 17:36 UTC (permalink / raw)
To: Thomas Koch; +Cc: git, dabe
On Wed, Nov 12, 2008 at 10:29:34AM +0100, Thomas Koch wrote:
>finally I managed to convince a critical mass of developers (our chief
>dev :-) in our company so that we are starting to migrate to GIT.
>
>The final question is, whether GIT will life peacefully on our cluster
>fileservers. The GIT repository dir (/var/cache/git) should be mounted
>via NFS via PAN on top of DRBD (so I was told).
>
>Are there any known problems with this setup? We're asking, because
>there are problems with SVN on such a setup[1].
We had occasionally run into locking problems with 1.5.4.x with
renames between different directories. This should be fixed in
1.6.0.3, but we have since migrated to a server model so I don't have
any way of testing this.
None of these problems ever caused repository corruption, only errors
during fetch/clone that were resolved by repeating the operation.
Using ssh: or git: does seem to be a bit faster than NFS. The
configuration we did find completely unworkable was using git with the
work tree on NFS.
David
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-12 17:36 ` David Brown
@ 2008-11-12 18:14 ` Linus Torvalds
2008-11-13 18:18 ` J. Bruce Fields
2008-11-13 18:32 ` James Pickens
0 siblings, 2 replies; 44+ messages in thread
From: Linus Torvalds @ 2008-11-12 18:14 UTC (permalink / raw)
To: David Brown; +Cc: Thomas Koch, git, dabe
On Wed, 12 Nov 2008, David Brown wrote:
>
> We had occasionally run into locking problems with 1.5.4.x with
> renames between different directories. This should be fixed in
> 1.6.0.3, but we have since migrated to a server model so I don't have
> any way of testing this.
I suspect it also depends very much on the particular client/server
combination. Renaming across directories is one of those NFS things that
some servers don't mind at all.
> The configuration we did find completely unworkable was using git with
> the work tree on NFS.
Doing an 'lstat()' on every single file in the tree would tend to do that
to you, yes. Even with a fast network and a good NFS server, we're talking
millisecond-range latencies, and if your tree has tens of thousands of
files, you're going to have each "git diff" take several seconds.
NFS metadata caching can help, but not all clients do it, and even clients
that _do_ do it tend to have rather low timeouts or rather limited cache
sizes, so doing "git diff" twice may speed up the second one only if it's
done really back-to-back - if even then.
And once you get used to "git diff" being instantaneous, I don't think
anybody is ever agan willing to go back to it taking "a few seconds" (and
depending on speed of network/server and size of project, the "few" can be
quite many ;)
So putting the work-tree on NFS certainly _works_, but yes, from a
performance angle it is going to be really irritatingly slower. I don't
even think the newer versions of NFS will help with directory and
attribute caching - the delegations are per-file afaik, and there is no
good support for extending the caching to directories.
That said, I don't think git is any _worse_ than anybody else in the
"worktree on NFS" model. A "git diff" will still be superior ot a CVS diff
in every way. It's just that when people compare to their home machines
where they have the work tree on local disk and aggressively cached, when
they then use a NFS work-tree, they'll likely be very very disappointed.
Linus
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-12 10:10 ` Julian Phillips
@ 2008-11-12 20:31 ` Brandon Casey
0 siblings, 0 replies; 44+ messages in thread
From: Brandon Casey @ 2008-11-12 20:31 UTC (permalink / raw)
To: Julian Phillips; +Cc: Thomas Koch, git, dabe
Julian Phillips wrote:
> On Wed, 12 Nov 2008, Thomas Koch wrote:
>
>> Hi,
>>
>> finally I managed to convince a critical mass of developers (our chief
>> dev :-) in our company so that we are starting to migrate to GIT.
>>
>> The final question is, whether GIT will life peacefully on our cluster
>> fileservers. The GIT repository dir (/var/cache/git) should be mounted
>> via NFS via PAN on top of DRBD (so I was told).
>>
>> Are there any known problems with this setup? We're asking, because
>> there are problems with SVN on such a setup[1].
>>
>> [1] http://subversion.tigris.org/faq.html#nfs
>
> I've been running git on NFS for years (though it's only NFS exported
> software RAID), and the only issue I've encountered is that it's not
> quite as blisteringly fast as running git on a local disk.
ditto. (except for the years part)
-brandon
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-12 18:14 ` Linus Torvalds
@ 2008-11-13 18:18 ` J. Bruce Fields
2008-11-13 18:32 ` James Pickens
1 sibling, 0 replies; 44+ messages in thread
From: J. Bruce Fields @ 2008-11-13 18:18 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Brown, Thomas Koch, git, dabe
On Wed, Nov 12, 2008 at 10:14:44AM -0800, Linus Torvalds wrote:
>
>
> On Wed, 12 Nov 2008, David Brown wrote:
> >
> > We had occasionally run into locking problems with 1.5.4.x with
> > renames between different directories. This should be fixed in
> > 1.6.0.3, but we have since migrated to a server model so I don't have
> > any way of testing this.
>
> I suspect it also depends very much on the particular client/server
> combination. Renaming across directories is one of those NFS things that
> some servers don't mind at all.
On the linux server you want to make sure you're exporting with
no_subtree_check (see "man exports").
> > The configuration we did find completely unworkable was using git with
> > the work tree on NFS.
>
> Doing an 'lstat()' on every single file in the tree would tend to do that
> to you, yes. Even with a fast network and a good NFS server, we're talking
> millisecond-range latencies, and if your tree has tens of thousands of
> files, you're going to have each "git diff" take several seconds.
>
> NFS metadata caching can help, but not all clients do it, and even clients
> that _do_ do it tend to have rather low timeouts or rather limited cache
> sizes, so doing "git diff" twice may speed up the second one only if it's
> done really back-to-back - if even then.
>
> And once you get used to "git diff" being instantaneous, I don't think
> anybody is ever agan willing to go back to it taking "a few seconds" (and
> depending on speed of network/server and size of project, the "few" can be
> quite many ;)
Yep.
> So putting the work-tree on NFS certainly _works_, but yes, from a
> performance angle it is going to be really irritatingly slower. I don't
> even think the newer versions of NFS will help with directory and
> attribute caching - the delegations are per-file afaik, and there is no
> good support for extending the caching to directories.
File delegations do cover a file's attributes, so in theory they could
help. But they're only given out on open. The upcoming 4.1 spec has a
few improvements here, and it might be worth looking at whether they're
sufficient to make this work.
--b.
> That said, I don't think git is any _worse_ than anybody else in the
> "worktree on NFS" model. A "git diff" will still be superior ot a CVS diff
> in every way. It's just that when people compare to their home machines
> where they have the work tree on local disk and aggressively cached, when
> they then use a NFS work-tree, they'll likely be very very disappointed.
>
> Linus
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-12 18:14 ` Linus Torvalds
2008-11-13 18:18 ` J. Bruce Fields
@ 2008-11-13 18:32 ` James Pickens
2008-11-13 20:18 ` Linus Torvalds
1 sibling, 1 reply; 44+ messages in thread
From: James Pickens @ 2008-11-13 18:32 UTC (permalink / raw)
To: git
Linus Torvalds <torvalds <at> linux-foundation.org> writes:
> Doing an 'lstat()' on every single file in the tree would tend to do that
> to you, yes. Even with a fast network and a good NFS server, we're talking
> millisecond-range latencies, and if your tree has tens of thousands of
> files, you're going to have each "git diff" take several seconds.
Is there any way to improve 'git status' performance on nfs? I know nothing
about how that code works, but if it's strictly serial, i.e. it waits for the
result of each lstat() before doing the next lstat(), then perhaps it could be
sped up by overlapping the lstat() calls via multi threading.
Reason I ask is that at my work place, using only local disks would be
difficult. We run lots of long running tests in a server farm, and working on
nfs allows the compute servers to access our data transparently.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-13 18:32 ` James Pickens
@ 2008-11-13 20:18 ` Linus Torvalds
2008-11-13 21:05 ` Linus Torvalds
0 siblings, 1 reply; 44+ messages in thread
From: Linus Torvalds @ 2008-11-13 20:18 UTC (permalink / raw)
To: James Pickens; +Cc: git
On Thu, 13 Nov 2008, James Pickens wrote:
>
> Is there any way to improve 'git status' performance on nfs? I know nothing
> about how that code works, but if it's strictly serial, i.e. it waits for the
> result of each lstat() before doing the next lstat(), then perhaps it could be
> sped up by overlapping the lstat() calls via multi threading.
It's fairly doable. "git status" to some degree is actually the worst
case, since it has to do readdir()'s etc to find new files, but I've
occasionally considered trying to do a parallel version of the regular
index file checking where we have the list of files a priori.
That would speed up "git diff" by potentially a big amount (it would also
speed up git status, just not as much as also doing readdirs in parallel).
However, every time I think about it, I end up looking at my own setup
which has effectively no parallelism. Sure, in theory parallel reads can
help even with a local disk, but in practice the potential seek advantage
is very small, and so I've never really had it as a high priority.
If I were still using NFS (I gave up on it years ago exactly because it
was so painful for software development - and that was when I was using
CVS) I'd surely have done it long since.
Bit I'll think about it again.
Linus
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-13 20:18 ` Linus Torvalds
@ 2008-11-13 21:05 ` Linus Torvalds
2008-11-13 23:23 ` James Pickens
` (2 more replies)
0 siblings, 3 replies; 44+ messages in thread
From: Linus Torvalds @ 2008-11-13 21:05 UTC (permalink / raw)
To: James Pickens, Bruce Fields, Junio C Hamano; +Cc: Git Mailing List
On Thu, 13 Nov 2008, Linus Torvalds wrote:
>
> If I were still using NFS (I gave up on it years ago exactly because it
> was so painful for software development - and that was when I was using
> CVS) I'd surely have done it long since.
>
> Bit I'll think about it again.
THIS IS TOTALLY UNTESTED! It's trivial, it's hacky, and it would need to
be cleaned up before being merged, but I'm not even going to bother
_trying_ to make it anything cleaner unless somebody can test it on a nice
NFS setup.
Because it's entirely possible that there are various directory inode
locks etc that means that doing parallel lookups in the same directory
won't actually be doing a whole lot. Whatever. It's kind of cute, and it
really isn't all that many lines of code, and even if it doesn't work due
to some locking reason, maybe it's worth looking at.
It arbitrarily caps the number of threads to 10, and it has no way to turn
it on or off. It actually _does_ seem to work in the sense than when
cached, it can take advantage of the fact that I have a nice multi-core
thing, but let's face it, it's not really worth it for that reason only.
Before:
[torvalds@nehalem linux]$ /usr/bin/time git diff > /dev/null
0.03user 0.04system 0:00.07elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
After:
0.02user 0.07system 0:00.04elapsed 243%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2241minor)pagefaults 0swaps
ie it actually did cut elapsed time from 7 hundredths of a second to just
4. And the CPU usage went from 100% to 243%. Ooooh. Magic.
But it's still hacky as hell. Who has NFS? Can you do the same thing over
NFS and test it? I'm not going to set up NFS to test this, and as I
suspected, on a local disk, the cold-cache case makes no difference
what-so-ever, because whatever seek optimizations can be done are still
totally irrelevant.
And if there are per-directory locking etc that screws this up, we can
look at using different heuristics for the lstat() patterns. Right now I
divvy it up so that thread 0 gets entries 0, 10, 20, 30.. and thread 1
does entries 1, 11, 21, 31.., but we could easily split it up differently,
and do 0,1,2,3.. and 1000,1001,1002,1003.. instead. That migth avoid some
per-directory locks. Dunno.
Anyway, it was kind of fun writing this. The reason it threads so well is
that all the lstat() code really works on private data already, so there
are no global data structures that change that need to be worried about.
So no locking necessary - just fire it up in parallel, and wait for the
results. We already had everything else in place (ie the per-cache_entry
flag to say "I've checked this entry on disk").
Of course, if a lot of entries do _not_ match, then this will actually
generate more work (because we'll do the parallel thing to verify that
the on-disk version matches, and if it doesn't match, then we'll end up
re-doing it linearly later more carefully).
Linus
---
diff-lib.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 67 insertions(+), 0 deletions(-)
diff --git a/diff-lib.c b/diff-lib.c
index ae96c64..7d972c9 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -1,6 +1,7 @@
/*
* Copyright (C) 2005 Junio C Hamano
*/
+#include <pthread.h>
#include "cache.h"
#include "quote.h"
#include "commit.h"
@@ -54,6 +55,70 @@ static int check_removed(const struct cache_entry *ce, struct stat *st)
return 0;
}
+/* Hacky hack-hack start */
+#define MAX_PARALLEL (10)
+static int parallel_lstats = 1;
+
+struct thread_data {
+ pthread_t pthread;
+ struct index_state *index;
+ struct rev_info *revs;
+ int i, step;
+};
+
+static void *preload_thread(void *_data)
+{
+ int i;
+ struct thread_data *p = _data;
+ struct index_state *index = p->index;
+ struct rev_info *revs = p->revs;
+
+ for (i = p->i; i < index->cache_nr; i += p->step) {
+ struct cache_entry *ce = index->cache[i];
+ struct stat st;
+
+ if (ce_stage(ce))
+ continue;
+ if (ce_uptodate(ce))
+ continue;
+ if (!ce_path_match(ce, revs->prune_data))
+ continue;
+ if (lstat(ce->name, &st))
+ continue;
+ if (ie_match_stat(index, ce, &st, 0))
+ continue;
+ ce_mark_uptodate(ce);
+ }
+ return NULL;
+}
+
+static void preload_uptodate(struct rev_info *revs, struct index_state *index)
+{
+ int i;
+ int threads = index->cache_nr / 100;
+ struct thread_data data[MAX_PARALLEL];
+
+ if (threads < 2)
+ return;
+ if (threads > MAX_PARALLEL)
+ threads = MAX_PARALLEL;
+ for (i = 0; i < threads; i++) {
+ struct thread_data *p = data+i;
+ p->index = index;
+ p->revs = revs;
+ p->i = i;
+ p->step = threads;
+ if (pthread_create(&p->pthread, NULL, preload_thread, p))
+ die("unable to create threaded lstat");
+ }
+ for (i = 0; i < threads; i++) {
+ struct thread_data *p = data+i;
+ if (pthread_join(p->pthread, NULL))
+ die("unable to join threaded lstat");
+ }
+}
+/* Hacky hack-hack mostly ends */
+
int run_diff_files(struct rev_info *revs, unsigned int option)
{
int entries, i;
@@ -68,6 +133,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
if (diff_unmerged_stage < 0)
diff_unmerged_stage = 2;
entries = active_nr;
+ if (parallel_lstats)
+ preload_uptodate(revs, &the_index);
symcache[0] = '\0';
for (i = 0; i < entries; i++) {
struct stat st;
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-13 21:05 ` Linus Torvalds
@ 2008-11-13 23:23 ` James Pickens
2008-11-13 23:48 ` Linus Torvalds
2008-11-13 23:23 ` Julian Phillips
2008-11-13 23:42 ` Linus Torvalds
2 siblings, 1 reply; 44+ messages in thread
From: James Pickens @ 2008-11-13 23:23 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Bruce Fields, Junio C Hamano, Git Mailing List
On Thu, Nov 13, 2008, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> But it's still hacky as hell. Who has NFS? Can you do the same thing over
> NFS and test it? I'm not going to set up NFS to test this, and as I
> suspected, on a local disk, the cold-cache case makes no difference
> what-so-ever, because whatever seek optimizations can be done are still
> totally irrelevant.
I'm trying to test this and so far I'm not seeing any effect. The
timings are all over the map, but the average time is about the same as
before. If I run 'git status' over and over without pausing in between,
it runs very fast, so I guess the client is caching some things for a
short time. Here are the timings I got by running git 1.6.0.4 and
1.6.0.4 plus this patch, alternating between them and sleeping for 60
seconds after each run:
Git 1.6.0.4:
4.83
4.47
4.54
9.04
5.28
4.33
6.40
13.71
4.51
5.90
Git 1.6.0.4 plus patch:
7.82
10.94
4.61
5.06
5.59
5.22
4.65
5.58
6.70
9.15
I'll play around with the code and see if I figure anything out.
In case it matters, I'm using a 4 core machine with a fairly old kernel:
$ uname -r
2.6.5-7.287.3.PTF.363939.1-smp
BTW thanks a lot for working on it. I was expecting a response along
the lines of "it's possible but would require extensive code changes so
it's not likely to happen", and instead a patch was posted only a few
hours later. Maybe I should ask for all those other upgrades I've had
in mind... ;-)
James
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-13 21:05 ` Linus Torvalds
2008-11-13 23:23 ` James Pickens
@ 2008-11-13 23:23 ` Julian Phillips
2008-11-13 23:42 ` Linus Torvalds
2 siblings, 0 replies; 44+ messages in thread
From: Julian Phillips @ 2008-11-13 23:23 UTC (permalink / raw)
To: Linus Torvalds
Cc: James Pickens, Bruce Fields, Junio C Hamano, Git Mailing List
On Thu, 13 Nov 2008, Linus Torvalds wrote:
> Before:
>
> [torvalds@nehalem linux]$ /usr/bin/time git diff > /dev/null
> 0.03user 0.04system 0:00.07elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
>
> After:
>
> 0.02user 0.07system 0:00.04elapsed 243%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+2241minor)pagefaults 0swaps
>
> ie it actually did cut elapsed time from 7 hundredths of a second to just
> 4. And the CPU usage went from 100% to 243%. Ooooh. Magic.
>
> But it's still hacky as hell. Who has NFS? Can you do the same thing over
> NFS and test it? I'm not going to set up NFS to test this, and as I
> suspected, on a local disk, the cold-cache case makes no difference
> what-so-ever, because whatever seek optimizations can be done are still
> totally irrelevant.
The timings seem to vary quite a bit (not really a surprise with a network
involved ;), but the patch definately makes things faster:
master:
jp3@kaos: linux-2.6(master)>/usr/bin/time ~/bin/git diff > /dev/null
0.01user 0.19system 0:02.50elapsed 8%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1766minor)pagefaults 0swaps
master + patch:
jp3@kaos: linux-2.6(master)>/usr/bin/time ~/bin/git diff > /dev/null
0.02user 0.88system 0:00.96elapsed 93%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+1783minor)pagefaults 0swaps
seems to be approximately twice as fast?
--
Julian
---
<nelchael> "XML is like violence, if it doesn't solve the problem, just
use more."
* nelchael hides
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-13 21:05 ` Linus Torvalds
2008-11-13 23:23 ` James Pickens
2008-11-13 23:23 ` Julian Phillips
@ 2008-11-13 23:42 ` Linus Torvalds
2008-11-14 0:04 ` Julian Phillips
2008-11-14 0:14 ` Brandon Casey
2 siblings, 2 replies; 44+ messages in thread
From: Linus Torvalds @ 2008-11-13 23:42 UTC (permalink / raw)
To: James Pickens, Bruce Fields, Junio C Hamano; +Cc: Git Mailing List
On Thu, 13 Nov 2008, Linus Torvalds wrote:
>
> And if there are per-directory locking etc that screws this up, we can
> look at using different heuristics for the lstat() patterns. Right now I
> divvy it up so that thread 0 gets entries 0, 10, 20, 30.. and thread 1
> does entries 1, 11, 21, 31.., but we could easily split it up differently,
> and do 0,1,2,3.. and 1000,1001,1002,1003.. instead. That migth avoid some
> per-directory locks. Dunno.
Yeah, I just checked. We really should to the lookups in clusters, because
yes, we do have directory locks that limit parallel lookups in the same
directory, so to get good parallelism you want to look up in different
places.
Admittedly that's really a Linux kernel deficiency, and I'd love to fix it
(in the long run), but we optimize file lookup for the common (cached)
case, and the uncommon case of parallel misses in the same directory has
been written for simplicity and robustness.
So don't even bother trying the previous patch, even if it might work. At
least not on Linux. Try this one instead, which does the parallel things
in batches, so that we hopefully hit different directories.
Parallelism is hard.
The good news is that I seem to actualluy see a bit of a win from this
even on a disk, now that the kernel doesn't serialize things. So it may
be worth it. So I have some hope that it actually helps on NFS too.
The numbers for five runs (with clearing of the caches in between, of
course) are:
Before:
0.01user 0.23system 0:10.87elapsed 2%CPU
0.04user 0.19system 0:10.86elapsed 2%CPU
0.03user 0.26system 0:10.82elapsed 2%CPU
0.02user 0.27system 0:12.67elapsed 2%CPU
0.01user 0.22system 0:10.86elapsed 2%CPU
After:
0.03user 0.26system 0:07.88elapsed 3%CPU
0.02user 0.25system 0:07.63elapsed 3%CPU
0.01user 0.26system 0:08.62elapsed 3%CPU
0.01user 0.26system 0:07.27elapsed 3%CPU
0.05user 0.28system 0:08.61elapsed 3%CPU
so it really does seem like it has possibly given a 30% improvement in
cold-cache performance even on a disk.
NOTE NOTE NOTE! This is still a total hack. If this was done properly, we
would do the "preload_uptodate()" thing when we load the cache for all the
different programs where it can matter, not just the one special case
place. So see this as a proof-of-concept, nothing more.
Linus
---
diff-lib.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 74 insertions(+), 0 deletions(-)
diff --git a/diff-lib.c b/diff-lib.c
index ae96c64..83c180d 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -1,6 +1,7 @@
/*
* Copyright (C) 2005 Junio C Hamano
*/
+#include <pthread.h>
#include "cache.h"
#include "quote.h"
#include "commit.h"
@@ -54,6 +55,77 @@ static int check_removed(const struct cache_entry *ce, struct stat *st)
return 0;
}
+/* Hacky hack-hack start */
+#define MAX_PARALLEL (10)
+static int parallel_lstats = 1;
+
+struct thread_data {
+ pthread_t pthread;
+ struct index_state *index;
+ const char **pathspec;
+ int offset, nr;
+};
+
+static void *preload_thread(void *_data)
+{
+ int nr;
+ struct thread_data *p = _data;
+ struct index_state *index = p->index;
+ struct cache_entry **cep = index->cache + p->offset;
+
+ nr = p->nr;
+ if (nr + p->offset > index->cache_nr)
+ nr = index->cache_nr - p->offset;
+
+ do {
+ struct cache_entry *ce = *cep++;
+ struct stat st;
+
+ if (ce_stage(ce))
+ continue;
+ if (ce_uptodate(ce))
+ continue;
+ if (!ce_path_match(ce, p->pathspec))
+ continue;
+ if (lstat(ce->name, &st))
+ continue;
+ if (ie_match_stat(index, ce, &st, 0))
+ continue;
+ ce_mark_uptodate(ce);
+ } while (--nr > 0);
+ return NULL;
+}
+
+static void preload_uptodate(const char **pathspec, struct index_state *index)
+{
+ int i, work, offset;
+ int threads = index->cache_nr / 100;
+ struct thread_data data[MAX_PARALLEL];
+
+ if (threads < 2)
+ return;
+ if (threads > MAX_PARALLEL)
+ threads = MAX_PARALLEL;
+ offset = 0;
+ work = (index->cache_nr + threads - 1) / threads;
+ for (i = 0; i < threads; i++) {
+ struct thread_data *p = data+i;
+ p->index = index;
+ p->pathspec = pathspec;
+ p->offset = offset;
+ p->nr = work;
+ offset += work;
+ if (pthread_create(&p->pthread, NULL, preload_thread, p))
+ die("unable to create threaded lstat");
+ }
+ for (i = 0; i < threads; i++) {
+ struct thread_data *p = data+i;
+ if (pthread_join(p->pthread, NULL))
+ die("unable to join threaded lstat");
+ }
+}
+/* Hacky hack-hack mostly ends */
+
int run_diff_files(struct rev_info *revs, unsigned int option)
{
int entries, i;
@@ -68,6 +140,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
if (diff_unmerged_stage < 0)
diff_unmerged_stage = 2;
entries = active_nr;
+ if (parallel_lstats)
+ preload_uptodate(revs->prune_data, &the_index);
symcache[0] = '\0';
for (i = 0; i < entries; i++) {
struct stat st;
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-13 23:23 ` James Pickens
@ 2008-11-13 23:48 ` Linus Torvalds
0 siblings, 0 replies; 44+ messages in thread
From: Linus Torvalds @ 2008-11-13 23:48 UTC (permalink / raw)
To: James Pickens; +Cc: Bruce Fields, Junio C Hamano, Git Mailing List
On Thu, 13 Nov 2008, James Pickens wrote:
>
> I'm trying to test this and so far I'm not seeing any effect.
Ok, try my second version. The first version would get _some_ parallelism,
but very little due to all the threads often trying to just look up in the
same directory. The second version should hopefully have less of that
effect.
Also, under Linux, try it with caches cleared in between runs, so that you
can avoid any issues of the Linux client caching the directory lookup data
(which linux _will_ do - I think the default timeout is 30 seconds or
something like that):
echo 3 > /proc/sys/vm/drop_caches ; time git diff
which should hopefully get you more reliable timings.
Linus
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-13 23:42 ` Linus Torvalds
@ 2008-11-14 0:04 ` Julian Phillips
2008-11-14 0:14 ` Brandon Casey
1 sibling, 0 replies; 44+ messages in thread
From: Julian Phillips @ 2008-11-14 0:04 UTC (permalink / raw)
To: Linus Torvalds
Cc: James Pickens, Bruce Fields, Junio C Hamano, Git Mailing List
On Thu, 13 Nov 2008, Linus Torvalds wrote:
> The good news is that I seem to actualluy see a bit of a win from this
> even on a disk, now that the kernel doesn't serialize things. So it may
> be worth it. So I have some hope that it actually helps on NFS too.
> The numbers for five runs (with clearing of the caches in between, of
> course) are:
>
> Before:
>
> 0.01user 0.23system 0:10.87elapsed 2%CPU
> 0.04user 0.19system 0:10.86elapsed 2%CPU
> 0.03user 0.26system 0:10.82elapsed 2%CPU
> 0.02user 0.27system 0:12.67elapsed 2%CPU
> 0.01user 0.22system 0:10.86elapsed 2%CPU
>
> After:
>
> 0.03user 0.26system 0:07.88elapsed 3%CPU
> 0.02user 0.25system 0:07.63elapsed 3%CPU
> 0.01user 0.26system 0:08.62elapsed 3%CPU
> 0.01user 0.26system 0:07.27elapsed 3%CPU
> 0.05user 0.28system 0:08.61elapsed 3%CPU
>
> so it really does seem like it has possibly given a 30% improvement in
> cold-cache performance even on a disk.
On an NFS kernel checkout I get the following elapsed times:
master:
0:02.78
0:02.70
0:02.43
0:02.28
0:02.71
0:02.80
0:02.60
0:02.06
0:02.00
master + new patch:
0:00.77
0:00.83
0:01.02
0:00.77
0:00.91
0:00.78
0:00.78
0:01.09
0:01.00
--
Julian
---
There are no accidents whatsoever in the universe.
-- Baba Ram Dass
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-13 23:42 ` Linus Torvalds
2008-11-14 0:04 ` Julian Phillips
@ 2008-11-14 0:14 ` Brandon Casey
2008-11-14 0:38 ` Linus Torvalds
1 sibling, 1 reply; 44+ messages in thread
From: Brandon Casey @ 2008-11-14 0:14 UTC (permalink / raw)
To: Linus Torvalds
Cc: James Pickens, Bruce Fields, Junio C Hamano, Git Mailing List
Second version.
Hot cache (over NFS):
Before:
0.01user 0.11system 0:00.13elapsed 95%CPU
0.01user 0.11system 0:00.13elapsed 96%CPU
0.01user 0.12system 0:00.13elapsed 97%CPU
After:
0.01user 0.09system 0:00.11elapsed 95%CPU
0.01user 0.09system 0:00.11elapsed 95%CPU
0.01user 0.09system 0:00.11elapsed 95%CPU
Cold cache* (over NFS):
Before:
0.01user 0.31system 0:04.40elapsed 7%CPU
0.01user 0.31system 0:06.47elapsed 5%CPU
0.01user 0.26system 0:04.19elapsed 6%CPU
0.01user 0.30system 0:04.99elapsed 6%CPU
After:
0.01user 0.46system 0:01.39elapsed 34%CPU
0.01user 0.41system 0:00.88elapsed 47%CPU
0.01user 0.45system 0:01.16elapsed 40%CPU
0.01user 0.45system 0:00.99elapsed 47%CPU
0.01user 0.45system 0:01.02elapsed 45%CPU
* Note: I can't do 'echo 3 > /proc/sys/vm/drop_caches', so cold
cache means 'sleep 60' between git diff (30 wasn't long enough).
-brandon
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-14 0:14 ` Brandon Casey
@ 2008-11-14 0:38 ` Linus Torvalds
2008-11-14 0:59 ` Pieter de Bie
2008-11-14 1:15 ` Linus Torvalds
0 siblings, 2 replies; 44+ messages in thread
From: Linus Torvalds @ 2008-11-14 0:38 UTC (permalink / raw)
To: Brandon Casey
Cc: James Pickens, Bruce Fields, Junio C Hamano, Git Mailing List
On Thu, 13 Nov 2008, Brandon Casey wrote:
>
> Cold cache* (over NFS):
>
> Before:
>
> 0.01user 0.31system 0:04.40elapsed 7%CPU
> 0.01user 0.31system 0:06.47elapsed 5%CPU
> 0.01user 0.26system 0:04.19elapsed 6%CPU
> 0.01user 0.30system 0:04.99elapsed 6%CPU
>
> After:
>
> 0.01user 0.46system 0:01.39elapsed 34%CPU
> 0.01user 0.41system 0:00.88elapsed 47%CPU
> 0.01user 0.45system 0:01.16elapsed 40%CPU
> 0.01user 0.45system 0:00.99elapsed 47%CPU
> 0.01user 0.45system 0:01.02elapsed 45%CPU
Ok, both you and Julian do seem to be getting a nice speedup from this.
I'll clean it up a bit and make a less hacky version. And I'll try to make
it work for "git status" and friends too.
Linus
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-14 0:38 ` Linus Torvalds
@ 2008-11-14 0:59 ` Pieter de Bie
2008-11-14 1:15 ` Linus Torvalds
1 sibling, 0 replies; 44+ messages in thread
From: Pieter de Bie @ 2008-11-14 0:59 UTC (permalink / raw)
To: Linus Torvalds
Cc: Brandon Casey, James Pickens, Bruce Fields, Junio C Hamano,
Git Mailing List
On Nov 14, 2008, at 1:38 AM, Linus Torvalds wrote:
> Ok, both you and Julian do seem to be getting a nice speedup from
> this.
>
> I'll clean it up a bit and make a less hacky version. And I'll try
> to make
> it work for "git status" and friends too.
I have two more datapoints.
The first is OS X 10.5 on a local HFS repository. git.git goes down
from ~70 to ~63ms and mozilla's repository (30000) files goes from
~350ms to ~230ms (yeah, HFS sucks).
Using AFP for the same repositories, the first goes down from ~80 to
~67ms, the second from ~70 seconds to ~50 seconds, which is a nice
speedup :)
- Pieter
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-14 0:38 ` Linus Torvalds
2008-11-14 0:59 ` Pieter de Bie
@ 2008-11-14 1:15 ` Linus Torvalds
2008-11-14 3:33 ` James Pickens
` (3 more replies)
1 sibling, 4 replies; 44+ messages in thread
From: Linus Torvalds @ 2008-11-14 1:15 UTC (permalink / raw)
To: Brandon Casey
Cc: James Pickens, Bruce Fields, Junio C Hamano, Git Mailing List
On Thu, 13 Nov 2008, Linus Torvalds wrote:
>
> I'll clean it up a bit and make a less hacky version. And I'll try to make
> it work for "git status" and friends too.
Ok, this is a no-longer-totally-hacky thing, which also adds support for
doing the same for "git status". I haven't actually done any timings, but
the preload algorithm is all the same. The interface is just a much more
natural one.
NOTE NOTE NOTE! It may not be totally hacky, but it's still not
"finished". There's a few more things to look at:
- maybe there are other users of "read_cache()" that want to do the
preloading. I just did "git diff" and "git status" (where the stuff
that "git commit" does falls out of the status changes)
- I do think the thing should be more configurable. The "ten threads"
approach makes no sense for people who have single-core CPU's and tend
to have their trees cached - it will just slow things down for that
case. So there should probably be some config option to turn this on
and off.
- It would be really cool to find some way to automatically notice when
the tree is hot-cached and we might as well just be linear. I don't
know exactly what it might be, but one of the nice things is that the
preloading is _entirely_ optimistic, and we could just stop it in the
middle if we notice that there's no upside.
- Somebody else should take a look at it in general. It may work, but
maybe there's something stupid I'm doing, or maybe somebody else can
come up with a clever thing.
So I think this is probably my last version for now, and let's see if
others can find improvements. I'm not ashamed of it any more. There's
still a "Hacky hack start" comment, it's about the whole config thing.
Linus
---
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu, 13 Nov 2008 16:36:30 -0800
Subject: [PATCH] Add cache preload facility
This can do the lstat() storm in parallel, giving potentially much
improved performance for cold-cache cases or things like NFS that have
weak metadata caching.
Just use "read_cache_preload()" instead of "read_cache()" to force an
optimistic preload of the index stat data. The function takes a
pathspec as its argument, allowing us to preload only the relevant
portion of the index.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
Makefile | 1 +
builtin-commit.c | 8 ++--
builtin-diff-files.c | 4 +-
builtin-diff.c | 8 ++--
cache.h | 2 +
preload-index.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++
6 files changed, 96 insertions(+), 10 deletions(-)
diff --git a/Makefile b/Makefile
index 920cb42..8387005 100644
--- a/Makefile
+++ b/Makefile
@@ -495,6 +495,7 @@ LIB_OBJS += write_or_die.o
LIB_OBJS += ws.o
LIB_OBJS += wt-status.o
LIB_OBJS += xdiff-interface.o
+LIB_OBJS += preload-index.o
BUILTIN_OBJS += builtin-add.o
BUILTIN_OBJS += builtin-annotate.o
diff --git a/builtin-commit.c b/builtin-commit.c
index 93ca496..90b976e 100644
--- a/builtin-commit.c
+++ b/builtin-commit.c
@@ -225,18 +225,18 @@ static char *prepare_index(int argc, const char **argv, const char *prefix)
if (interactive) {
interactive_add(argc, argv, prefix);
- if (read_cache() < 0)
+ if (read_cache_preload(NULL) < 0)
die("index file corrupt");
commit_style = COMMIT_AS_IS;
return get_index_file();
}
- if (read_cache() < 0)
- die("index file corrupt");
-
if (*argv)
pathspec = get_pathspec(prefix, argv);
+ if (read_cache_preload(pathspec) < 0)
+ die("index file corrupt");
+
/*
* Non partial, non as-is commit.
*
diff --git a/builtin-diff-files.c b/builtin-diff-files.c
index 2b578c7..5b64011 100644
--- a/builtin-diff-files.c
+++ b/builtin-diff-files.c
@@ -59,8 +59,8 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix)
(rev.diffopt.output_format & DIFF_FORMAT_PATCH))
rev.combine_merges = rev.dense_combined_merges = 1;
- if (read_cache() < 0) {
- perror("read_cache");
+ if (read_cache_preload(rev.diffopt.paths) < 0) {
+ perror("read_cache_preload");
return -1;
}
result = run_diff_files(&rev, options);
diff --git a/builtin-diff.c b/builtin-diff.c
index 82d4dda..b9a2b37 100644
--- a/builtin-diff.c
+++ b/builtin-diff.c
@@ -134,8 +134,8 @@ static int builtin_diff_index(struct rev_info *revs,
revs->max_count != -1 || revs->min_age != -1 ||
revs->max_age != -1)
usage(builtin_diff_usage);
- if (read_cache() < 0) {
- perror("read_cache");
+ if (read_cache_preload(revs->diffopt.paths) < 0) {
+ perror("read_cache_preload");
return -1;
}
return run_diff_index(revs, cached);
@@ -234,8 +234,8 @@ static int builtin_diff_files(struct rev_info *revs, int argc, const char **argv
revs->combine_merges = revs->dense_combined_merges = 1;
setup_work_tree();
- if (read_cache() < 0) {
- perror("read_cache");
+ if (read_cache_preload(revs->diffopt.paths) < 0) {
+ perror("read_cache_preload");
return -1;
}
result = run_diff_files(revs, options);
diff --git a/cache.h b/cache.h
index 6be60ea..c7b69e8 100644
--- a/cache.h
+++ b/cache.h
@@ -262,6 +262,7 @@ static inline void remove_name_hash(struct cache_entry *ce)
#define read_cache() read_index(&the_index)
#define read_cache_from(path) read_index_from(&the_index, (path))
+#define read_cache_preload(pathspec) read_index_preload(&the_index, (pathspec))
#define is_cache_unborn() is_index_unborn(&the_index)
#define read_cache_unmerged() read_index_unmerged(&the_index)
#define write_cache(newfd, cache, entries) write_index(&the_index, (newfd))
@@ -368,6 +369,7 @@ extern int init_db(const char *template_dir, unsigned int flags);
/* Initialize and use the cache information */
extern int read_index(struct index_state *);
+extern int read_index_preload(struct index_state *, const char **pathspec);
extern int read_index_from(struct index_state *, const char *path);
extern int is_index_unborn(struct index_state *);
extern int read_index_unmerged(struct index_state *);
diff --git a/preload-index.c b/preload-index.c
new file mode 100644
index 0000000..e322c27
--- /dev/null
+++ b/preload-index.c
@@ -0,0 +1,83 @@
+/*
+ * Copyright (C) 2008 Linus Torvalds
+ */
+#include <pthread.h>
+#include "cache.h"
+
+/* Hacky hack-hack start */
+#define MAX_PARALLEL (100)
+int parallel_lstats = 1;
+
+struct thread_data {
+ pthread_t pthread;
+ struct index_state *index;
+ const char **pathspec;
+ int offset, nr;
+};
+
+static void *preload_thread(void *_data)
+{
+ int nr;
+ struct thread_data *p = _data;
+ struct index_state *index = p->index;
+ struct cache_entry **cep = index->cache + p->offset;
+
+ nr = p->nr;
+ if (nr + p->offset > index->cache_nr)
+ nr = index->cache_nr - p->offset;
+
+ do {
+ struct cache_entry *ce = *cep++;
+ struct stat st;
+
+ if (ce_stage(ce))
+ continue;
+ if (ce_uptodate(ce))
+ continue;
+ if (!ce_path_match(ce, p->pathspec))
+ continue;
+ if (lstat(ce->name, &st))
+ continue;
+ if (ie_match_stat(index, ce, &st, 0))
+ continue;
+ ce_mark_uptodate(ce);
+ } while (--nr > 0);
+ return NULL;
+}
+
+static void preload_index(struct index_state *index, const char **pathspec)
+{
+ int i, work, offset;
+ int threads = index->cache_nr / 100;
+ struct thread_data data[MAX_PARALLEL];
+
+ if (threads < 2)
+ return;
+ if (threads > MAX_PARALLEL)
+ threads = MAX_PARALLEL;
+ offset = 0;
+ work = (index->cache_nr + threads - 1) / threads;
+ for (i = 0; i < threads; i++) {
+ struct thread_data *p = data+i;
+ p->index = index;
+ p->pathspec = pathspec;
+ p->offset = offset;
+ p->nr = work;
+ offset += work;
+ if (pthread_create(&p->pthread, NULL, preload_thread, p))
+ die("unable to create threaded lstat");
+ }
+ for (i = 0; i < threads; i++) {
+ struct thread_data *p = data+i;
+ if (pthread_join(p->pthread, NULL))
+ die("unable to join threaded lstat");
+ }
+}
+
+int read_index_preload(struct index_state *index, const char **pathspec)
+{
+ int retval = read_index(index);
+
+ preload_index(index, pathspec);
+ return retval;
+}
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-14 1:15 ` Linus Torvalds
@ 2008-11-14 3:33 ` James Pickens
2008-11-14 5:01 ` Linus Torvalds
2008-11-14 13:01 ` Michael J Gruber
` (2 subsequent siblings)
3 siblings, 1 reply; 44+ messages in thread
From: James Pickens @ 2008-11-14 3:33 UTC (permalink / raw)
To: Linus Torvalds
Cc: Brandon Casey, Bruce Fields, Junio C Hamano, Git Mailing List
On Thu, Nov 13, 2008 at 6:15 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> - maybe there are other users of "read_cache()" that want to do the
> preloading. I just did "git diff" and "git status" (where the stuff
> that "git commit" does falls out of the status changes)
I wonder if there are other completely different parts of git that could
benefit from multi threading when the work tree is on nfs? I'm thinking
specifically of 'git checkout', since while testing this patch I
happened to do a 'git pull' that resulted in several thousand new files
being created, and the "Checking out files" part took *forever* to run.
And FWIW, I timed 50 iterations of 'git diff', and the average runtime
dropped from 11.7s to 2.8s after this patch. A nice improvement.
James
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-14 3:33 ` James Pickens
@ 2008-11-14 5:01 ` Linus Torvalds
0 siblings, 0 replies; 44+ messages in thread
From: Linus Torvalds @ 2008-11-14 5:01 UTC (permalink / raw)
To: James Pickens
Cc: Brandon Casey, Bruce Fields, Junio C Hamano, Git Mailing List
On Thu, 13 Nov 2008, James Pickens wrote:
>
> I wonder if there are other completely different parts of git that could
> benefit from multi threading when the work tree is on nfs?
I'm sure there are. That said, threading things is usually really quite
painful. The only reason this preloading was easy to do was that we really
had all the data structures laid out beautifully for this, and I had spent
a lot of effort earlier on a whole series of "avoid duplicate lstat()"
changes, which gave us that whole ce_uptodate() thing, and all normal
cases already taking advantage of it, and the "uptodate" bit being
percolated along all the paths.
If it hadn't been for that, it would have been much nastier to do.
As it was, there was literally just a simple little extra phase to fill in
all teh data structures that we already had set up in parallel.
> I'm thinking specifically of 'git checkout', since while testing this
> patch I happened to do a 'git pull' that resulted in several thousand
> new files being created, and the "Checking out files" part took
> *forever* to run.
Now, the good news is that the actual work-tree part of checking things
out is probably pretty amenable to the same kind of parallelization, for
largely the same reasons: the whole checking out thing is already done in
multiple phases with all error handling done before-hand. So we will have
built up all our data structures earlier, and set the CE_UPDATE bit, and
then there's just a final "push it all out" phase.
So CE_UPTODATE and CE_UPDATE are really very similar in that sense -
except at opposite ends of the pipeline. The CE_UPTODATE bit marks a name
entry as matching the filesystem data (and allows all later phases to
avoid doing the expensive lstat()s), while the CE_UPDATE (and CE_REMOVE)
bits allow us to do all our complex work in-memory without committing it
to disk, and then we push it out in one go.
So if you want to multi-thread checkout, you literally need to just thread
the last for-loop in unpack-trees.c:check_updates() (the CE_UPDATE loop
that does "checkout_entry()" over the whole index).
> And FWIW, I timed 50 iterations of 'git diff', and the average runtime
> dropped from 11.7s to 2.8s after this patch. A nice improvement.
Very impressive. That said, I suspect you get a "superlinear" improvement
because once it gets faster, the kernel cache also works better, since you
can do more loops without having the NFS attributes time out.
Whether that kind of effect happens much in actual practice is debatable,
although it's quite possible that it will work the same way in some
scripting schenarios.
Linus
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-14 1:15 ` Linus Torvalds
2008-11-14 3:33 ` James Pickens
@ 2008-11-14 13:01 ` Michael J Gruber
2008-11-14 14:31 ` Kyle Moffett
2008-11-14 18:32 ` Brandon Casey
3 siblings, 0 replies; 44+ messages in thread
From: Michael J Gruber @ 2008-11-14 13:01 UTC (permalink / raw)
To: Linus Torvalds
Cc: Brandon Casey, James Pickens, Bruce Fields, Junio C Hamano,
Git Mailing List
Linus Torvalds venit, vidit, dixit 14.11.2008 02:15:
>
> On Thu, 13 Nov 2008, Linus Torvalds wrote:
>> I'll clean it up a bit and make a less hacky version. And I'll try to make
>> it work for "git status" and friends too.
>
> Ok, this is a no-longer-totally-hacky thing, which also adds support for
> doing the same for "git status". I haven't actually done any timings, but
> the preload algorithm is all the same. The interface is just a much more
> natural one.
>
> NOTE NOTE NOTE! It may not be totally hacky, but it's still not
> "finished". There's a few more things to look at:
Doing nicely already. Over here, cold cache (i.e. sleep 70) "git status"
goes from 0.8s to around 0.4s. Nice. Hot cache goes from 0.18s to 0.23s.
This is really worthwhile if you hack for more than 60s between
diffs/stats ;)
This is already with the server doing some caching (first status was 8s
or so; hardly reproducible), which I can't control.
Michael
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-14 1:15 ` Linus Torvalds
2008-11-14 3:33 ` James Pickens
2008-11-14 13:01 ` Michael J Gruber
@ 2008-11-14 14:31 ` Kyle Moffett
2008-11-14 18:32 ` Brandon Casey
3 siblings, 0 replies; 44+ messages in thread
From: Kyle Moffett @ 2008-11-14 14:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: Brandon Casey, James Pickens, Bruce Fields, Junio C Hamano,
Git Mailing List
On Thu, Nov 13, 2008 at 8:15 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> - It would be really cool to find some way to automatically notice when
> the tree is hot-cached and we might as well just be linear. I don't
> know exactly what it might be, but one of the nice things is that the
> preloading is _entirely_ optimistic, and we could just stop it in the
> middle if we notice that there's no upside.
Perhaps, rather... notice when the tree is *cold*-cached. At that
point you're already in a pseudo-slowpath and starting several threads
won't be noticeable at all compared to the time savings. You could
probably very easily do that by keeping track of the time as you start
with the normal single-threaded scan of directory entries. If the
entries are all equally slow/fast, there most likely won't be any
benefit at all to spawning extra threads. You might use some cutoff
on number of entries to test before you assume everything is hot and
stop worrying about the time. If there's a lot of variability in the
first few thousand then it means some caches are cold and spawning
extra threads will probably help.
Cheers,
Kyle Moffett
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-14 1:15 ` Linus Torvalds
` (2 preceding siblings ...)
2008-11-14 14:31 ` Kyle Moffett
@ 2008-11-14 18:32 ` Brandon Casey
2008-11-14 19:23 ` Linus Torvalds
3 siblings, 1 reply; 44+ messages in thread
From: Brandon Casey @ 2008-11-14 18:32 UTC (permalink / raw)
To: Linus Torvalds
Cc: James Pickens, Bruce Fields, Junio C Hamano, Git Mailing List
Linus Torvalds wrote:
>
> On Thu, 13 Nov 2008, Linus Torvalds wrote:
>> I'll clean it up a bit and make a less hacky version. And I'll try to make
>> it work for "git status" and friends too.
>
> Ok, this is a no-longer-totally-hacky thing, which also adds support for
> doing the same for "git status". I haven't actually done any timings, but
> the preload algorithm is all the same. The interface is just a much more
> natural one.
git status > /dev/null
Before:
0.06user 0.37system 0:03.04elapsed 14%CPU
0.07user 0.36system 0:03.25elapsed 13%CPU
0.07user 0.36system 0:03.08elapsed 14%CPU
After:
0.06user 0.53system 0:01.02elapsed 58%CPU
0.05user 0.54system 0:01.01elapsed 58%CPU
0.06user 0.52system 0:01.04elapsed 57%CPU
git diff > /dev/null
Before:
0.02user 0.31system 0:02.88elapsed 11%CPU
0.01user 0.32system 0:02.53elapsed 13%CPU
0.01user 0.28system 0:02.78elapsed 10%CPU
After:
0.01user 0.47system 0:00.52elapsed 92%CPU
0.01user 0.48system 0:00.52elapsed 94%CPU
0.01user 0.47system 0:00.54elapsed 88%CPU
I have no explanation for why the diff numbers are different from yesterday.
Could be that there was some nightly cron job running last night which
slowed things down. Still, the same ~5x speedup is observed!
Wow! Thanks!
-brandon
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-14 18:32 ` Brandon Casey
@ 2008-11-14 19:23 ` Linus Torvalds
2008-11-14 20:14 ` Junio C Hamano
2008-11-15 12:08 ` [PATCH] Makefile: introduce NO_PTHREADS Junio C Hamano
0 siblings, 2 replies; 44+ messages in thread
From: Linus Torvalds @ 2008-11-14 19:23 UTC (permalink / raw)
To: Brandon Casey
Cc: James Pickens, Bruce Fields, Junio C Hamano, Git Mailing List
On Fri, 14 Nov 2008, Brandon Casey wrote:
>
> I have no explanation for why the diff numbers are different from
> yesterday. Could be that there was some nightly cron job running last
> night which slowed things down. Still, the same ~5x speedup is observed!
One of the things I changed (inadvertently - it was just meant to be a
test) was that the last patch had MAX_PARALLEL set to 100 rather than 10.
That's definitely overkill.
I also think the thread cost was wrong: it did
threads = index->cache_nr / 100;
to give a first-order "how many threads do we want", but the thread
startup is likely to be higher than 100 lstat calls, so we probably want
fewer threads than that. It doesn't much matter for something like the
Linux kernel, where there are so many files that we'll end up maxing out
the threads anyway, but for smaller projects, I suspect a thread cost of
"one thread per 500 files" is more reasonable. You almost certainly don't
want to thread anything at all for fewer than a few hundred files.
So here's a slight incremental update to my patch from yesterday. It also
adds the config variable, and it defaults to off, so to actually see this
in action, you now need to add
[core]
PreloadIndex = true
to your ~/.gitconfig file.
Totally untested. As usual. Maybe it works. Maybe it doesn't.
Linus
---
Documentation/config.txt | 12 +++++++++++-
cache.h | 1 +
config.c | 5 +++++
environment.c | 3 +++
preload-index.c | 18 +++++++++++++-----
5 files changed, 33 insertions(+), 6 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 32dcd64..9260121 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -411,7 +411,17 @@ core.fsyncobjectfiles::
This is a total waste of time and effort on a filesystem that orders
data writes properly, but can be useful for filesystems that do not use
journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+and not file contents (OS X's HFS+, or Linux ext3 with
+"data=writeback").
+
+core.preloadindex::
+ Enable parallel index preload for operations like 'git diff'
++
+This can speed up operations like 'git diff' and 'git status' especially
+on filesystems like NFS that have weak caching semantics and thus
+relatively high IO latencies. With this set to 'true', git will do the
+index comparison to the filesystem data in parallel, allowing
+overlapping IO's.
alias.*::
Command aliases for the linkgit:git[1] command wrapper - e.g.
diff --git a/cache.h b/cache.h
index 64239fb..685a866 100644
--- a/cache.h
+++ b/cache.h
@@ -460,6 +460,7 @@ extern size_t packed_git_limit;
extern size_t delta_base_cache_limit;
extern int auto_crlf;
extern int fsync_object_files;
+extern int core_preload_index;
enum safe_crlf {
SAFE_CRLF_FALSE = 0,
diff --git a/config.c b/config.c
index 67cc1dc..d2fc8f5 100644
--- a/config.c
+++ b/config.c
@@ -490,6 +490,11 @@ static int git_default_core_config(const char *var, const char *value)
return 0;
}
+ if (!strcmp(var, "core.preloadindex")) {
+ core_preload_index = git_config_bool(var, value);
+ return 0;
+ }
+
/* Add other config variables here and to Documentation/config.txt. */
return 0;
}
diff --git a/environment.c b/environment.c
index bb96ac0..e278bce 100644
--- a/environment.c
+++ b/environment.c
@@ -43,6 +43,9 @@ unsigned whitespace_rule_cfg = WS_DEFAULT_RULE;
enum branch_track git_branch_track = BRANCH_TRACK_REMOTE;
enum rebase_setup_type autorebase = AUTOREBASE_NEVER;
+/* Parallel index stat data preload? */
+int core_preload_index = 0;
+
/* This is set by setup_git_dir_gently() and/or git_default_config() */
char *git_work_tree_cfg;
static char *work_tree;
diff --git a/preload-index.c b/preload-index.c
index e322c27..3ce42e0 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -4,9 +4,14 @@
#include <pthread.h>
#include "cache.h"
-/* Hacky hack-hack start */
-#define MAX_PARALLEL (100)
-int parallel_lstats = 1;
+/*
+ * Mostly randomly chosen maximum thread counts: we
+ * cap the parallelism to 20 threads, and we want
+ * to have at least 500 lstat's per thread for it to
+ * be worth starting a thread.
+ */
+#define MAX_PARALLEL (20)
+#define THREAD_COST (500)
struct thread_data {
pthread_t pthread;
@@ -47,10 +52,13 @@ static void *preload_thread(void *_data)
static void preload_index(struct index_state *index, const char **pathspec)
{
- int i, work, offset;
- int threads = index->cache_nr / 100;
+ int threads, i, work, offset;
struct thread_data data[MAX_PARALLEL];
+ if (!core_preload_index)
+ return;
+
+ threads = index->cache_nr / THREAD_COST;
if (threads < 2)
return;
if (threads > MAX_PARALLEL)
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-14 19:23 ` Linus Torvalds
@ 2008-11-14 20:14 ` Junio C Hamano
2008-11-14 23:10 ` Linus Torvalds
2008-11-15 12:08 ` [PATCH] Makefile: introduce NO_PTHREADS Junio C Hamano
1 sibling, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2008-11-14 20:14 UTC (permalink / raw)
To: Linus Torvalds
Cc: Brandon Casey, James Pickens, Bruce Fields, Git Mailing List
Linus Torvalds <torvalds@linux-foundation.org> writes:
> I also think the thread cost was wrong: it did
>
> threads = index->cache_nr / 100;
>
> to give a first-order "how many threads do we want", but the thread
> startup is likely to be higher than 100 lstat calls, so we probably want
> fewer threads than that. It doesn't much matter for something like the
> Linux kernel, where there are so many files that we'll end up maxing out
> the threads anyway, but for smaller projects, I suspect a thread cost of
> "one thread per 500 files" is more reasonable. You almost certainly don't
> want to thread anything at all for fewer than a few hundred files.
If you have 1000 files in a single directory, do you still want 2 threads
following the "1/500" rule, or they would compete reading the same
directory and using a single thread is better off?
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: hosting git on a nfs
2008-11-14 20:14 ` Junio C Hamano
@ 2008-11-14 23:10 ` Linus Torvalds
0 siblings, 0 replies; 44+ messages in thread
From: Linus Torvalds @ 2008-11-14 23:10 UTC (permalink / raw)
To: Junio C Hamano
Cc: Brandon Casey, James Pickens, Bruce Fields, Git Mailing List
On Fri, 14 Nov 2008, Junio C Hamano wrote:
>
> If you have 1000 files in a single directory, do you still want 2 threads
> following the "1/500" rule, or they would compete reading the same
> directory and using a single thread is better off?
Well, first off, the "single directory" thing is really a Linux kernel
deficiency, and it's entirely possible that it doesn't even exist on other
systems. Linux has a very special directory cache (dcache) model that is
pretty unique - it's part of why cached 'lstat()' calls are so cheap on
Linux - but it is also part of the reason for why we serialize lookups
when we do miss in the cache (*).
Secondly, anybody who has a thousand tracked files in a single directory
can damn well blame themselves for being stupid. So I don't think it's a
case that is worth worrying too much about. Git will slow down for that
kind of situation for other reasons (ie a lot of the tree pruning
optimization won't work for projects that have large flat directories).
So i wouldn't worry about it. That said, with the second patch, we default
to having people enable this explicitly, so it's something that people can
decide on their own.
Linus
(*) That said - the Linux dcache consistency is just _one_ reason why we
serialize lookups. I would not be in the least surprised if other OS's
have the exact same issue. I'd love to fix it in Linux, but quiet
honestly, it has never actually come up before now, and we've literally
worked on multi-threading the _cached_ case, not the uncached one.
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH] Makefile: introduce NO_PTHREADS
2008-11-14 19:23 ` Linus Torvalds
2008-11-14 20:14 ` Junio C Hamano
@ 2008-11-15 12:08 ` Junio C Hamano
2008-11-15 17:15 ` Linus Torvalds
1 sibling, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2008-11-15 12:08 UTC (permalink / raw)
To: Git Mailing List; +Cc: Linus Torvalds
This introduces make variable NO_PTHREADS for platforms that lack the
support for pthreads library or people who do not want to use it for
whatever reason. When defined, it makes the multi-threaded index
preloading into a no-op, and also disables threaded delta searching by
pack-objects.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
* I notice a handful platforms do not define THREADED_DELTA_SEARCH, and
on them Linus's preload-index.c is the first source file that includes
<pthreads.h>, which may result in breakages.
Makefile | 10 +++++++++-
config.mak.in | 1 +
configure.ac | 5 +++++
preload-index.c | 9 +++++++++
4 files changed, 24 insertions(+), 1 deletions(-)
diff --git c/Makefile w/Makefile
index acac0ae..ffc9531 100644
--- c/Makefile
+++ w/Makefile
@@ -90,6 +90,8 @@ all::
#
# Define NO_MMAP if you want to avoid mmap.
#
+# Define NO_PTHREADS if you do not have or do not want to use Pthreads.
+#
# Define NO_PREAD if you have a problem with pread() system call (e.g.
# cygwin.dll before v1.5.22).
#
@@ -1018,9 +1020,15 @@ ifdef INTERNAL_QSORT
COMPAT_OBJS += compat/qsort.o
endif
+ifdef NO_PTHREADS
+ THREADED_DELTA_SEARCH =
+ BASIC_CFLAGS += -DNO_PTHREADS
+else
+ EXTLIBS += $(PTHREAD_LIBS)
+endif
+
ifdef THREADED_DELTA_SEARCH
BASIC_CFLAGS += -DTHREADED_DELTA_SEARCH
- EXTLIBS += $(PTHREAD_LIBS)
LIB_OBJS += thread-utils.o
endif
ifdef DIR_HAS_BSD_GROUP_SEMANTICS
diff --git c/config.mak.in w/config.mak.in
index ea7705c..14dfb21 100644
--- c/config.mak.in
+++ w/config.mak.in
@@ -51,4 +51,5 @@ OLD_ICONV=@OLD_ICONV@
NO_DEFLATE_BOUND=@NO_DEFLATE_BOUND@
FREAD_READS_DIRECTORIES=@FREAD_READS_DIRECTORIES@
SNPRINTF_RETURNS_BOGUS=@SNPRINTF_RETURNS_BOGUS@
+NO_PTHREADS=@NO_PTHREADS@
PTHREAD_LIBS=@PTHREAD_LIBS@
diff --git c/configure.ac w/configure.ac
index 4256742..8821b50 100644
--- c/configure.ac
+++ w/configure.ac
@@ -490,6 +490,8 @@ AC_SUBST(NO_MKDTEMP)
# Define NO_SYMLINK_HEAD if you never want .git/HEAD to be a symbolic link.
# Enable it on Windows. By default, symrefs are still used.
#
+# Define NO_PTHREADS if we do not have pthreads
+#
# Define PTHREAD_LIBS to the linker flag used for Pthread support.
AC_LANG_CONFTEST([AC_LANG_PROGRAM(
[[#include <pthread.h>]],
@@ -502,9 +504,12 @@ else
${CC} -lpthread conftest.c -o conftest.o > /dev/null 2>&1
if test $? -eq 0;then
PTHREAD_LIBS="-lpthread"
+ else
+ NO_PTHREADS=UnfortunatelyYes
fi
fi
AC_SUBST(PTHREAD_LIBS)
+AC_SUBST(NO_PTHREADS)
## Site configuration (override autodetection)
## --with-PACKAGE[=ARG] and --without-PACKAGE
diff --git c/preload-index.c w/preload-index.c
index 6253578..3ae83dc 100644
--- c/preload-index.c
+++ w/preload-index.c
@@ -2,6 +2,14 @@
* Copyright (C) 2008 Linus Torvalds
*/
#include "cache.h"
+
+#ifdef NO_PTHREADS
+static void preload_index(struct index_state *index, const char **pathspec)
+{
+ ; /* nothing */
+}
+#else
+
#include <pthread.h>
/*
@@ -81,6 +89,7 @@ static void preload_index(struct index_state *index, const char **pathspec)
die("unable to join threaded lstat");
}
}
+#endif
int read_index_preload(struct index_state *index, const char **pathspec)
{
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH] Makefile: introduce NO_PTHREADS
2008-11-15 12:08 ` [PATCH] Makefile: introduce NO_PTHREADS Junio C Hamano
@ 2008-11-15 17:15 ` Linus Torvalds
2008-11-17 10:03 ` Mike Ralphson
` (2 more replies)
0 siblings, 3 replies; 44+ messages in thread
From: Linus Torvalds @ 2008-11-15 17:15 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List
On Sat, 15 Nov 2008, Junio C Hamano wrote:
>
> This introduces make variable NO_PTHREADS for platforms that lack the
> support for pthreads library or people who do not want to use it for
> whatever reason. When defined, it makes the multi-threaded index
> preloading into a no-op, and also disables threaded delta searching by
> pack-objects.
Ack. Makes sense.
Linus
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] Makefile: introduce NO_PTHREADS
2008-11-15 17:15 ` Linus Torvalds
@ 2008-11-17 10:03 ` Mike Ralphson
2008-11-17 10:18 ` Junio C Hamano
2008-11-17 10:34 ` Johannes Sixt
2008-11-17 16:38 ` Junio C Hamano
2008-11-17 16:41 ` [PATCH] Makefile: introduce NO_PTHREADS Junio C Hamano
2 siblings, 2 replies; 44+ messages in thread
From: Mike Ralphson @ 2008-11-17 10:03 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Sixt, Git Mailing List, Linus Torvalds
2008/11/15 Linus Torvalds <torvalds@linux-foundation.org>:
>
> On Sat, 15 Nov 2008, Junio C Hamano wrote:
>>
>> This introduces make variable NO_PTHREADS for platforms that lack the
>> support for pthreads library or people who do not want to use it for
>> whatever reason. When defined, it makes the multi-threaded index
>> preloading into a no-op, and also disables threaded delta searching by
>> pack-objects.
>
> Ack. Makes sense.
I'd be minded to make this the default on AIX to keep the prerequisite
list as small as possible, then people can opt-in for the performance
benefits if required.
I'll wait a little while to see if anyone else reports the same for
other platforms and then submit a patch.
Mike
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] Makefile: introduce NO_PTHREADS
2008-11-17 10:03 ` Mike Ralphson
@ 2008-11-17 10:18 ` Junio C Hamano
2008-11-17 10:34 ` Johannes Sixt
1 sibling, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2008-11-17 10:18 UTC (permalink / raw)
To: Mike Ralphson; +Cc: Johannes Sixt, Git Mailing List, Linus Torvalds
"Mike Ralphson" <mike.ralphson@gmail.com> writes:
> 2008/11/15 Linus Torvalds <torvalds@linux-foundation.org>:
>>
>> On Sat, 15 Nov 2008, Junio C Hamano wrote:
>>>
>>> This introduces make variable NO_PTHREADS for platforms that lack the
>>> support for pthreads library or people who do not want to use it for
>>> whatever reason. When defined, it makes the multi-threaded index
>>> preloading into a no-op, and also disables threaded delta searching by
>>> pack-objects.
>>
>> Ack. Makes sense.
>
> I'd be minded to make this the default on AIX to keep the prerequisite
> list as small as possible, then people can opt-in for the performance
> benefits if required.
>
> I'll wait a little while to see if anyone else reports the same for
> other platforms and then submit a patch.
Thanks.
I expect to be slow this week til just before Thanksgiving, so the more
people we have to keep an eye on the areas they excel at, the better for
all of us and certainly it would help me a lot.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] Makefile: introduce NO_PTHREADS
2008-11-17 10:03 ` Mike Ralphson
2008-11-17 10:18 ` Junio C Hamano
@ 2008-11-17 10:34 ` Johannes Sixt
2008-11-17 10:45 ` Mike Ralphson
1 sibling, 1 reply; 44+ messages in thread
From: Johannes Sixt @ 2008-11-17 10:34 UTC (permalink / raw)
To: Mike Ralphson; +Cc: Junio C Hamano, Git Mailing List, Linus Torvalds
Mike Ralphson schrieb:
> 2008/11/15 Linus Torvalds <torvalds@linux-foundation.org>:
>> On Sat, 15 Nov 2008, Junio C Hamano wrote:
>>> This introduces make variable NO_PTHREADS for platforms that lack the
>>> support for pthreads library or people who do not want to use it for
>>> whatever reason. When defined, it makes the multi-threaded index
>>> preloading into a no-op, and also disables threaded delta searching by
>>> pack-objects.
>> Ack. Makes sense.
>
> I'd be minded to make this the default on AIX to keep the prerequisite
> list as small as possible, then people can opt-in for the performance
> benefits if required.
Is pthreads not a standard shipment on AIX? I would set NO_PTHREADS only
if we know in advance that there are many installations without pthreads.
(And I don't know what the situation is.)
BTW, this needs to be squashed in, because we don't have pthreads on Windows:
diff --git a/Makefile b/Makefile
index ffc9531..3a30b8c 100644
--- a/Makefile
+++ b/Makefile
@@ -769,6 +769,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
NO_STRCASESTR = YesPlease
NO_STRLCPY = YesPlease
NO_MEMMEM = YesPlease
+ NO_PTHREADS = YesPlease
NEEDS_LIBICONV = YesPlease
OLD_ICONV = YesPlease
NO_C99_FORMAT = YesPlease
-- Hannes
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH] Makefile: introduce NO_PTHREADS
2008-11-17 10:34 ` Johannes Sixt
@ 2008-11-17 10:45 ` Mike Ralphson
2008-11-17 11:25 ` Johannes Sixt
0 siblings, 1 reply; 44+ messages in thread
From: Mike Ralphson @ 2008-11-17 10:45 UTC (permalink / raw)
To: Johannes Sixt; +Cc: Junio C Hamano, Git Mailing List, Linus Torvalds
2008/11/17 Johannes Sixt <j.sixt@viscovery.net>:
> Mike Ralphson schrieb:
>> 2008/11/15 Linus Torvalds <torvalds@linux-foundation.org>:
>>> On Sat, 15 Nov 2008, Junio C Hamano wrote:
>>>> This introduces make variable NO_PTHREADS for platforms that lack the
>>>> support for pthreads library or people who do not want to use it for
>>>> whatever reason. When defined, it makes the multi-threaded index
>>>> preloading into a no-op, and also disables threaded delta searching by
>>>> pack-objects.
>>> Ack. Makes sense.
>>
>> I'd be minded to make this the default on AIX to keep the prerequisite
>> list as small as possible, then people can opt-in for the performance
>> benefits if required.
>
> Is pthreads not a standard shipment on AIX? I would set NO_PTHREADS only
> if we know in advance that there are many installations without pthreads.
> (And I don't know what the situation is.)
I should have dug a bit further, it seems to be present on my 5.3
machines but I still need to determine whether it got installed by
default. Either way it must need some other link flags...
> BTW, this needs to be squashed in, because we don't have pthreads on Windows:
>
> diff --git a/Makefile b/Makefile
> index ffc9531..3a30b8c 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -769,6 +769,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
> NO_STRCASESTR = YesPlease
> NO_STRLCPY = YesPlease
> NO_MEMMEM = YesPlease
> + NO_PTHREADS = YesPlease
> NEEDS_LIBICONV = YesPlease
> OLD_ICONV = YesPlease
> NO_C99_FORMAT = YesPlease
>
Ta. Ok to add your S-o-B on a squashed patch?
Mike
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] Makefile: introduce NO_PTHREADS
2008-11-17 10:45 ` Mike Ralphson
@ 2008-11-17 11:25 ` Johannes Sixt
2008-12-01 8:29 ` Johannes Sixt
0 siblings, 1 reply; 44+ messages in thread
From: Johannes Sixt @ 2008-11-17 11:25 UTC (permalink / raw)
To: Mike Ralphson; +Cc: Junio C Hamano, Git Mailing List, Linus Torvalds
Mike Ralphson schrieb:
> 2008/11/17 Johannes Sixt <j.sixt@viscovery.net>:
>> Mike Ralphson schrieb:
>>> 2008/11/15 Linus Torvalds <torvalds@linux-foundation.org>:
>>>> On Sat, 15 Nov 2008, Junio C Hamano wrote:
>>>>> This introduces make variable NO_PTHREADS for platforms that lack the
>>>>> support for pthreads library or people who do not want to use it for
>>>>> whatever reason. When defined, it makes the multi-threaded index
>>>>> preloading into a no-op, and also disables threaded delta searching by
>>>>> pack-objects.
>>>> Ack. Makes sense.
>>> I'd be minded to make this the default on AIX to keep the prerequisite
>>> list as small as possible, then people can opt-in for the performance
>>> benefits if required.
>> Is pthreads not a standard shipment on AIX? I would set NO_PTHREADS only
>> if we know in advance that there are many installations without pthreads.
>> (And I don't know what the situation is.)
>
> I should have dug a bit further, it seems to be present on my 5.3
> machines but I still need to determine whether it got installed by
> default. Either way it must need some other link flags...
I tried compiling with THREADED_DELTA_SEARCH=Yes, and it fails with
CC builtin-pack-objects.o
In file included from /usr/include/sys/pri.h:29,
from /usr/include/sys/sched.h:38,
from /usr/include/sched.h:52,
from /usr/include/pthread.h:43,
from builtin-pack-objects.c:22:
/usr/include/sys/proc.h:203: parse error before "crid_t"
/usr/include/sys/proc.h:212: parse error before "p_class"
/usr/include/sys/proc.h:355: parse error before '}' token
:-( Maybe NO_PTHREADS is indeed the safer choice? I'm not going to dig
into this today, though. (I'm on AIX 4.3.something.)
>> BTW, this needs to be squashed in, because we don't have pthreads on Windows:
>>
>> diff --git a/Makefile b/Makefile
>> index ffc9531..3a30b8c 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -769,6 +769,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
>> NO_STRCASESTR = YesPlease
>> NO_STRLCPY = YesPlease
>> NO_MEMMEM = YesPlease
>> + NO_PTHREADS = YesPlease
>> NEEDS_LIBICONV = YesPlease
>> OLD_ICONV = YesPlease
>> NO_C99_FORMAT = YesPlease
>>
>
> Ta. Ok to add your S-o-B on a squashed patch?
Sure. Use this address please:
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
-- Hannes
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] Makefile: introduce NO_PTHREADS
2008-11-15 17:15 ` Linus Torvalds
2008-11-17 10:03 ` Mike Ralphson
@ 2008-11-17 16:38 ` Junio C Hamano
2008-11-17 16:47 ` Linus Torvalds
2008-11-17 16:41 ` [PATCH] Makefile: introduce NO_PTHREADS Junio C Hamano
2 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2008-11-17 16:38 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Sat, 15 Nov 2008, Junio C Hamano wrote:
>>
>> This introduces make variable NO_PTHREADS for platforms that lack the
>> support for pthreads library or people who do not want to use it for
>> whatever reason. When defined, it makes the multi-threaded index
>> preloading into a no-op, and also disables threaded delta searching by
>> pack-objects.
>
> Ack. Makes sense.
Hmm, I started getting random segfaults that sometimes reproduces. For
example, this is what I just got from "git diff --stat $some_commit".
#0 0x00002b952568b090 in strlen () from /lib/libc.so.6
#1 0x000000000044ed42 in git_checkattr (path=0x2b95284b3970 "parse-options.c",
num=3, check=0x41000e90) at attr.c:512
#2 0x0000000000458921 in convert_to_git (
path=0x2b95284b3970 "parse-options.c",
src=0x2aaaaaabb000 <Address 0x2aaaaaabb000 out of bounds>, len=12594,
dst=0x41000f30, checksafe=SAFE_CRLF_FALSE) at convert.c:578
#3 0x0000000000489ac3 in index_mem (
sha1=0x41000ff0 "\210f�⽡x�\207�� 7R}\217\032��", buf=0x2aaaaaabb000,
size=12594, write_object=0, type=<value optimized out>,
path=0x2f2f2f2f2f2f2f2f <Address 0x2f2f2f2f2f2f2f2f out of bounds>)
at sha1_file.c:2451
#4 0x0000000000489c3d in index_fd (
sha1=0x41000ff0 "\210f�⽡x�\207�� 7R}\217\032��", fd=5,
st=<value optimized out>, write_object=0, type=OBJ_BLOB,
path=0x2b95284b3970 "parse-options.c") at sha1_file.c:2483
#5 0x000000000047857a in ce_modified_check_fs (ce=0x2b95284b3930,
st=0x41001080) at read-cache.c:92
#6 0x00000000004786a2 in ie_match_stat (istate=0x71c860, ce=0x2b95284b3930,
st=0x41001080, options=<value optimized out>) at read-cache.c:282
#7 0x0000000000497e65 in preload_thread (_data=<value optimized out>)
at preload-index.c:46
#8 0x00002b9525964017 in start_thread () from /lib/libpthread.so.0
#9 0x00002b95256da5bd in clone () from /lib/libc.so.6
#10 0x0000000000000000 in ?? ()
I suspect that the callpath around ce_modified_check_fs() uses a buffer
obtained from path.c:get_pathname() and parallel threads stomp on each
other, but I do not have time to debug this right now (I will be on a
14-hour flight in a few hours).
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] Makefile: introduce NO_PTHREADS
2008-11-15 17:15 ` Linus Torvalds
2008-11-17 10:03 ` Mike Ralphson
2008-11-17 16:38 ` Junio C Hamano
@ 2008-11-17 16:41 ` Junio C Hamano
2 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2008-11-17 16:41 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Sat, 15 Nov 2008, Junio C Hamano wrote:
>>
>> This introduces make variable NO_PTHREADS for platforms that lack the
>> support for pthreads library or people who do not want to use it for
>> whatever reason. When defined, it makes the multi-threaded index
>> preloading into a no-op, and also disables threaded delta searching by
>> pack-objects.
>
> Ack. Makes sense.
Hmm, I started getting random segfaults that sometimes reproduce. For
example, this is what I just got from "git diff --stat $some_commit".
#0 0x00002b952568b090 in strlen () from /lib/libc.so.6
#1 0x000000000044ed42 in git_checkattr (path=0x2b95284b3970 "parse-options.c",
num=3, check=0x41000e90) at attr.c:512
#2 0x0000000000458921 in convert_to_git (
path=0x2b95284b3970 "parse-options.c",
src=0x2aaaaaabb000 <Address 0x2aaaaaabb000 out of bounds>, len=12594,
dst=0x41000f30, checksafe=SAFE_CRLF_FALSE) at convert.c:578
#3 0x0000000000489ac3 in index_mem (
sha1=0x41000ff0 "\210f�⽡x�\207�� 7R}\217\032��", buf=0x2aaaaaabb000,
size=12594, write_object=0, type=<value optimized out>,
path=0x2f2f2f2f2f2f2f2f <Address 0x2f2f2f2f2f2f2f2f out of bounds>)
at sha1_file.c:2451
#4 0x0000000000489c3d in index_fd (
sha1=0x41000ff0 "\210f�⽡x�\207�� 7R}\217\032��", fd=5,
st=<value optimized out>, write_object=0, type=OBJ_BLOB,
path=0x2b95284b3970 "parse-options.c") at sha1_file.c:2483
#5 0x000000000047857a in ce_modified_check_fs (ce=0x2b95284b3930,
st=0x41001080) at read-cache.c:92
#6 0x00000000004786a2 in ie_match_stat (istate=0x71c860, ce=0x2b95284b3930,
st=0x41001080, options=<value optimized out>) at read-cache.c:282
#7 0x0000000000497e65 in preload_thread (_data=<value optimized out>)
at preload-index.c:46
#8 0x00002b9525964017 in start_thread () from /lib/libpthread.so.0
#9 0x00002b95256da5bd in clone () from /lib/libc.so.6
#10 0x0000000000000000 in ?? ()
Unfortunately, I do not have time to debug this right now (I will be on a
14-hour flight in a few hours).
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] Makefile: introduce NO_PTHREADS
2008-11-17 16:38 ` Junio C Hamano
@ 2008-11-17 16:47 ` Linus Torvalds
2008-11-17 17:01 ` Fix index preloading for racy dirty case Linus Torvalds
0 siblings, 1 reply; 44+ messages in thread
From: Linus Torvalds @ 2008-11-17 16:47 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List
On Mon, 17 Nov 2008, Junio C Hamano wrote:
>
> I suspect that the callpath around ce_modified_check_fs() uses a buffer
> obtained from path.c:get_pathname() and parallel threads stomp on each
> other, but I do not have time to debug this right now (I will be on a
> 14-hour flight in a few hours).
Oh, damn. I had forgotten that check_fs() doesn't just do a "lstat()" any
more. You're right.
Let me look at it.
Linus
^ permalink raw reply [flat|nested] 44+ messages in thread
* Fix index preloading for racy dirty case
2008-11-17 16:47 ` Linus Torvalds
@ 2008-11-17 17:01 ` Linus Torvalds
0 siblings, 0 replies; 44+ messages in thread
From: Linus Torvalds @ 2008-11-17 17:01 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List
In the threaded index preloading case, we must be sure to always use the
CE_MATCH_RACY_IS_DIRTY flag when calling ie_match_stat(), in order to make
sure that we only ever look at the stat() data, and don't try to do
anything fancy.
Because most of git internals are not thread-safe, and must not be called
in parallel.
Otherwise, what happens is that if the timestamps indicate that an entry
_might_ be dirty, we might start actually comparing filesystem data with
the object database. And we mustn't do that, because that would involve
looking up and creating the object structure, and that whole code sequence
with read_sha1_file() where we look up and add objects to the hashes is
definitely not thread-safe.
Nor do we want to add locking, because the whole point of the preload was
to be simple and not affect anything else. With CE_MATCH_RACY_IS_DIRTY, we
get what we wanted, and we'll just leave the hard cases well alone, to be
done later in the much simpler serial case.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
On Mon, 17 Nov 2008, Linus Torvalds wrote:
>
> Oh, damn. I had forgotten that check_fs() doesn't just do a "lstat()" any
> more. You're right.
Never mind the "any more". I don't think it ever did.
But I do think that this is trivially fixed, and I should have thought
about it. And while I didn't reproduce your SIGSEGV, I think this trivial
patch should fix it.
Sorry about the mindfart.
preload-index.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/preload-index.c b/preload-index.c
index 3ce42e0..a6a6bdb 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -43,7 +43,7 @@ static void *preload_thread(void *_data)
continue;
if (lstat(ce->name, &st))
continue;
- if (ie_match_stat(index, ce, &st, 0))
+ if (ie_match_stat(index, ce, &st, CE_MATCH_RACY_IS_DIRTY))
continue;
ce_mark_uptodate(ce);
} while (--nr > 0);
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH] Makefile: introduce NO_PTHREADS
2008-11-17 11:25 ` Johannes Sixt
@ 2008-12-01 8:29 ` Johannes Sixt
2008-12-01 8:48 ` dhruva
` (3 more replies)
0 siblings, 4 replies; 44+ messages in thread
From: Johannes Sixt @ 2008-12-01 8:29 UTC (permalink / raw)
To: Mike Ralphson; +Cc: Junio C Hamano, Git Mailing List, Linus Torvalds
Johannes Sixt schrieb:
> Mike Ralphson schrieb:
>> 2008/11/17 Johannes Sixt <j.sixt@viscovery.net>:
>>> Mike Ralphson schrieb:
>>>> 2008/11/15 Linus Torvalds <torvalds@linux-foundation.org>:
>>>>> On Sat, 15 Nov 2008, Junio C Hamano wrote:
>>>>>> This introduces make variable NO_PTHREADS for platforms that lack the
>>>>>> support for pthreads library or people who do not want to use it for
>>>>>> whatever reason. When defined, it makes the multi-threaded index
>>>>>> preloading into a no-op, and also disables threaded delta searching by
>>>>>> pack-objects.
>>>>> Ack. Makes sense.
>>>> I'd be minded to make this the default on AIX to keep the prerequisite
>>>> list as small as possible, then people can opt-in for the performance
>>>> benefits if required.
>>> Is pthreads not a standard shipment on AIX? I would set NO_PTHREADS only
>>> if we know in advance that there are many installations without pthreads.
>>> (And I don't know what the situation is.)
>> I should have dug a bit further, it seems to be present on my 5.3
>> machines but I still need to determine whether it got installed by
>> default. Either way it must need some other link flags...
>
> I tried compiling with THREADED_DELTA_SEARCH=Yes, and it fails with
>
> CC builtin-pack-objects.o
> In file included from /usr/include/sys/pri.h:29,
> from /usr/include/sys/sched.h:38,
> from /usr/include/sched.h:52,
> from /usr/include/pthread.h:43,
> from builtin-pack-objects.c:22:
> /usr/include/sys/proc.h:203: parse error before "crid_t"
> /usr/include/sys/proc.h:212: parse error before "p_class"
> /usr/include/sys/proc.h:355: parse error before '}' token
>
> :-( Maybe NO_PTHREADS is indeed the safer choice? I'm not going to dig
> into this today, though. (I'm on AIX 4.3.something.)
>
>>> BTW, this needs to be squashed in, because we don't have pthreads on Windows:
>>>
>>> diff --git a/Makefile b/Makefile
>>> index ffc9531..3a30b8c 100644
>>> --- a/Makefile
>>> +++ b/Makefile
>>> @@ -769,6 +769,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
>>> NO_STRCASESTR = YesPlease
>>> NO_STRLCPY = YesPlease
>>> NO_MEMMEM = YesPlease
>>> + NO_PTHREADS = YesPlease
>>> NEEDS_LIBICONV = YesPlease
>>> OLD_ICONV = YesPlease
>>> NO_C99_FORMAT = YesPlease
>>>
>> Ta. Ok to add your S-o-B on a squashed patch?
>
> Sure. Use this address please:
>
> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Mike,
you said you would resend the patch, but I think you forgot about it.
Would you do that now, please?
-- Hannes
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] Makefile: introduce NO_PTHREADS
2008-12-01 8:29 ` Johannes Sixt
@ 2008-12-01 8:48 ` dhruva
2008-12-01 9:57 ` Mike Ralphson
` (2 subsequent siblings)
3 siblings, 0 replies; 44+ messages in thread
From: dhruva @ 2008-12-01 8:48 UTC (permalink / raw)
To: Johannes Sixt
Cc: Mike Ralphson, Junio C Hamano, Git Mailing List, Linus Torvalds
Hello,
I am able to compile and use git by defining THREADED_DELTA_SEARCH
and copying setjmp.h from mingw (www.mingw.org) to msys include folder
with small changes to it. If it does add value, we should try to
enable this by default.
-dhruva
On Mon, Dec 1, 2008 at 1:59 PM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> Johannes Sixt schrieb:
>> Mike Ralphson schrieb:
>>> 2008/11/17 Johannes Sixt <j.sixt@viscovery.net>:
>>>> Mike Ralphson schrieb:
>>>>> 2008/11/15 Linus Torvalds <torvalds@linux-foundation.org>:
>>>>>> On Sat, 15 Nov 2008, Junio C Hamano wrote:
>>>>>>> This introduces make variable NO_PTHREADS for platforms that lack the
>>>>>>> support for pthreads library or people who do not want to use it for
>>>>>>> whatever reason. When defined, it makes the multi-threaded index
>>>>>>> preloading into a no-op, and also disables threaded delta searching by
>>>>>>> pack-objects.
>>>>>> Ack. Makes sense.
>>>>> I'd be minded to make this the default on AIX to keep the prerequisite
>>>>> list as small as possible, then people can opt-in for the performance
>>>>> benefits if required.
>>>> Is pthreads not a standard shipment on AIX? I would set NO_PTHREADS only
>>>> if we know in advance that there are many installations without pthreads.
>>>> (And I don't know what the situation is.)
>>> I should have dug a bit further, it seems to be present on my 5.3
>>> machines but I still need to determine whether it got installed by
>>> default. Either way it must need some other link flags...
>>
>> I tried compiling with THREADED_DELTA_SEARCH=Yes, and it fails with
>>
>> CC builtin-pack-objects.o
>> In file included from /usr/include/sys/pri.h:29,
>> from /usr/include/sys/sched.h:38,
>> from /usr/include/sched.h:52,
>> from /usr/include/pthread.h:43,
>> from builtin-pack-objects.c:22:
>> /usr/include/sys/proc.h:203: parse error before "crid_t"
>> /usr/include/sys/proc.h:212: parse error before "p_class"
>> /usr/include/sys/proc.h:355: parse error before '}' token
>>
>> :-( Maybe NO_PTHREADS is indeed the safer choice? I'm not going to dig
>> into this today, though. (I'm on AIX 4.3.something.)
>>
>>>> BTW, this needs to be squashed in, because we don't have pthreads on Windows:
>>>>
>>>> diff --git a/Makefile b/Makefile
>>>> index ffc9531..3a30b8c 100644
>>>> --- a/Makefile
>>>> +++ b/Makefile
>>>> @@ -769,6 +769,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
>>>> NO_STRCASESTR = YesPlease
>>>> NO_STRLCPY = YesPlease
>>>> NO_MEMMEM = YesPlease
>>>> + NO_PTHREADS = YesPlease
>>>> NEEDS_LIBICONV = YesPlease
>>>> OLD_ICONV = YesPlease
>>>> NO_C99_FORMAT = YesPlease
>>>>
>>> Ta. Ok to add your S-o-B on a squashed patch?
>>
>> Sure. Use this address please:
>>
>> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
>
> Mike,
>
> you said you would resend the patch, but I think you forgot about it.
> Would you do that now, please?
>
> -- Hannes
>
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
Contents reflect my personal views only!
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] Makefile: introduce NO_PTHREADS
2008-12-01 8:29 ` Johannes Sixt
2008-12-01 8:48 ` dhruva
@ 2008-12-01 9:57 ` Mike Ralphson
2008-12-01 16:09 ` Mike Ralphson
2008-12-01 16:13 ` Mike Ralphson
3 siblings, 0 replies; 44+ messages in thread
From: Mike Ralphson @ 2008-12-01 9:57 UTC (permalink / raw)
To: Johannes Sixt, Junio C Hamano; +Cc: Git Mailing List, Linus Torvalds, dhruvakm
2008/12/1 Johannes Sixt <j.sixt@viscovery.net>
>
> Johannes Sixt schrieb:
> > Mike Ralphson schrieb:
> >> 2008/11/17 Johannes Sixt <j.sixt@viscovery.net>:
> >>> Mike Ralphson schrieb:
> >>>> 2008/11/15 Linus Torvalds <torvalds@linux-foundation.org>:
> >>>>> On Sat, 15 Nov 2008, Junio C Hamano wrote:
> >>>>>> This introduces make variable NO_PTHREADS for platforms that lack the
> >>>>>> support for pthreads library or people who do not want to use it for
> >>>>>> whatever reason. When defined, it makes the multi-threaded index
> >>>>>> preloading into a no-op, and also disables threaded delta searching by
> >>>>>> pack-objects.
> >>>>> ...
> >
> > :-( Maybe NO_PTHREADS is indeed the safer choice? I'm not going to dig
> > into this today, though. (I'm on AIX 4.3.something.)
> >
> >>> BTW, this needs to be squashed in, because we don't have pthreads on Windows:
> >>>...
>
> you said you would resend the patch, but I think you forgot about it.
> Would you do that now, please?
Not forgotten, just slow. The current state I believe is we should
have NO_PTHREADS for AIX < v5. THREADED_DELTA_SEARCH and the new
multi-threaded lstat are actually ok on AIX 5.3 at least - though I
couldn't see any performance benefit on my repo / hardware
combination.
But now after dhruva's observation I'm unsure what the desired change
is for Windows. 8-(
Mike
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH] Makefile: introduce NO_PTHREADS
2008-12-01 8:29 ` Johannes Sixt
2008-12-01 8:48 ` dhruva
2008-12-01 9:57 ` Mike Ralphson
@ 2008-12-01 16:09 ` Mike Ralphson
2008-12-01 16:13 ` Mike Ralphson
3 siblings, 0 replies; 44+ messages in thread
From: Mike Ralphson @ 2008-12-01 16:09 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Mike Ralphson, j.sixt
From: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Mike Ralphson <mike@abacus.co.uk>
---
* I notice a handful platforms do not define THREADED_DELTA_SEARCH, and
on them Linus's preload-index.c is the first source file that includes
<pthreads.h>, which may result in breakages.
Made the default for AIX <5 and Mingw
Author should still show as Junio, apologies if not
Makefile | 17 ++++++++++++++++-
config.mak.in | 1 +
configure.ac | 5 +++++
preload-index.c | 9 +++++++++
4 files changed, 31 insertions(+), 1 deletions(-)
diff --git a/Makefile b/Makefile
index d1e2116..5a69a41 100644
--- a/Makefile
+++ b/Makefile
@@ -90,6 +90,8 @@ all::
#
# Define NO_MMAP if you want to avoid mmap.
#
+# Define NO_PTHREADS if you do not have or do not want to use Pthreads.
+#
# Define NO_PREAD if you have a problem with pread() system call (e.g.
# cygwin.dll before v1.5.22).
#
@@ -164,6 +166,7 @@ uname_M := $(shell sh -c 'uname -m 2>/dev/null || echo not')
uname_O := $(shell sh -c 'uname -o 2>/dev/null || echo not')
uname_R := $(shell sh -c 'uname -r 2>/dev/null || echo not')
uname_P := $(shell sh -c 'uname -p 2>/dev/null || echo not')
+uname_V := $(shell sh -c 'uname -v 2>/dev/null || echo not')
# CFLAGS and LDFLAGS are for the users to override from the command line.
@@ -722,6 +725,11 @@ ifeq ($(uname_S),AIX)
INTERNAL_QSORT = UnfortunatelyYes
NEEDS_LIBICONV=YesPlease
BASIC_CFLAGS += -D_LARGE_FILES
+ ifneq ($(shell expr "$(uname_V)" : '[1234]'),1)
+ THREADED_DELTA_SEARCH = YesPlease
+ else
+ NO_PTHREADS = YesPlease
+ endif
endif
ifeq ($(uname_S),GNU)
# GNU/Hurd
@@ -766,6 +774,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
NO_STRCASESTR = YesPlease
NO_STRLCPY = YesPlease
NO_MEMMEM = YesPlease
+ NO_PTHREADS = YesPlease
NEEDS_LIBICONV = YesPlease
OLD_ICONV = YesPlease
NO_C99_FORMAT = YesPlease
@@ -1017,9 +1026,15 @@ ifdef INTERNAL_QSORT
COMPAT_OBJS += compat/qsort.o
endif
+ifdef NO_PTHREADS
+ THREADED_DELTA_SEARCH =
+ BASIC_CFLAGS += -DNO_PTHREADS
+else
+ EXTLIBS += $(PTHREAD_LIBS)
+endif
+
ifdef THREADED_DELTA_SEARCH
BASIC_CFLAGS += -DTHREADED_DELTA_SEARCH
- EXTLIBS += $(PTHREAD_LIBS)
LIB_OBJS += thread-utils.o
endif
ifdef DIR_HAS_BSD_GROUP_SEMANTICS
diff --git a/config.mak.in b/config.mak.in
index ea7705c..14dfb21 100644
--- a/config.mak.in
+++ b/config.mak.in
@@ -51,4 +51,5 @@ OLD_ICONV=@OLD_ICONV@
NO_DEFLATE_BOUND=@NO_DEFLATE_BOUND@
FREAD_READS_DIRECTORIES=@FREAD_READS_DIRECTORIES@
SNPRINTF_RETURNS_BOGUS=@SNPRINTF_RETURNS_BOGUS@
+NO_PTHREADS=@NO_PTHREADS@
PTHREAD_LIBS=@PTHREAD_LIBS@
diff --git a/configure.ac b/configure.ac
index 4256742..8821b50 100644
--- a/configure.ac
+++ b/configure.ac
@@ -490,6 +490,8 @@ AC_SUBST(NO_MKDTEMP)
# Define NO_SYMLINK_HEAD if you never want .git/HEAD to be a symbolic link.
# Enable it on Windows. By default, symrefs are still used.
#
+# Define NO_PTHREADS if we do not have pthreads
+#
# Define PTHREAD_LIBS to the linker flag used for Pthread support.
AC_LANG_CONFTEST([AC_LANG_PROGRAM(
[[#include <pthread.h>]],
@@ -502,9 +504,12 @@ else
${CC} -lpthread conftest.c -o conftest.o > /dev/null 2>&1
if test $? -eq 0;then
PTHREAD_LIBS="-lpthread"
+ else
+ NO_PTHREADS=UnfortunatelyYes
fi
fi
AC_SUBST(PTHREAD_LIBS)
+AC_SUBST(NO_PTHREADS)
## Site configuration (override autodetection)
## --with-PACKAGE[=ARG] and --without-PACKAGE
diff --git a/preload-index.c b/preload-index.c
index a685583..88edc5f 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -2,6 +2,14 @@
* Copyright (C) 2008 Linus Torvalds
*/
#include "cache.h"
+
+#ifdef NO_PTHREADS
+static void preload_index(struct index_state *index, const char **pathspec)
+{
+ ; /* nothing */
+}
+#else
+
#include <pthread.h>
/*
@@ -81,6 +89,7 @@ static void preload_index(struct index_state *index, const char **pathspec)
die("unable to join threaded lstat");
}
}
+#endif
int read_index_preload(struct index_state *index, const char **pathspec)
{
--
1.6.0.2.229.g1293c.dirty
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH] Makefile: introduce NO_PTHREADS
2008-12-01 8:29 ` Johannes Sixt
` (2 preceding siblings ...)
2008-12-01 16:09 ` Mike Ralphson
@ 2008-12-01 16:13 ` Mike Ralphson
2008-12-02 7:41 ` Johannes Sixt
3 siblings, 1 reply; 44+ messages in thread
From: Mike Ralphson @ 2008-12-01 16:13 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Mike Ralphson, j.sixt
From: Junio C Hamano <gitster@pobox.com>
This introduces make variable NO_PTHREADS for platforms that lack the
support for pthreads library or people who do not want to use it for
whatever reason. When defined, it makes the multi-threaded index
preloading into a no-op, and also disables threaded delta searching by
pack-objects.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Mike Ralphson <mike@abacus.co.uk>
---
* I notice a handful platforms do not define THREADED_DELTA_SEARCH, and
on them Linus's preload-index.c is the first source file that includes
<pthreads.h>, which may result in breakages.
Made the default for AIX <5 and Mingw
With correct commit message. Sorry for the previous attempt.
Makefile | 17 ++++++++++++++++-
config.mak.in | 1 +
configure.ac | 5 +++++
preload-index.c | 9 +++++++++
4 files changed, 31 insertions(+), 1 deletions(-)
diff --git a/Makefile b/Makefile
index d1e2116..5a69a41 100644
--- a/Makefile
+++ b/Makefile
@@ -90,6 +90,8 @@ all::
#
# Define NO_MMAP if you want to avoid mmap.
#
+# Define NO_PTHREADS if you do not have or do not want to use Pthreads.
+#
# Define NO_PREAD if you have a problem with pread() system call (e.g.
# cygwin.dll before v1.5.22).
#
@@ -164,6 +166,7 @@ uname_M := $(shell sh -c 'uname -m 2>/dev/null || echo not')
uname_O := $(shell sh -c 'uname -o 2>/dev/null || echo not')
uname_R := $(shell sh -c 'uname -r 2>/dev/null || echo not')
uname_P := $(shell sh -c 'uname -p 2>/dev/null || echo not')
+uname_V := $(shell sh -c 'uname -v 2>/dev/null || echo not')
# CFLAGS and LDFLAGS are for the users to override from the command line.
@@ -722,6 +725,11 @@ ifeq ($(uname_S),AIX)
INTERNAL_QSORT = UnfortunatelyYes
NEEDS_LIBICONV=YesPlease
BASIC_CFLAGS += -D_LARGE_FILES
+ ifneq ($(shell expr "$(uname_V)" : '[1234]'),1)
+ THREADED_DELTA_SEARCH = YesPlease
+ else
+ NO_PTHREADS = YesPlease
+ endif
endif
ifeq ($(uname_S),GNU)
# GNU/Hurd
@@ -766,6 +774,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
NO_STRCASESTR = YesPlease
NO_STRLCPY = YesPlease
NO_MEMMEM = YesPlease
+ NO_PTHREADS = YesPlease
NEEDS_LIBICONV = YesPlease
OLD_ICONV = YesPlease
NO_C99_FORMAT = YesPlease
@@ -1017,9 +1026,15 @@ ifdef INTERNAL_QSORT
COMPAT_OBJS += compat/qsort.o
endif
+ifdef NO_PTHREADS
+ THREADED_DELTA_SEARCH =
+ BASIC_CFLAGS += -DNO_PTHREADS
+else
+ EXTLIBS += $(PTHREAD_LIBS)
+endif
+
ifdef THREADED_DELTA_SEARCH
BASIC_CFLAGS += -DTHREADED_DELTA_SEARCH
- EXTLIBS += $(PTHREAD_LIBS)
LIB_OBJS += thread-utils.o
endif
ifdef DIR_HAS_BSD_GROUP_SEMANTICS
diff --git a/config.mak.in b/config.mak.in
index ea7705c..14dfb21 100644
--- a/config.mak.in
+++ b/config.mak.in
@@ -51,4 +51,5 @@ OLD_ICONV=@OLD_ICONV@
NO_DEFLATE_BOUND=@NO_DEFLATE_BOUND@
FREAD_READS_DIRECTORIES=@FREAD_READS_DIRECTORIES@
SNPRINTF_RETURNS_BOGUS=@SNPRINTF_RETURNS_BOGUS@
+NO_PTHREADS=@NO_PTHREADS@
PTHREAD_LIBS=@PTHREAD_LIBS@
diff --git a/configure.ac b/configure.ac
index 4256742..8821b50 100644
--- a/configure.ac
+++ b/configure.ac
@@ -490,6 +490,8 @@ AC_SUBST(NO_MKDTEMP)
# Define NO_SYMLINK_HEAD if you never want .git/HEAD to be a symbolic link.
# Enable it on Windows. By default, symrefs are still used.
#
+# Define NO_PTHREADS if we do not have pthreads
+#
# Define PTHREAD_LIBS to the linker flag used for Pthread support.
AC_LANG_CONFTEST([AC_LANG_PROGRAM(
[[#include <pthread.h>]],
@@ -502,9 +504,12 @@ else
${CC} -lpthread conftest.c -o conftest.o > /dev/null 2>&1
if test $? -eq 0;then
PTHREAD_LIBS="-lpthread"
+ else
+ NO_PTHREADS=UnfortunatelyYes
fi
fi
AC_SUBST(PTHREAD_LIBS)
+AC_SUBST(NO_PTHREADS)
## Site configuration (override autodetection)
## --with-PACKAGE[=ARG] and --without-PACKAGE
diff --git a/preload-index.c b/preload-index.c
index a685583..88edc5f 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -2,6 +2,14 @@
* Copyright (C) 2008 Linus Torvalds
*/
#include "cache.h"
+
+#ifdef NO_PTHREADS
+static void preload_index(struct index_state *index, const char **pathspec)
+{
+ ; /* nothing */
+}
+#else
+
#include <pthread.h>
/*
@@ -81,6 +89,7 @@ static void preload_index(struct index_state *index, const char **pathspec)
die("unable to join threaded lstat");
}
}
+#endif
int read_index_preload(struct index_state *index, const char **pathspec)
{
--
1.6.0.2.229.g1293c.dirty
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH] Makefile: introduce NO_PTHREADS
2008-12-01 16:13 ` Mike Ralphson
@ 2008-12-02 7:41 ` Johannes Sixt
2008-12-03 2:18 ` Junio C Hamano
0 siblings, 1 reply; 44+ messages in thread
From: Johannes Sixt @ 2008-12-02 7:41 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Mike Ralphson
Mike Ralphson schrieb:
> From: Junio C Hamano <gitster@pobox.com>
>
> This introduces make variable NO_PTHREADS for platforms that lack the
> support for pthreads library or people who do not want to use it for
> whatever reason. When defined, it makes the multi-threaded index
> preloading into a no-op, and also disables threaded delta searching by
> pack-objects.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Johannes Sixt <j6t@kdbg.org>
> Signed-off-by: Mike Ralphson <mike@abacus.co.uk>
> ---
You can add
Tested-by: Johannes Sixt <j6t@kdbg.org> (AIX 4.3.x)
-- Hannes
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] Makefile: introduce NO_PTHREADS
2008-12-02 7:41 ` Johannes Sixt
@ 2008-12-03 2:18 ` Junio C Hamano
0 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2008-12-03 2:18 UTC (permalink / raw)
To: Johannes Sixt; +Cc: git, Mike Ralphson
Thanks, both.
^ permalink raw reply [flat|nested] 44+ messages in thread
end of thread, other threads:[~2008-12-03 2:19 UTC | newest]
Thread overview: 44+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-12 9:29 hosting git on a nfs Thomas Koch
2008-11-12 10:10 ` Julian Phillips
2008-11-12 20:31 ` Brandon Casey
2008-11-12 17:36 ` David Brown
2008-11-12 18:14 ` Linus Torvalds
2008-11-13 18:18 ` J. Bruce Fields
2008-11-13 18:32 ` James Pickens
2008-11-13 20:18 ` Linus Torvalds
2008-11-13 21:05 ` Linus Torvalds
2008-11-13 23:23 ` James Pickens
2008-11-13 23:48 ` Linus Torvalds
2008-11-13 23:23 ` Julian Phillips
2008-11-13 23:42 ` Linus Torvalds
2008-11-14 0:04 ` Julian Phillips
2008-11-14 0:14 ` Brandon Casey
2008-11-14 0:38 ` Linus Torvalds
2008-11-14 0:59 ` Pieter de Bie
2008-11-14 1:15 ` Linus Torvalds
2008-11-14 3:33 ` James Pickens
2008-11-14 5:01 ` Linus Torvalds
2008-11-14 13:01 ` Michael J Gruber
2008-11-14 14:31 ` Kyle Moffett
2008-11-14 18:32 ` Brandon Casey
2008-11-14 19:23 ` Linus Torvalds
2008-11-14 20:14 ` Junio C Hamano
2008-11-14 23:10 ` Linus Torvalds
2008-11-15 12:08 ` [PATCH] Makefile: introduce NO_PTHREADS Junio C Hamano
2008-11-15 17:15 ` Linus Torvalds
2008-11-17 10:03 ` Mike Ralphson
2008-11-17 10:18 ` Junio C Hamano
2008-11-17 10:34 ` Johannes Sixt
2008-11-17 10:45 ` Mike Ralphson
2008-11-17 11:25 ` Johannes Sixt
2008-12-01 8:29 ` Johannes Sixt
2008-12-01 8:48 ` dhruva
2008-12-01 9:57 ` Mike Ralphson
2008-12-01 16:09 ` Mike Ralphson
2008-12-01 16:13 ` Mike Ralphson
2008-12-02 7:41 ` Johannes Sixt
2008-12-03 2:18 ` Junio C Hamano
2008-11-17 16:38 ` Junio C Hamano
2008-11-17 16:47 ` Linus Torvalds
2008-11-17 17:01 ` Fix index preloading for racy dirty case Linus Torvalds
2008-11-17 16:41 ` [PATCH] Makefile: introduce NO_PTHREADS Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).