Git development
 help / color / mirror / Atom feed
* Re: [PATCH] fast-export: deal with tag objects that do not have a tagger
From: Miklos Vajna @ 2008-12-18 23:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git, scott
In-Reply-To: <7viqphf4ua.fsf@gitster.siamese.dyndns.org>

[-- Attachment #1: Type: text/plain, Size: 771 bytes --]

On Thu, Dec 18, 2008 at 03:20:29PM -0800, Junio C Hamano <gitster@pobox.com> wrote:
> Such a "faking" can well be done, and should be done, on the consuming end
> of the information.  If you fake using the commit authorship, you would
> never be able to tell from the result which one is faked and which one is
> genuine.
> 
> I think you'd rather want to see "Unspecified Tagger" in the resulting tag
> object (or even better, a tag object without the tagger field created by
> the fast-import process), and leave the interpretation of missing tagger
> information to the consumers.

Aah, I missed that Dscho's patch will not just not die(), but it _will_
output a stream that fast-import will import. So just forget my
complain. :-)

And Dscho, thanks!

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: [PATCH 1/2] Add support for multi threaded checkout
From: James Pickens @ 2008-12-18 23:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <alpine.LFD.2.00.0812181333150.14014@localhost.localdomain>

On Thu, Dec 18, 2008, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> So instead of doing all the unpacking etc in parallel (with locking around
> it to serialize it), I'd suggest doing ll the unpacking serially since
> that isn't the problem anyway (and since you have to protect it with a
> lock anyway), and just have a "write out and free the buffer" phase that
> is done in the threads.

That's certainly a more elegant way to do it, but unless I'm missing
something, it requires rewriting a good bit of code.  The main reason I
went with the locking was to keep the patch as simple and non-intrusive
as possible.

> The alternative would be to actually do what your patch suggests, but
> actually try to make the code git SHA1 object handling be thread-safe. At
> that point, the ugly locking in write_entry() would go away, and you might
> actually improve performance on SMP thanks to doing the CPU part in
> parallel.

I started down that path at one point, and quickly got in over my head.
Making all that code thread safe looks like a big task to me.  From my
perspective, I get a ~350% speedup from this easy patch, and I might get
an additional 25% (blind guess) from a much more difficult patch.  It
didn't seem worth the effort.

James

^ permalink raw reply

* Odd merge behaviour involving reverts
From: Alan @ 2008-12-18 23:25 UTC (permalink / raw)
  To: git

I am encountering an odd problem with how merging branches are working.
I expect I am doing something wrong, I just need to know what.

Here is my situation...

I have a master branch.  We have a branch off of that that some
developers are doing work on.  They claim it is ready. We merge it into
the master branch.  It breaks something so we revert the merge.  They
make changes to the code.  they get it to a point where they say it is
ok and we merge again.

When examined, we find that code changes made before the revert are not
in the master branch, but code changes after are in the master branch.

We now have code that does not contain all of the changes in the branch.

Is there a different way to undo merges into a branch that does not
block further merges from that branch?

What am i doing wrong here?

^ permalink raw reply

* Re: [PATCH] fast-export: deal with tag objects that do not have a tagger
From: Junio C Hamano @ 2008-12-18 23:20 UTC (permalink / raw)
  To: Miklos Vajna; +Cc: Johannes Schindelin, git, scott
In-Reply-To: <20081218213407.GX5691@genesis.frugalware.org>

Miklos Vajna <vmiklos@frugalware.org> writes:

> On Thu, Dec 18, 2008 at 08:45:44PM +0100, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>> 	I think so.  The responsible code is in fast-export.c, in any 
>> 	case.  Of course, fast-import refuses to import a tag without a 
>> 	tagger, so...
>
> That's why I asked. I think it's a reasonable assumption that in most
> cases the tagger and the committer of the tagged commit is the same. So
> in case the tagger info is missing and we tag a commit, we could fake
> that info on export.
>
> Obviously this should not be the default, but I think such a mode would
> be useful in real-life.

Such a "faking" can well be done, and should be done, on the consuming end
of the information.  If you fake using the commit authorship, you would
never be able to tell from the result which one is faked and which one is
genuine.

I think you'd rather want to see "Unspecified Tagger" in the resulting tag
object (or even better, a tag object without the tagger field created by
the fast-import process), and leave the interpretation of missing tagger
information to the consumers.

^ permalink raw reply

* Re: negated list in .gitignore no fun
From: Boyd Stephen Smith Jr. @ 2008-12-18 22:34 UTC (permalink / raw)
  To: jidanni; +Cc: git, joey
In-Reply-To: <87hc51tajw.fsf@jidanni.org>

[-- Attachment #1: Type: text/plain, Size: 1071 bytes --]

On Thursday 2008 December 18 15:53:23 jidanni@jidanni.org wrote:
> # head -n 5 .gitignore
> *
> !X11/xorg.conf
> !anacrontab
> !apt/apt.conf.d/10jidanni
> !apt/sources.list
> # git-add .
> But git-status only shows anacrontab got added. None of the files in
> the subdirectories get added.

That's by design.  You've chosen to ignore those directories; they match "*" 
themselves.  Thus, 'git add .' doesn't descend into them looking for files.

I wouldn't want to change this behavior because I'll often use a "build" 
directory and put all generated files in it and just ignore that directory 
with a single line in .gitignore.

I think the .gitignore you really want is:
*
!X11
!X11/xorg.conf
!anacrontab
!apt
!apt/apt.conf.d
!apt/apt.conf.d/10jidanni
!apt/sources.list
[...]

But, I haven't tested that.
-- 
Boyd Stephen Smith Jr.                     ,= ,-_-. =. 
bss@iguanasuicide.net                     ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy           `-'(. .)`-' 
http://iguanasuicide.net/                      \_/     

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: negated list in .gitignore no fun
From: Linus Torvalds @ 2008-12-18 22:54 UTC (permalink / raw)
  To: jidanni; +Cc: git, joey
In-Reply-To: <alpine.LFD.2.00.0812181429100.14014@localhost.localdomain>



On Thu, 18 Dec 2008, Linus Torvalds wrote:
> 
> So you have several possibilities:
> 
>  (a) either create a .gitignore that looks like this:
> 
> 	*
> 	!X11
> 	!X11/xorg.conf
> 	!anacrontab
> 	!apt
> 	!apt/apt.conf.d
> 	!apt/apt.conf.d/10jidanni
> 	!apt/sources.list
> 
>     which should work around it by telling git that it shouldn't ignore 
>     the subdirectories.

Oh, you should have a '/' there in the subdirectory marker too, because 
otherwise a file like 'X11/apt' would now match as a positive match.

>  (c) Try to teach git to not ignore subdirectories leading up to 
>      non-ignored files, and give you the .gitignore semantics you like. I 
>      suspect it's not worth it, because the git behaviour is logical once 
>      you know about it and understand it.

.. and because of subtle issues like this - negated entries are really 
quite complex enough as-is. Don't use them for anything subtle, you _will_ 
get them wrong regardless.

			Linus

^ permalink raw reply

* Re: negated list in .gitignore no fun
From: Linus Torvalds @ 2008-12-18 22:38 UTC (permalink / raw)
  To: jidanni; +Cc: git, joey
In-Reply-To: <87hc51tajw.fsf@jidanni.org>



On Fri, 19 Dec 2008, jidanni@jidanni.org wrote:
> 
> I had dreams of tracking only a few files in a large tree.
> I thought I would maintain that list as a negated list in .gitignore,
> and then always use "git-add ." to keep git's index reflecting my list.
> 
> However that's just not possible.
> 
> # head -n 5 .gitignore
> *
> !X11/xorg.conf
> !anacrontab
> !apt/apt.conf.d/10jidanni
> !apt/sources.list
> # git-add .
> But git-status only shows anacrontab got added. None of the files in
> the subdirectories get added.

Yeah, the problem is that the '*' matches the subdirectories (like "X11"), 
but the negative matching does not. So the subdirectory gets ignored, and 
as a result, git won't even traverse into it and notice that there's a 
file in there that shouldn't be ignored.

It's actually logical, but not what you want.

So you have several possibilities:

 (a) either create a .gitignore that looks like this:

	*
	!X11
	!X11/xorg.conf
	!anacrontab
	!apt
	!apt/apt.conf.d
	!apt/apt.conf.d/10jidanni
	!apt/sources.list

    which should work around it by telling git that it shouldn't ignore 
    the subdirectories.

 (b) realize that ".gitignore" only matters for files that git doesn't 
     already know about, so if you only want to track a small set of 
     files, what you _should_ do is just do a .gitignore that looks like 
     this:

	*

     and then just force-add the few files you want to track, using 
     something like

	git add -f X11/xorg.conf anacrontab apt/apt.conf.d/10jidanni apt/sources.list

     and now you're done - git won't ignore them, since explciitly you 
     told git to track them.

 (c) Try to teach git to not ignore subdirectories leading up to 
     non-ignored files, and give you the .gitignore semantics you like. I 
     suspect it's not worth it, because the git behaviour is logical once 
     you know about it and understand it.

.. and possibly other things you could do too.

			Linus

^ permalink raw reply

* negated list in .gitignore no fun
From: jidanni @ 2008-12-18 21:53 UTC (permalink / raw)
  To: git; +Cc: joey

I discovered git is so negative: it has very good .gitignore negative
matching facilities, but not as good positive matching facilities.
(Maybe positive glob lists are merely fed to git-add from the command line.)

I had dreams of tracking only a few files in a large tree.
I thought I would maintain that list as a negated list in .gitignore,
and then always use "git-add ." to keep git's index reflecting my list.

However that's just not possible.

# head -n 5 .gitignore
*
!X11/xorg.conf
!anacrontab
!apt/apt.conf.d/10jidanni
!apt/sources.list
# git-add .
But git-status only shows anacrontab got added. None of the files in
the subdirectories get added. We continue,
# sed -n s/^!//p .gitignore|xargs git-add #no help!
# sed -n s/^!//p .gitignore|xargs -n 1 git-add #Geez. Finally worked.
OK, I suppose my next step is just to rm .gitignore and just add any
future files I want to add to my list one by one with git-add... like
git was designed to do in the first place. OK, thanks. Bye.
Next episode: some kind of middle ground with etckeeper.

^ permalink raw reply

* git-rm -n leaves .git/index.lock if not allowed to finish
From: jidanni @ 2008-12-18 20:02 UTC (permalink / raw)
  To: git

Bug: if git-rm -n is not allowed to write all it wants to write, it
will leave a .git/index.lock file:
# git-rm -n -r . 2>&1|sed q
error: '.etckeeper' has changes staged in the index
# git-rm -n -r . 2>&1|sed q
fatal: unable to create '.git/index.lock': File exists

^ permalink raw reply

* Re: Git with Hudson
From: Stephen Haberman @ 2008-12-18 22:07 UTC (permalink / raw)
  To: Indy Nagpal; +Cc: git
In-Reply-To: <D2F0F023-862A-4BAB-88B9-BFEFC5592D10@strakersoftware.com>


> However, before we do that I wanted to check if anyone has had any
> experience/feedback in integrating Git with Hudson CI server?

We tried using the Hudson git plugin that you can download from the
Hudson site and ended up with problems--whether we had too many branches
or something, the plugin has some funny "figure out what needs to be
built" logic that issued near-constant git rev-list commands. To the
point where our own "git fetch" calls would get starved for 20-30
seconds.

We eventually wrote our own Hudson git plugin that is simpler and
doesn't do any funny rev-listing/walking. It just stores last hash
built and rebuilds once that doesn't match the branch tip. Once that
was in place, it worked great.

I've got permission to publish it if you're interested--just haven't
yet.

- Stephen

^ permalink raw reply

* Re: [RFC PATCH 0/2] Add support for multi threaded checkout
From: James Pickens @ 2008-12-18 21:42 UTC (permalink / raw)
  To: devel; +Cc: git@vger.kernel.org
In-Reply-To: <494ABDC9.9060001@morey-chaisemartin.com>

On Thu, Dec 18, 2008 at 2:16 PM, Nicolas Morey-Chaisemartin
<devel@morey-chaisemartin.com> wrote:
> I guess you could do something like :
>
> #define checkout_lock()         core_threaded_checkout ?pthread_mutex_lock(&checkout_mutex) : (void) 0
> #define checkout_unlock()               core_threaded_checkout ?pthread_mutex_unlock(&checkout_mutex) : (void) 0
>
> It should be faster when you don't actually use threaded checkouts, as you won't unnecessarily lock/unlock your mutex.
>
> Have you looked at the perf from local to local? I'm just curious.


I had looked at it before but didn't record any numbers.  I just took the
following timings (2 runs each):

master             13.78    12.79
threads enabled    16.84    20.45
threads disabled   14.07    13.27

James

^ permalink raw reply

* Re: [PATCH 1/2] Add support for multi threaded checkout
From: Linus Torvalds @ 2008-12-18 21:41 UTC (permalink / raw)
  To: James Pickens; +Cc: git
In-Reply-To: <1229633811-3877-1-git-send-email-james.e.pickens@intel.com>



On Thu, 18 Dec 2008, James Pickens wrote:
>
> This speeds up operations like 'git clone' on NFS drives tremendously, but
> slows down the same operations on local disks.
> 
> Partitioning the work and launching threads is done in unpack-trees.c.  The code
> is mostly copied from preload_index.c.  The maximum number of threads is set to
> 8, which seemed to give a reasonable tradeoff between performance improvement on
> NFS and degradation on local disks.

Hmm. I don't really like this very much.

Why? Because as your locking shows, we can really only parallelise the 
actual write-out anyway, and rather than do any locking there, wouldn't it 
be better to have a notion of "queued work" (which would be just the 
write-out) to be done in parallel?

So instead of doing all the unpacking etc in parallel (with locking around 
it to serialize it), I'd suggest doing ll the unpacking serially since 
that isn't the problem anyway (and since you have to protect it with a 
lock anyway), and just have a "write out and free the buffer" phase that 
is done in the threads.

The alternative would be to actually do what your patch suggests, but 
actually try to make the code git SHA1 object handling be thread-safe. At 
that point, the ugly locking in write_entry() would go away, and you might 
actually improve performance on SMP thanks to doing the CPU part in 
parallel.

But as-is, I think the patch is a bit ugly. The reason I liked the index 
pre-reading was that it could be done entirely locklessly, so it really 
did parallelize it _fully_ (even if the IO latency part was the much 
bigger issue), and that was also why it actually ended up helping even on 
a local disk (only if you have multiple cores, of course).

		Linus

^ permalink raw reply

* Re: [PATCH] Add git-edit-index.perl
From: Junio C Hamano @ 2008-12-18 21:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, Neil Roberts, git
In-Reply-To: <20081218140411.GB6706@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Thu, Dec 18, 2008 at 02:48:39PM +0100, Johannes Schindelin wrote:
>
>> Yes, it is a neat idea.  But I always keep in mind what Junio had to say 
>> about my "add -e" thing (that I use pretty frequently myself): you will 
>> put something into the index that has _never_ been tested.
>> 
>> Would we really want to bless such a workflow with "official" support?

Back in stone ages of git, there wasn't usable tool support to make random
unproven commits, later to be tested separately before releasing.  The old
aversion to committing something that has never existed as a whole in the
work tree comes from those days.

The world has changed quite a bit since then, and I do not think the
argument holds anymore when better tool support for "commit first,
validate and fix-up as needed later" workflow is available.

> That is definitely something to be concerned about. Which is why my
> workflow is something like:
>
>   $ hack hack hack
>   $ while ! git diff; do
>       git add -p
>       git commit
>     done
>   $ for i in `git rev-list origin..`; do
>       git checkout $i && make test || barf
>     done
>
> That is, it is not inherently a problem to put something untested into
> the index as long as you are doing it so that you can go back and test
> later.

Yeah, I do not think there is anything inherently wrong about it, either.

^ permalink raw reply

* Re: git-fast-export and tags without a tagger
From: Miklos Vajna @ 2008-12-18 21:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git, scott
In-Reply-To: <7vbpv9guqd.fsf@gitster.siamese.dyndns.org>

[-- Attachment #1: Type: text/plain, Size: 507 bytes --]

On Thu, Dec 18, 2008 at 11:15:54AM -0800, Junio C Hamano <gitster@pobox.com> wrote:
> Is "fast-export" the only thing that chokes on these tags?
> 
> What I am worried about is if we have accidentally/stupidly/by mistake
> made other codepaths that check the validity of a tag object too strict,
> which would result things like "git show $such_a_tag" to barf.

git show works fine on the tag. git fsck --full passes without any
warning/error as well, so I don't think we have to worry about that.

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: [PATCH] fast-export: deal with tag objects that do not have a tagger
From: Miklos Vajna @ 2008-12-18 21:34 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git, scott
In-Reply-To: <alpine.DEB.1.00.0812182044100.6952@intel-tinevez-2-302>

[-- Attachment #1: Type: text/plain, Size: 592 bytes --]

On Thu, Dec 18, 2008 at 08:45:44PM +0100, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> 	I think so.  The responsible code is in fast-export.c, in any 
> 	case.  Of course, fast-import refuses to import a tag without a 
> 	tagger, so...

That's why I asked. I think it's a reasonable assumption that in most
cases the tagger and the committer of the tagged commit is the same. So
in case the tagger info is missing and we tag a commit, we could fake
that info on export.

Obviously this should not be the default, but I think such a mode would
be useful in real-life.

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply

* Re: [RFC PATCH 0/2] Add support for multi threaded checkout
From: Nicolas Morey-Chaisemartin @ 2008-12-18 21:16 UTC (permalink / raw)
  To: Pickens, James E; +Cc: git@vger.kernel.org
In-Reply-To: <3BA20DF9B35F384F8B7395B001EC3FB3265B2A01@azsmsx507.amr.corp.intel.com>

Pickens, James E a écrit :
> Even in that case it will have *some* impact
> from locking/unlocking the mutex, but I think it would be in the noise.
>
> James
> -
I guess you could do something like :

#define checkout_lock()		core_threaded_checkout ?pthread_mutex_lock(&checkout_mutex) : (void) 0
#define checkout_unlock()		core_threaded_checkout ?pthread_mutex_unlock(&checkout_mutex) : (void) 0

It should be faster when you don't actually use threaded checkouts, as you won't unnecessarily lock/unlock your mutex. 

Have you looked at the perf from local to local? I'm just curious.

Nicolas Morey-Chaisemartin

^ permalink raw reply

* Re: [RFC PATCH 0/2] Add support for multi threaded checkout
From: James Pickens @ 2008-12-18 21:13 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git@vger.kernel.org
In-Reply-To: <alpine.LFD.2.00.0812181600210.30035@xanadu.home>

[Resending since I forgot to copy the list]

On Thu, Dec 18, 2008, Nicolas Pitre <nico@cam.org> wrote:
> On Thu, 18 Dec 2008, Pickens, James E wrote:
>
>>                    NFS->NFS    NFS->Local
>> master (53682f0c)    2:46.1          13.3
>> with threads           36.6          18.2
>>
>> So it improved performance on NFS significantly.
>
> Are those figures repeatable over multiple consecutive runs?

Roughly, yes.  There is some variance of course, probably
more than usual since all these operations involve NFS.
The numbers are the best of several runs in each case.

James

^ permalink raw reply

* Re: [RFC PATCH 0/2] Add support for multi threaded checkout
From: Nicolas Pitre @ 2008-12-18 21:02 UTC (permalink / raw)
  To: Pickens, James E; +Cc: git@vger.kernel.org
In-Reply-To: <3BA20DF9B35F384F8B7395B001EC3FB3265B2A01@azsmsx507.amr.corp.intel.com>

On Thu, 18 Dec 2008, Pickens, James E wrote:

>                    NFS->NFS    NFS->Local
> master (53682f0c)    2:46.1          13.3
> with threads           36.6          18.2
> 
> So it improved performance on NFS significantly.

Are those figures repeatable over multiple consecutive runs?


Nicolas

^ permalink raw reply

* [PATCH 1/2] Add support for multi threaded checkout
From: James Pickens @ 2008-12-18 20:56 UTC (permalink / raw)
  To: git; +Cc: James Pickens
In-Reply-To: <3BA20DF9B35F384F8B7395B001EC3FB3265B2A01@azsmsx507.amr.corp.intel.com>

This speeds up operations like 'git clone' on NFS drives tremendously, but
slows down the same operations on local disks.

Partitioning the work and launching threads is done in unpack-trees.c.  The code
is mostly copied from preload_index.c.  The maximum number of threads is set to
8, which seemed to give a reasonable tradeoff between performance improvement on
NFS and degradation on local disks.

Some code was added to entry.c for serialization.  Most of the contents of
checkout_entry and write_entry are serialized, except writing the checked out
files to disk.
---
 entry.c        |   42 +++++++++++++++++---
 unpack-trees.c |  115 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 151 insertions(+), 6 deletions(-)

diff --git a/entry.c b/entry.c
index aa2ee46..764d2db 100644
--- a/entry.c
+++ b/entry.c
@@ -1,6 +1,21 @@
 #include "cache.h"
 #include "blob.h"
 
+#ifdef NO_PTHREADS
+
+#define checkout_lock()		(void)0
+#define checkout_unlock()	(void)0
+
+#else
+
+#include <pthread.h>
+
+static pthread_mutex_t checkout_mutex = PTHREAD_MUTEX_INITIALIZER;
+#define checkout_lock()		pthread_mutex_lock(&checkout_mutex)
+#define checkout_unlock()	pthread_mutex_unlock(&checkout_mutex)
+
+#endif
+
 static void create_directories(const char *path, const struct checkout *state)
 {
 	int len = strlen(path);
@@ -100,7 +115,7 @@ static void *read_blob_entry(struct cache_entry *ce, const char *path, unsigned
 
 static int write_entry(struct cache_entry *ce, char *path, const struct checkout *state, int to_tempfile)
 {
-	int fd;
+	int fd, retval;
 	long wrote;
 
 	switch (ce->ce_mode & S_IFMT) {
@@ -109,10 +124,15 @@ static int write_entry(struct cache_entry *ce, char *path, const struct checkout
 		unsigned long size;
 
 	case S_IFREG:
+		checkout_lock();
 		new = read_blob_entry(ce, path, &size);
-		if (!new)
-			return error("git checkout-index: unable to read sha1 file of %s (%s)",
+
+		if (!new) {
+			retval = error("git checkout-index: unable to read sha1 file of %s (%s)",
 				path, sha1_to_hex(ce->sha1));
+			checkout_unlock();
+			return retval;
+		}
 
 		/*
 		 * Convert from git internal format to working tree format
@@ -124,6 +144,7 @@ static int write_entry(struct cache_entry *ce, char *path, const struct checkout
 			new = strbuf_detach(&buf, &newsize);
 			size = newsize;
 		}
+		checkout_unlock();
 
 		if (to_tempfile) {
 			strcpy(path, ".merge_file_XXXXXX");
@@ -143,10 +164,17 @@ static int write_entry(struct cache_entry *ce, char *path, const struct checkout
 			return error("git checkout-index: unable to write file %s", path);
 		break;
 	case S_IFLNK:
+		checkout_lock();
 		new = read_blob_entry(ce, path, &size);
-		if (!new)
-			return error("git checkout-index: unable to read sha1 file of %s (%s)",
+
+		if (!new) {
+			retval = error("git checkout-index: unable to read sha1 file of %s (%s)",
 				path, sha1_to_hex(ce->sha1));
+			checkout_unlock();
+			return retval;
+		}
+		checkout_unlock();
+
 		if (to_tempfile || !has_symlinks) {
 			if (to_tempfile) {
 				strcpy(path, ".merge_link_XXXXXX");
@@ -192,7 +220,7 @@ static int write_entry(struct cache_entry *ce, char *path, const struct checkout
 
 int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath)
 {
-	static char path[PATH_MAX + 1];
+	char path[PATH_MAX + 1];
 	struct stat st;
 	int len = state->base_dir_len;
 
@@ -229,6 +257,8 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *t
 			return error("unable to unlink old '%s' (%s)", path, strerror(errno));
 	} else if (state->not_new)
 		return 0;
+	checkout_lock();
 	create_directories(path, state);
+	checkout_unlock();
 	return write_entry(ce, path, state, 0);
 }
diff --git a/unpack-trees.c b/unpack-trees.c
index 54f301d..30b9862 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -8,6 +8,10 @@
 #include "progress.h"
 #include "refs.h"
 
+#ifndef NO_PTHREADS
+#include <pthread.h>
+#endif
+
 /*
  * Error messages expected by scripts out of plumbing commands such as
  * read-tree.  Non-scripted Porcelain is not required to use these messages
@@ -85,6 +89,115 @@ static void unlink_entry(struct cache_entry *ce)
 }
 
 static struct checkout state;
+
+#ifdef NO_PTHREADS
+#define progress_lock()		(void)0
+#define progress_unlock()	(void)0
+
+static int threaded_checkout(struct index_state *index, int update, struct progress *prog, unsigned *prog_cnt)
+{
+	return 0; /* do nothing */
+}
+
+#else
+
+#include <pthread.h>
+
+static pthread_mutex_t progress_mutex = PTHREAD_MUTEX_INITIALIZER;
+#define progress_lock()		pthread_mutex_lock(&progress_mutex)
+#define progress_unlock()	pthread_mutex_unlock(&progress_mutex)
+
+/*
+ * Mostly randomly chosen maximum thread counts: we
+ * cap the parallelism to 8 threads, and we want
+ * to have at least 500 files per thread for it to
+ * be worth starting a thread.
+ */
+#define MAX_PARALLEL (8)
+#define THREAD_COST (500)
+
+struct thread_data {
+	pthread_t pthread;
+	struct index_state *index;
+	struct checkout *state;
+	int update, offset, nr, errs;
+	struct progress *progress;
+	unsigned *progress_cnt;
+};
+
+static void *checkout_thread(void *_data)
+{
+	int nr;
+	struct thread_data *p = _data;
+	struct index_state *index = p->index;
+	struct cache_entry **cep = index->cache + p->offset;
+
+	p->errs = 0;
+
+	nr = p->nr;
+	if (0 == nr) {
+		return NULL;
+	}
+
+	if (nr + p->offset > index->cache_nr)
+		nr = index->cache_nr - p->offset;
+
+	do {
+		struct cache_entry *ce = *cep++;
+
+		if (ce->ce_flags & CE_UPDATE) {
+			progress_lock();
+			display_progress(p->progress, ++(*p->progress_cnt));
+			progress_unlock();
+			ce->ce_flags &= ~CE_UPDATE;
+			if (p->update) {
+				p->errs |= checkout_entry(ce, p->state, NULL);
+				fflush(stdout);
+			}
+		}
+	} while (--nr > 0);
+	return NULL;
+}
+
+static int threaded_checkout(struct index_state *index, int update, struct progress *prog, unsigned *prog_cnt)
+{
+	int threads, work, offset, i;
+	struct thread_data data[MAX_PARALLEL];
+	int errs = 0;
+
+	threads = index->cache_nr / THREAD_COST;
+	if (threads > MAX_PARALLEL)
+		threads = MAX_PARALLEL;
+	else if (threads == 0)
+		return 0;
+
+	offset = 0;
+	work = (index->cache_nr + threads - 1) / threads;
+	for (i = 0; i < threads; i++) {
+		struct thread_data *p = data+i;
+		p->index = index;
+		p->offset = offset;
+		p->nr = work;
+		p->state = &state;
+		p->update = update;
+		p->progress = prog;
+		p->progress_cnt = prog_cnt;
+		offset += work;
+		if (pthread_create(&p->pthread, NULL, checkout_thread, p))
+			die("unable to create threaded checkout");
+	}
+	for (i = 0; i < threads; i++) {
+		struct thread_data *p = data+i;
+		if (pthread_join(p->pthread, NULL))
+			die("unable to join threaded checkout");
+		errs |= p->errs;
+	}
+
+	return errs;
+}
+
+#endif
+
 static int check_updates(struct unpack_trees_options *o)
 {
 	unsigned cnt = 0, total = 0;
@@ -118,6 +231,8 @@ static int check_updates(struct unpack_trees_options *o)
 		}
 	}
 
+	errs |= threaded_checkout(index, o->update, progress, &cnt);
+
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
-- 
1.6.0.4.1116.gc5d7

^ permalink raw reply related

* [PATCH 2/2] Add core.threadedcheckout config option
From: James Pickens @ 2008-12-18 20:56 UTC (permalink / raw)
  To: git; +Cc: James Pickens
In-Reply-To: <1229633811-3877-1-git-send-email-james.e.pickens@intel.com>

Setting it to 'true' enables multi threaded checkout.  Default is false.
---
 Documentation/config.txt |    8 ++++++++
 cache.h                  |    1 +
 config.c                 |    5 +++++
 environment.c            |    3 +++
 unpack-trees.c           |    3 +++
 5 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 21ea165..22ac76b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -422,6 +422,14 @@ relatively high IO latencies.  With this set to 'true', git will do the
 index comparison to the filesystem data in parallel, allowing
 overlapping IO's.
 
+core.threadedcheckout::
+	Enable parallel checkout for operations like 'git checkout'
++
+This can speed up operations like 'git clone' and 'git checkout' especially
+on filesystems like NFS that have relatively high IO latencies.  With this
+set to 'true', git will write the checked out files to disk in parallel,
+allowing overlapping IO's.
+
 alias.*::
 	Command aliases for the linkgit:git[1] command wrapper - e.g.
 	after defining "alias.last = cat-file commit HEAD", the invocation
diff --git a/cache.h b/cache.h
index 231c06d..0777597 100644
--- a/cache.h
+++ b/cache.h
@@ -512,6 +512,7 @@ extern size_t delta_base_cache_limit;
 extern int auto_crlf;
 extern int fsync_object_files;
 extern int core_preload_index;
+extern int core_threaded_checkout;
 
 enum safe_crlf {
 	SAFE_CRLF_FALSE = 0,
diff --git a/config.c b/config.c
index 790405a..819693e 100644
--- a/config.c
+++ b/config.c
@@ -495,6 +495,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.threadedcheckout")) {
+		core_threaded_checkout = git_config_bool(var, value);
+		return 0;
+	}
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return 0;
 }
diff --git a/environment.c b/environment.c
index e278bce..2450b65 100644
--- a/environment.c
+++ b/environment.c
@@ -46,6 +46,9 @@ enum rebase_setup_type autorebase = AUTOREBASE_NEVER;
 /* Parallel index stat data preload? */
 int core_preload_index = 0;
 
+/* Parallel checkout? */
+int core_threaded_checkout = 0;
+
 /* This is set by setup_git_dir_gently() and/or git_default_config() */
 char *git_work_tree_cfg;
 static char *work_tree;
diff --git a/unpack-trees.c b/unpack-trees.c
index 30b9862..635b7dc 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -165,6 +165,9 @@ static int threaded_checkout(struct index_state *index, int update, struct progr
 	struct thread_data data[MAX_PARALLEL];
 	int errs = 0;
 
+	if (!core_threaded_checkout)
+		return 0;
+
 	threads = index->cache_nr / THREAD_COST;
 	if (threads > MAX_PARALLEL)
 		threads = MAX_PARALLEL;
-- 
1.6.0.4.1116.gc5d7

^ permalink raw reply related

* [RFC PATCH 0/2] Add support for multi threaded checkout
From: Pickens, James E @ 2008-12-18 20:51 UTC (permalink / raw)
  To: git@vger.kernel.org


There was some discussion a while back about improving git performance on
NFS (http://article.gmane.org/gmane.comp.version-control.git/100950).
This led to Linus adding the 'preload_index' function, which improves
performance of several commands using multi threading.  He also briefly
described how to do the same for 'git checkout'.  Well, I finally found
some time to work on it, and will post patches shortly.  This is my first
patch; apologies if I screwed something up.

Patch 1 adds the functionality, and 2 adds a config option to
enable/disable it.

Much of the patch is literally copy/paste from preload-index.c into
unpack-trees.c.  Many of the functions called during checkout are not
thread safe, so I added a mutex in entry.c to serialize basically
everything except writing the files to disk.  I also added a mutex in
unpack-trees.c for the progress meter.

It passes the test suite, and seems fairly safe to my naïve eyes.

Here are some benchmarks, cloning a linux kernel repo I had on an NFS
drive:

                   NFS->NFS    NFS->Local
master (53682f0c)    2:46.1          13.3
with threads           36.6          18.2

So it improved performance on NFS significantly.  Unfortunately it also
degraded performance on the local disk significantly.  I'm hoping someone
will suggest a way to mitigate that... I think it would be reasonable to
disable the threading except when the work dir is on NFS, but I don't
know how to detect that.  Even in that case it will have *some* impact
from locking/unlocking the mutex, but I think it would be in the noise.

James

^ permalink raw reply

* [topgit] shared topic branch
From: Fabien Thomas @ 2008-12-18 20:02 UTC (permalink / raw)
  To: git

Hi,

I'm testing topgit 0.5 in a shared env. (multiple user working on same  
patch).

After following the tutorial i end up with something like this:

$ git branch
   master
   topic/test1
* topic/test2
$ git branch -r
   origin/HEAD
   origin/master
   origin/svn_stable_7
   origin/svn_trunk
   origin/top-bases/topic/test1
   origin/top-bases/topic/test2
   origin/topic/test1
   origin/topic/test2

My problem is that when i want to push my local work i'm doing "git  
push" that will force update the remote branch.
The problem is that each time master is not up to date i will push my  
entire master or topic branch to the remote.

After looking at git config file it seems that this is something  
normal generated by "tg remote --populate origin":

[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
[remote "origin"]
	url = ssh://git@xxx.com
	fetch = +refs/heads/*:refs/remotes/origin/*
	fetch = +refs/top-bases/*:refs/remotes/origin/top-bases/*
	push = +refs/top-bases/*:refs/top-bases/*
	push = +refs/heads/*:refs/heads/*
[branch "master"]
	remote = origin
	merge = refs/heads/master
[merge "ours"]
	name = \"always keep ours\" merge driver
	driver = touch %A
[topgit]
	remote = origin

What i'm doing wrong ?

Regards,
Fabien

^ permalink raw reply

* Re: [PATCHv5 2/3] gitweb: add patches view
From: Giuseppe Bilotta @ 2008-12-18 19:57 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Junio C Hamano, git, Petr Baudis
In-Reply-To: <200812181915.48556.jnareb@gmail.com>

On Thu, Dec 18, 2008 at 7:15 PM, Jakub Narebski <jnareb@gmail.com> wrote:
>
> Just a though, but does it really take much sense to have "patch" and
> "patches" views handled in git_commitdiff? I can understand in the
> first version, where it was more about better 'commitdiff_plain', but
> I wonder about now, where patch(es) view is something between git show
> and log_plain format... Wouldn't it be simpler to use separate
> subroutine, for example git_format_patch or something?

I don't feel like much has changed since the beginning when IYO it
made sense to have it part of commitdiff, honestly

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply

* Re: is it possible filter the revision history of a single file into another repository?
From: Whit Armstrong @ 2008-12-18 19:51 UTC (permalink / raw)
  To: Thomas Jarosch; +Cc: git
In-Reply-To: <8ec76080812180619k78a28e30t591b514148202869@mail.gmail.com>

Sorry, seem to be getting this error:
`/home/whit/dvl/risk.metrics.utils/RiskMetrics/.git-rewrite/t/../index.new':
No such file or directory

do I need to set up the index file first?

Is there a good site that documents this procedure?

[whit@linuxsvr RiskMetrics]$ git filter-branch --tag-name-filter cat
--index-filter \
>    'git ls-files -s |grep -P "riskmetrics.rb" \
>    |GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info &&
>    mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE' -- --all
Rewrite 8f1a0eaae033d109f4a3a4b410bd8e04dd9997db (1/481)mv: cannot
stat `/home/whit/dvl/risk.metrics.utils/RiskMetrics/.git-rewrite/t/../index.new':
No such file or directory
index filter failed: git ls-files -s |grep -P "riskmetrics.rb" \
   |GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info &&
   mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE
[whit@linuxsvr RiskMetrics]$



On Thu, Dec 18, 2008 at 9:19 AM, Whit Armstrong
<armstrong.whit@gmail.com> wrote:
> thanks, I will give this a try.
>
> On Thu, Dec 18, 2008 at 9:04 AM, Thomas Jarosch
> <thomas.jarosch@intra2net.com> wrote:
>> On Thursday, 18. December 2008 14:51:12 Whit Armstrong wrote:
>>> For instance, if my repository contains foo.c, and 100 other files.
>>>
>>> I would like to create a new and separate repository containing only
>>> the revision history of foo.c.
>>>
>>> Would someone mind pointing me at some documentation for this
>>> procedure if it exists?
>>
>> This worked for me:
>>
>> git filter-branch --tag-name-filter cat --index-filter \
>>    'git ls-files -s |grep -P "\t(DIR1|DIR2)" \
>>    |GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info &&
>>    mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE' -- --all
>>
>> Run "git ls-files -s" to see the output format.
>> Replace the "DIR1|DIR2" with "foo.c".
>>
>> Later on you might want to remove empty commits from the history:
>> git filter-branch --tag-name-filter cat --commit-filter 'if [ z$1 = z`git rev-parse $3^{tree}` ]; then skip_commit "$@"; else git commit-tree "$@"; fi' "$@" -- --all
>>
>> If you want to run two filter-branch commands in a row
>> or you want to free up the space in .git afterwards:
>>
>> - git for-each-ref --format='%(refname)' refs/original | xargs -i git update-ref -d {}
>> - git reflog expire --expire=0 --all
>> - git repack -a -d
>> - git prune
>>
>> Cheers,
>> Thomas
>>
>>
>

^ permalink raw reply

* Re: [PATCH] Add git-edit-index.perl
From: Johannes Schindelin @ 2008-12-18 19:47 UTC (permalink / raw)
  To: Miklos Vajna; +Cc: Jeff King, Neil Roberts, git
In-Reply-To: <20081218163654.GR5691@genesis.frugalware.org>

Hi,

On Thu, 18 Dec 2008, Miklos Vajna wrote:

> On Thu, Dec 18, 2008 at 05:24:00PM +0100, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> 
> > On Thu, 18 Dec 2008, Jeff King wrote:
> > 
> > > It _would_ be a nicer workflow to say "I don't want these changes 
> > > yet" and selectively put them elsewhere, test what's in the working 
> > > tree, commit, and then grab some more changes from your stash. But 
> > > we don't have interactive stashing and unstashing yet, which would 
> > > be required for that.
> > 
> > git stash -i... Yes, I'd like that!
> 
> Or git checkout -i?

Frankly, I need to move changes away much more often.  Plus, you could 
have what you wished for with a "git checkout -- <path> && git stash -i".  
It's just that you would move out the changes you would not want yet.

Ciao,
Dscho

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox